Deep Convolutional Neural Networks for Multi-Object Panoptic Tracking

This demo contains PanopticTrackNet trained on the Virtual KITTI 2 and SemanticKITTI datasets. Select a dataset to load from the drop-down box below and click on an image in the carousel to see the results. To learn more about the network architecture and the approach employed, please see the Technical Approach section below.


Technical Approach

What is MOPT?

MOPT unifies the distinct tasks of semantic segmentation (pixel-wise classification of ‘stuff’ and ‘thing’ classes), instance segmentation (detection and segmentation of instance-specific ‘thing’ classes), and multi-object tracking (detection and association of ‘thing’ classes over time). The goal of this task is to encourage holistic modeling of dynamic scenes by tackling, in a coherent manner, problems that are typically addressed disjointly.
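Concretely, a MOPT output assigns every pixel a semantic class, and every ‘thing’ pixel additionally an instance tracking ID that stays constant for the same object across frames. The sketch below (hypothetical data structures, not the paper's implementation) illustrates this output format:

```python
# Hypothetical illustration of the MOPT output format (not the paper's code).
# Each pixel carries a semantic class; 'thing' pixels additionally carry a
# tracking ID that is kept consistent for the same object across frames.

STUFF = {"road", "sky"}          # amorphous classes without instances
THINGS = {"car", "pedestrian"}   # countable classes with tracked instances

# One frame as per-pixel (semantic_class, track_id); track_id is None for stuff.
frame_t0 = [("road", None), ("car", 1), ("car", 2), ("sky", None)]
frame_t1 = [("road", None), ("car", 1), ("car", 2), ("pedestrian", 3)]

def tracked_ids(frame):
    """Return the set of instance tracking IDs present in a frame."""
    return {tid for _, tid in frame if tid is not None}

# The same physical cars keep IDs 1 and 2 in both frames -> temporal coherence.
print(sorted(tracked_ids(frame_t0) & tracked_ids(frame_t1)))  # [1, 2]
```

Evaluating MOPT then amounts to jointly scoring segmentation quality and the temporal consistency of these IDs, rather than scoring each sub-task in isolation.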

How Does It Work?

Network architecture
Figure: Overview of our proposed PanopticTrackNet architecture. Our network consists of a shared backbone with the 2-way FPN (red), semantic segmentation head (blue), instance segmentation head (green), instance tracking head (yellow), and the MOPT fusion module. The fusion module adaptively combines the predictions from each of the aforementioned heads to simultaneously yield pixel-level predictions of ‘stuff’ classes and instance-specific ‘thing’ classes with temporally tracked instance IDs.

The goal of our proposed architecture is to assign a semantic label to each pixel in an image, an instance ID to ‘thing’ classes, and a tracking ID to each object instance, thereby incorporating temporal tracking of object instances into the panoptic segmentation task. We build upon the recently introduced state-of-the-art EfficientPS architecture for panoptic segmentation. To this end, we employ a novel shared backbone with the 2-way FPN to extract multi-scale features that are subsequently fed into three task-specific heads that simultaneously perform semantic segmentation, instance segmentation, and multi-object tracking. Finally, we adaptively fuse the task-specific outputs from each of the heads in our fusion module to yield the panoptic segmentation output with temporally tracked instances.
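The data flow described above can be sketched schematically as follows. This is a simplified stand-in with hypothetical function names, not the actual PanopticTrackNet implementation: the real heads are deep networks, and the real tracking head associates instances via learned embeddings rather than the trivial bookkeeping shown here.

```python
# Schematic sketch of the PanopticTrackNet data flow (hypothetical names,
# not the actual implementation). Each stage is stubbed with trivial logic.

def backbone_2way_fpn(image):
    # Shared backbone + 2-way FPN: extract multi-scale features.
    return {"P2": image, "P3": image}  # stand-in for a feature pyramid

def semantic_head(feats):
    # Per-pixel classification over 'stuff' and 'thing' classes.
    return [["road", "car"]]

def instance_head(feats):
    # Mask R-CNN-style detections of 'thing' instances.
    return [{"class": "car", "mask": [(0, 1)], "score": 0.9}]

def tracking_head(detections, memory):
    # Associate detections with previously seen instances; unseen objects
    # receive fresh tracking IDs. (The real head uses learned embeddings.)
    for det in detections:
        key = (det["class"], tuple(det["mask"]))
        det["track_id"] = memory.setdefault(key, len(memory) + 1)
    return detections

def fuse(semantic, tracked):
    # Fusion module: combine the head outputs into a single panoptic
    # prediction with temporally tracked instance IDs.
    return {"semantic": semantic, "instances": tracked}

memory = {}                       # persists across frames
feats = backbone_2way_fpn(image=[[0]])
out = fuse(semantic_head(feats),
           tracking_head(instance_head(feats), memory))
print(out["instances"][0]["track_id"])  # 1
```

Re-running the last three lines on a subsequent frame reuses `memory`, so a re-detected instance keeps its tracking ID, which is the property the fusion module exposes in the final panoptic output.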

We demonstrated the performance of our model using two different modalities, namely vision-based MOPT on Virtual KITTI 2 and LiDAR-based MOPT on SemanticKITTI. Our findings demonstrate the feasibility of training MOPT models without restricting or ignoring the dynamics of the input, while providing useful instance identification and semantic segmentation that are also coherent in time.



A software implementation of this project based on PyTorch can be found in our GitHub repository for academic usage and is released under the GPLv3 license. For any commercial purpose, please contact the authors.


Juana Valeria Hurtado, Rohit Mohan, Wolfram Burgard, Abhinav Valada,
MOPT: Multi-Object Panoptic Tracking
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshop on Scalability in Autonomous Driving, 2020.

(Pdf) (Bibtex)