- The paper presents DMM-Net, an end-to-end system that simultaneously detects and tracks multiple objects by modeling their motion over consecutive frames.
- It introduces the synthetic Omni-MOT dataset with over 14 million frames to overcome detector bias and enable robust evaluations.
- The use of anchor tubes to model temporal motion parameters significantly enhances tracking performance and computational efficiency in complex conditions.
Overview of "Simultaneous Detection and Tracking with Motion Modelling for Multiple Object Tracking"
The paper addresses a core challenge in Multiple Object Tracking (MOT): the reliance on off-the-shelf detectors, which introduces detector bias into the tracking pipeline. The proposed solution, the Deep Motion Modeling Network (DMM-Net), performs object detection and association simultaneously, removing the need for a separate pre-detection stage. By jointly estimating object motions, classes, and visibilities across multiple frames, the method improves both tracking performance and computational efficiency.
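To make the joint formulation concrete, the sketch below mocks the shape of DMM-Net's three prediction heads with random placeholder weights. The dimensions (`T`, `N`, `C`, the 64-dim features) and the head names are illustrative assumptions, not the paper's actual layer sizes; the point is only that one forward pass yields motion, class, and per-frame visibility estimates for every candidate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for illustration only.
T, N, C = 8, 6, 2   # frames per clip, candidate anchor tubes, object classes

# Stand-in for backbone features of a T-frame clip; in the paper a
# ResNeXt-based extractor would produce these from the raw frames.
features = rng.standard_normal((N, 64))

def predict_heads(feats, T, C):
    """Sketch of three output heads (weights are random placeholders):
    motion parameters, class scores, and per-frame visibility for
    every candidate anchor tube."""
    W_motion = rng.standard_normal((64, 4))          # e.g. (dx, dy, dw, dh)
    W_class = rng.standard_normal((64, C))
    W_vis = rng.standard_normal((64, T))
    motion = feats @ W_motion                        # (N, 4)
    classes = feats @ W_class                        # (N, C)
    visibility = 1 / (1 + np.exp(-(feats @ W_vis)))  # (N, T), in (0, 1)
    return motion, classes, visibility

motion, classes, visibility = predict_heads(features, T, C)
print(motion.shape, classes.shape, visibility.shape)  # (6, 4) (6, 2) (6, 8)
```

Because all three heads share one feature pass over the clip, detection and association come out of the same computation, which is where the method's speed advantage originates.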
Key Contributions
- DMM-Net Architecture: DMM-Net is an end-to-end MOT system that folds object detection into the tracking process by modeling object motion over time. A feature extractor built from ResNeXt blocks processes multiple frames jointly, and distinct subnetworks predict motion parameters, object classes, and per-frame visibility.
- Omni-MOT Synthetic Dataset: The authors created a synthetic dataset named Omni-MOT using the CARLA simulator to generate precise ground-truth annotations free from detector bias. The dataset contains over 14 million frames and covers a diverse set of traffic scenes under varying conditions and camera views, enabling extensive performance evaluation.
- Anchor Tubes and Motion Parameters: To overcome the limitations of traditional anchor boxes in the temporal domain, the authors introduce anchor tubes, which extend spatial anchor boxes along the temporal axis and allow object motion to be modeled across multiple frames at once. Each tube's motion is decoupled into a small set of parameters that DMM-Net predicts, yielding robust tracking even under partial occlusion or overlapping trajectories.
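The anchor-tube idea can be sketched as follows: a spatial anchor box is replicated across the clip's frames, and the predicted motion parameters deform that tube into per-frame boxes. The linear drift-and-scale parameterization used here is a simplifying assumption for illustration, not the paper's exact motion model.

```python
import numpy as np

def make_anchor_tube(box, num_frames):
    """Replicate a spatial anchor box (cx, cy, w, h) across the
    temporal axis to form an anchor tube of shape (T, 4)."""
    return np.tile(np.asarray(box, dtype=float), (num_frames, 1))

def decode_tube(tube, motion_params):
    """Deform an anchor tube with predicted motion parameters.

    Illustrative linear model (an assumption, not the paper's exact
    parameterization): at frame t the centre shifts by t * (dx, dy)
    and width/height scale by exp(t * dw) and exp(t * dh).
    """
    T = tube.shape[0]
    t = np.arange(T, dtype=float)          # frame index 0..T-1
    dx, dy, dw, dh = motion_params
    out = tube.copy()
    out[:, 0] += t * dx                    # centre x drift
    out[:, 1] += t * dy                    # centre y drift
    out[:, 2] *= np.exp(t * dw)            # width scaling
    out[:, 3] *= np.exp(t * dh)            # height scaling
    return out

tube = make_anchor_tube((100.0, 50.0, 32.0, 32.0), num_frames=4)
boxes = decode_tube(tube, (2.0, 1.0, 0.0, 0.0))
print(boxes[:, 0])  # centre x per frame: [100. 102. 104. 106.]
```

Because one set of parameters describes the whole tube, an object's identity is carried implicitly across frames, which is what lets detection and association happen in a single step.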
Results and Performance
The proposed approach shows significant gains in both tracking performance and computational efficiency. On the challenging UA-DETRAC benchmark, DMM-Net achieves a PR-MOTA score of 12.80 at over 120 fps, outperforming existing methods that rely on separate detection and tracking components. This performance is attributable to DMM-Net's built-in detection and motion modeling.
Implications and Future Directions
DMM-Net represents a paradigm shift in MOT by tightly coupling detection with tracking, reducing dependence on potentially flawed external detectors. This makes it well suited to real-time applications where computational efficiency and accuracy are both critical, such as autonomous driving and surveillance.
The introduction of the Omni-MOT dataset also opens avenues for training and testing other deep learning models in a controlled environment, promoting transparency and reproducibility in MOT evaluations. The synthetic nature of the dataset, along with the released scripts for dataset extension, provides a flexible foundation for future research.
Furthermore, the paper hints at the possibility of using DMM-Net for different object tracking tasks beyond vehicles. Extending this model to track pedestrians or other objects of interest in varying environmental conditions could be a compelling direction. Additionally, exploring more sophisticated motion modeling techniques or incorporating other forms of temporal context could further enhance the model's adaptability and performance in complex scenes.
In summary, the paper presents a robust approach to MOT that could redefine how object detection and tracking are integrated, with broader implications for both academic research and industry applications.