Robust Multi-Modality Multi-Object Tracking
The paper "Robust Multi-Modality Multi-Object Tracking" proposes a framework (mmMOT) engineered to enhance both the reliability and accuracy of multi-object tracking (MOT) systems in autonomous driving. Autonomy in vehicles mandates systems that exhibit robust performance, one that can withstand sensor malfunctions and efficiently integrate information from varied sensors. The proposed framework addresses these dual aspects by allowing each sensor modality to operate autonomously and enhancing the information fusion process to improve detection and tracking accuracy.
Central to the mmMOT framework is a sensor-agnostic design in which each modality functions independently. This modular approach preserves operational reliability: if one sensor fails, the system does not suffer a global performance collapse. In tandem, a novel multi-modality fusion module integrates information across modalities to bolster tracking accuracy. The integration leverages the complementary strengths of the individual sensors, such as the visual richness of cameras and the spatial accuracy of LiDAR, to improve the fidelity of multi-object perception.
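The modular design can be pictured as one encoder per sensor feeding a fusion step that accepts whichever modalities are currently available. The sketch below is a minimal illustration under that assumption; the class names, feature dimensions, and the simple averaging fusion are placeholders for exposition, not the paper's exact architecture.

```python
# Minimal sketch (not the paper's exact architecture): per-modality feature
# extractors that work independently, plus a fusion step that combines
# whichever modalities are currently available. Names are illustrative.
from typing import Dict, Optional
import torch
import torch.nn as nn


class ModalityEncoder(nn.Module):
    """Maps one sensor's per-detection input to a shared embedding space."""

    def __init__(self, in_dim: int, embed_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, embed_dim), nn.ReLU(),
                                 nn.Linear(embed_dim, embed_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class FusionModule(nn.Module):
    """Fuses whatever per-modality embeddings are present; any single
    modality alone still yields a valid feature, which is what keeps the
    tracker usable when one sensor fails."""

    def forward(self, feats: Dict[str, Optional[torch.Tensor]]) -> torch.Tensor:
        available = [f for f in feats.values() if f is not None]
        assert available, "at least one modality must be working"
        # Simple averaging fusion; the paper explores richer fusion variants.
        return torch.stack(available, dim=0).mean(dim=0)


# Example: camera appearance features (assumed 512-d) and LiDAR point-cloud
# features (assumed 64-d) embedded separately, then fused per detection.
img_enc, pts_enc = ModalityEncoder(512), ModalityEncoder(64)
fusion = FusionModule()
img_feat = img_enc(torch.randn(5, 512))      # 5 detections from the camera
pts_feat = pts_enc(torch.randn(5, 64))       # matching LiDAR point features
fused = fusion({"image": img_feat, "lidar": pts_feat})     # both sensors up
camera_only = fusion({"image": img_feat, "lidar": None})   # LiDAR failed
```

The last line is the crux of the reliability argument: because each encoder is trained to produce a usable embedding on its own, dropping a modality degrades rather than breaks the pipeline.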
mmMOT is trained end-to-end, allowing the joint optimization of per-sensor feature extraction and the cross-modality adjacency estimator. The adjacency estimator plays a pivotal role: it scores potential correspondences between detections in consecutive frames, driving the data association that links detections into continuous tracks. Notably, the work also introduces deep point cloud representations into data association, establishing them as an effective feature for tracking objects.
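Conceptually, the adjacency estimator produces a frame-to-frame affinity matrix that an assignment step turns into track links. The snippet below illustrates that role with a simplified cosine affinity and a Hungarian assignment; the paper's estimator is a learned network trained jointly with the rest of mmMOT, so this is a stand-in sketch rather than its implementation.

```python
# Hedged sketch of cross-frame data association: score every pair of
# detections from consecutive frames with an affinity derived from fused
# features, then solve the resulting assignment problem.
import numpy as np
from scipy.optimize import linear_sum_assignment


def associate(feat_prev: np.ndarray, feat_curr: np.ndarray,
              min_affinity: float = 0.5):
    """feat_prev: (N, D) fused features at frame t; feat_curr: (M, D) at t+1.
    Returns a list of (prev_idx, curr_idx) matches."""
    prev = feat_prev / np.linalg.norm(feat_prev, axis=1, keepdims=True)
    curr = feat_curr / np.linalg.norm(feat_curr, axis=1, keepdims=True)
    affinity = prev @ curr.T                       # (N, M) cosine similarities
    rows, cols = linear_sum_assignment(-affinity)  # maximize total affinity
    # Reject weak matches so new tracks can start and ended tracks can die.
    return [(r, c) for r, c in zip(rows, cols)
            if affinity[r, c] >= min_affinity]
```

In the actual framework the affinity scores are predicted by the adjacency estimator and supervised end-to-end, which is what lets the point cloud and image features be shaped specifically for association rather than for detection alone.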
Empirical validation was performed on the KITTI tracking benchmark, a standard in autonomous driving research. mmMOT achieved state-of-the-art performance, with significant improvements in MOTA and in identity switches over existing methods. These results underscore the framework's efficacy in realistic scenarios, reflecting reductions in false positives and better handling of occlusions and sensor failures.
The robustness claim is substantiated by experiments in which individual modalities and fused modalities are evaluated. Even under simulated sensor failures, mmMOT maintained competitive performance, validating the independence and reliability of the sensor-specific components.
The implications of this research are manifold. Practically, it paves the way for more resilient autonomous vehicle systems that can maintain high performance without exclusive dependence on any single sensor. Theoretically, it suggests avenues for exploring deep learning's role in optimizing modular sensor networks and refining sensor fusion techniques through joint training paradigms.
In terms of future directions, the framework could be extended to additional modalities such as radar or ultrasonic sensors. Improvements to the attention mechanism within the fusion module are another promising avenue, particularly dynamic weighting strategies that adapt to environmental conditions. Such advances could further improve the reliability and accuracy of multi-modality systems as the demands on modern autonomous systems continue to grow.
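As a rough illustration of what such dynamic weighting might look like, the sketch below gates two modality features with a learned softmax weight per detection. This is an assumed extension for illustration, not the paper's published fusion module; the class name and dimensions are hypothetical.

```python
# Illustrative attention-style dynamic weighting for fusion (assumed
# extension, not the paper's module): a small gating network predicts a
# per-detection weight for each modality, so the fused feature can lean on
# whichever sensor is most informative at the moment.
import torch
import torch.nn as nn


class AttentionFusion(nn.Module):
    def __init__(self, embed_dim: int = 128, num_modalities: int = 2):
        super().__init__()
        self.gate = nn.Linear(embed_dim * num_modalities, num_modalities)

    def forward(self, img_feat: torch.Tensor, pts_feat: torch.Tensor) -> torch.Tensor:
        # Weights sum to one per detection; a failed sensor's feature could be
        # masked out before the softmax in a more complete implementation.
        weights = torch.softmax(
            self.gate(torch.cat([img_feat, pts_feat], dim=-1)), dim=-1)
        return weights[..., 0:1] * img_feat + weights[..., 1:2] * pts_feat


fusion = AttentionFusion()
fused = fusion(torch.randn(5, 128), torch.randn(5, 128))  # 5 detections
```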
In conclusion, the mmMOT framework presented in this paper stands as a robust, adaptable, and accurate approach to multi-sensor tracking, well suited to the challenging and dynamic setting of autonomous driving. It highlights how crossing sensor boundaries is essential to tackling seamless multi-object tracking.