- The paper presents a sensor fusion approach that combines 3D LiDAR and 2D camera data to overcome the limitations of individual sensors in object tracking.
- It fuses 2D and 3D detections via their overlap in image space, then applies a two-stage data association: fused instances are first matched to tracks in 3D, and remaining image-only detections are matched using a 2D IoU metric.
- The method achieves state-of-the-art performance, with an AMOTA of 0.68 on nuScenes and improved tracking accuracy on KITTI, underscoring its practical value for autonomous systems.
EagerMOT: 3D Multi-Object Tracking via Sensor Fusion
The paper "EagerMOT: 3D Multi-Object Tracking via Sensor Fusion" by Aleksandr Kim, Aljoša Ošep, and Laura Leal-Taixé presents a method for enhancing multi-object tracking (MOT) by integrating data from LiDAR and camera sensors. This approach addresses the limitations of relying solely on depth sensors or optical data for object tracking, proposing an eager integration of observations from both modalities to achieve a more comprehensive understanding of scene dynamics.
Methodology Overview
The proposed framework, EagerMOT, is built on the premise that combining sensor data can compensate for the shortcomings of each individual sensor. LiDAR offers precise 3D localization but limited effective range, since its returns become sparse at distance, while cameras provide dense appearance information but only in the 2D image plane. By merging the two modalities, EagerMOT can track distant objects that are visible only in the image and precisely localize trajectories wherever LiDAR coverage is available.
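Fusion of this kind presupposes a calibrated projection from the LiDAR frame into the camera image. The sketch below shows such a projection under standard pinhole-camera assumptions; the argument names (`T_cam_lidar`, `K`) and the helper itself are illustrative, not part of the paper's code.

```python
import numpy as np

def project_lidar_to_image(points_lidar, T_cam_lidar, K):
    """Project Nx3 LiDAR points into pixel coordinates.

    points_lidar : (N, 3) points in the LiDAR frame.
    T_cam_lidar  : (4, 4) homogeneous transform from LiDAR to camera frame.
    K            : (3, 3) camera intrinsic matrix.
    Returns (N, 2) pixel coordinates and a boolean mask of points that lie
    in front of the camera (pixel values are only meaningful where the
    mask is True).
    """
    # Homogeneous coordinates, then transform into the camera frame.
    ones = np.ones((points_lidar.shape[0], 1))
    pts_h = np.hstack([points_lidar, ones])          # (N, 4)
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]       # (N, 3)

    # Keep only points with positive depth.
    in_front = pts_cam[:, 2] > 1e-6

    # Pinhole projection: u = fx*x/z + cx, v = fy*y/z + cy.
    uv = (K @ pts_cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]
    return uv, in_front
```

Projecting the corners of a 3D detection this way and taking their bounding rectangle yields the 2D footprint used to compare it against camera detections.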
EagerMOT employs a two-stage data association process:
- Fusion and First-Stage Association: 3D LiDAR detections are projected into the image plane and paired with 2D detections based on their mutual overlap; each pairing forms a fused instance. Instances that carry a 3D detection are then matched to existing tracks in 3D space, and matched tracks are updated with the fused measurement.
- Second-Stage Association: Instances left unmatched after the first stage, typically those with only a 2D detection, are associated with the remaining tracks using a two-dimensional intersection-over-union (2D IoU) metric in the image plane. This keeps tracks alive when LiDAR returns are occluded, sparse, or missed entirely; a simplified sketch of IoU-based matching follows this list.
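To make the association concrete, the sketch below implements a simple greedy 2D IoU matcher in Python. The box format (`[x1, y1, x2, y2]`), the 0.3 threshold, and the function names are illustrative assumptions rather than the authors' implementation, and the greedy assignment stands in for whatever assignment scheme the released code uses. The same routine can serve both the fusion step (projected 3D boxes vs. 2D detections) and the second-stage association (image-only detections vs. projected track states).

```python
def iou_2d(a, b):
    """IoU of two axis-aligned boxes given as [x1, y1, x2, y2]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def greedy_iou_match(boxes_a, boxes_b, min_iou=0.3):
    """Greedily pair boxes_a with boxes_b by descending 2D IoU.

    Returns the matched (i, j) index pairs plus the unmatched indices on
    each side, so callers can hand leftovers to the next stage.
    """
    scores = [(iou_2d(a, b), i, j)
              for i, a in enumerate(boxes_a)
              for j, b in enumerate(boxes_b)]
    scores.sort(reverse=True)

    matches, used_a, used_b = [], set(), set()
    for iou, i, j in scores:
        if iou < min_iou:
            break
        if i in used_a or j in used_b:
            continue
        matches.append((i, j))
        used_a.add(i)
        used_b.add(j)

    unmatched_a = [i for i in range(len(boxes_a)) if i not in used_a]
    unmatched_b = [j for j in range(len(boxes_b)) if j not in used_b]
    return matches, unmatched_a, unmatched_b
```

Returning the unmatched indices from each call is what lets leftover detections and tracks flow naturally from fusion into the first and then the second association stage.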
Experimental Results and Insights
EagerMOT achieves state-of-the-art performance on several benchmarks, including the KITTI and nuScenes datasets. Key performance indicators include:
- nuScenes: The method reaches an AMOTA of 0.68, outperforming prior published trackers by using 2D detections to improve recall without sacrificing precision.
- KITTI: EagerMOT adapts to a range of sensor and detector configurations, improving tracking accuracy by using 2D detections to supplement the 3D ones.
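For context on the headline number, the nuScenes tracking benchmark defines AMOTA as the average of a recall-normalized MOTA (MOTAR) over L evenly spaced recall thresholds. The formulation below reproduces the commonly cited definition and is included only as a reference; consult the benchmark documentation for the exact recall sampling. Here P is the number of ground-truth positives and IDS, FP, FN are counted at recall r.

$$
\mathrm{AMOTA} = \frac{1}{L}\sum_{r \in \{\frac{1}{L}, \frac{2}{L}, \dots, 1\}} \mathrm{MOTAR}_r,
\qquad
\mathrm{MOTAR}_r = \max\!\left(0,\; 1 - \frac{\mathrm{IDS}_r + \mathrm{FP}_r + \mathrm{FN}_r - (1-r)\,P}{r\,P}\right)
$$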
Implications and Future Directions
From a theoretical standpoint, EagerMOT illustrates the potential of sensor fusion to address common 3D MOT challenges such as occlusion and limited sensing range. Practically, it offers an efficient, adaptable solution for diverse autonomous systems, from robotic platforms to autonomous vehicles with varying sensor setups, and its robust performance across multiple scenarios suggests it can support real-time decision-making in complex environments.
Future research could explore end-to-end learning for data association, extend the framework to newer sensing modalities, and incorporate models that adapt the fusion strategy to the environmental context.
In conclusion, EagerMOT highlights the advantages of combining rich visual data from cameras with precise but range-limited LiDAR depth data to enhance multi-object tracking. It sets a foundational baseline that suggests promising directions for enhanced autonomous sensing and navigation.