
GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking with Multi-Feature Learning (2006.07327v1)

Published 12 Jun 2020 in cs.CV, cs.LG, and eess.IV

Abstract: 3D multi-object tracking (MOT) is crucial to autonomous systems. Recent work uses a standard tracking-by-detection pipeline, where feature extraction is first performed independently for each object in order to compute an affinity matrix. Then the affinity matrix is passed to the Hungarian algorithm for data association. A key step in this standard pipeline is to learn discriminative features for different objects in order to reduce confusion during data association. In this work, we propose two techniques to improve discriminative feature learning for MOT: (1) instead of obtaining features for each object independently, we propose a novel feature interaction mechanism by introducing a Graph Neural Network. As a result, the feature of one object is informed of the features of other objects, so that the object feature can lean towards objects with similar features (i.e., objects likely sharing the same ID) and deviate from objects with dissimilar features (i.e., objects likely with different IDs), leading to a more discriminative feature for each object; (2) instead of obtaining the feature from either 2D or 3D space as in prior work, we propose a novel joint feature extractor to learn appearance and motion features from 2D and 3D space simultaneously. As features from different modalities often carry complementary information, the joint feature can be more discriminative than features from each individual modality. To ensure that the joint feature extractor does not heavily rely on one modality, we also propose an ensemble training paradigm. Through extensive evaluation, our proposed method achieves state-of-the-art performance on the KITTI and nuScenes 3D MOT benchmarks. Our code will be made available at https://github.com/xinshuoweng/GNN3DMOT

Authors (4)
  1. Xinshuo Weng (42 papers)
  2. Yongxin Wang (21 papers)
  3. Yunze Man (17 papers)
  4. Kris Kitani (96 papers)
Citations (202)

Summary

Overview of GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking with Multi-Feature Learning

The paper "GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking with Multi-Feature Learning" presents a novel approach to the challenging problem of 3D multi-object tracking (MOT), a crucial component for autonomous systems. The proposed methodology introduces advanced techniques to enhance discriminative feature learning by integrating Graph Neural Networks (GNNs) and a joint feature extraction mechanism for appearance and motion features in both 2D and 3D spaces.
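To ground the pipeline the paper builds on, here is a minimal sketch of the standard tracking-by-detection association step: an affinity matrix between existing tracks and new detections is passed to the Hungarian algorithm, here via SciPy's linear_sum_assignment. The affinity values and the 0.5 threshold below are illustrative placeholders, not the paper's learned scores.

```python
# Standard data-association step: match tracks to detections by maximizing
# total affinity, with a one-to-one constraint (Hungarian algorithm).
import numpy as np
from scipy.optimize import linear_sum_assignment

num_tracks, num_detections = 4, 5
affinity = np.random.rand(num_tracks, num_detections)  # higher = more similar

# linear_sum_assignment minimizes cost, so negate affinity to maximize it.
track_idx, det_idx = linear_sum_assignment(-affinity)

# Reject weak matches below a similarity threshold (value is illustrative);
# unmatched detections would start new tracks, unmatched tracks age out.
matches = [(t, d) for t, d in zip(track_idx, det_idx) if affinity[t, d] > 0.5]
print(matches)
```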

The work makes two primary contributions. First, a GNN-based feature interaction mechanism allows each object's feature to be informed by the features of other objects, yielding more discriminative representations and reducing the likelihood of identity switches. Second, a joint feature extractor integrates complementary 2D and 3D information, leveraging the distinctive advantages of each modality while minimizing the weaknesses inherent to either one alone. An ensemble training paradigm further ensures that the extractor does not over-rely on any single feature space.
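The feature interaction idea can be illustrated with a short PyTorch sketch. This is not the authors' exact architecture: the two linear layers, mean aggregation, and single interaction step are simplifying assumptions. The structure, however, matches the described mechanism, in which each object's feature is updated using messages computed from the other objects' features.

```python
# Sketch of GNN-style feature interaction: object features exchange messages
# so that similar objects draw together and dissimilar ones separate.
import torch
import torch.nn as nn

class FeatureInteraction(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.message = nn.Linear(2 * dim, dim)  # edge message from a feature pair
        self.update = nn.Linear(2 * dim, dim)   # node update from self + aggregate

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (N, dim), one row per object in the frame pair
        n, d = feats.shape
        src = feats.unsqueeze(1).expand(n, n, d)  # sender features:   src[i, j] = feats[i]
        dst = feats.unsqueeze(0).expand(n, n, d)  # receiver features: dst[i, j] = feats[j]
        msg = torch.relu(self.message(torch.cat([src, dst], dim=-1)))
        agg = msg.mean(dim=0)  # mean over senders for each receiver (self-messages kept for brevity)
        return torch.relu(self.update(torch.cat([feats, agg], dim=-1)))

layer = FeatureInteraction(dim=64)
refined = layer(torch.randn(5, 64))  # 5 objects, one interaction step
```

A pairwise affinity (e.g., cosine similarity) can then be computed from the refined features before running the association step sketched earlier.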

The authors conduct rigorous evaluations of GNN3DMOT on the KITTI and nuScenes benchmarks, achieving state-of-the-art performance on 3D MOT metrics. These results underscore the method's efficacy within the tracking-by-detection framework commonly employed in modern multi-object tracking systems.

Detailed Contributions

  1. Graph Neural Network for Feature Interaction: The method constructs a graph in which each node corresponds to an object's features and leverages the GNN's node aggregation to iteratively refine those features (as sketched above), improving the discriminative quality of the resulting affinity matrix used for data association.
  2. Joint 2D and 3D Feature Extraction: The proposed extractor integrates information from both 2D and 3D modalities, learning appearance and motion features through distinct branches (see the sketch after this list). This design exploits complementary spatial and temporal cues, which collectively contribute to more robust and precise object tracking.
  3. Empirical Validation and Performance: Through comprehensive analysis, the paper demonstrates that the method significantly reduces identity switches (IDS) and fragmentations (FRAG) compared to existing methods on the KITTI and nuScenes datasets. Notable numerical improvements in sAMOTA and AMOTA are reported, validating the technique's effectiveness.
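The joint feature extractor can be sketched as follows, with assumed input shapes and placeholder layer sizes: a small CNN over image crops for 2D appearance, an MLP over precomputed point-cloud features for 3D appearance, and LSTMs over 2D and 3D box trajectories for motion. The paper's actual branch designs and its ensemble training paradigm, which prevents over-reliance on a single modality, are not reproduced here.

```python
# Sketch of a joint 2D/3D appearance + motion feature extractor; branch
# architectures and dimensions are illustrative placeholders.
import torch
import torch.nn as nn

class JointFeatureExtractor(nn.Module):
    def __init__(self, dim: int = 128):
        super().__init__()
        self.app2d = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2), nn.ReLU(),
                                   nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                   nn.Linear(16, dim))              # image-crop branch
        self.app3d = nn.Sequential(nn.Linear(512, dim), nn.ReLU())  # point-cloud feature branch
        self.mot2d = nn.LSTM(4, dim, batch_first=True)  # 2D box trajectory (x, y, w, h)
        self.mot3d = nn.LSTM(7, dim, batch_first=True)  # 3D box trajectory (x, y, z, l, w, h, yaw)
        self.fuse = nn.Linear(4 * dim, dim)

    def forward(self, crop, pts_feat, traj2d, traj3d):
        a2 = self.app2d(crop)            # (N, dim) 2D appearance
        a3 = self.app3d(pts_feat)        # (N, dim) 3D appearance
        _, (m2, _) = self.mot2d(traj2d)  # last hidden state: (1, N, dim)
        _, (m3, _) = self.mot3d(traj3d)
        joint = torch.cat([a2, a3, m2.squeeze(0), m3.squeeze(0)], dim=-1)
        return self.fuse(joint)          # fused per-object feature

extractor = JointFeatureExtractor(dim=128)
f = extractor(torch.randn(2, 3, 64, 64),  # image crops
              torch.randn(2, 512),        # per-object point-cloud features
              torch.randn(2, 10, 4),      # 10-frame 2D box history
              torch.randn(2, 10, 7))      # 10-frame 3D box history
```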

Implications and Future Directions

The implications of this research are multi-faceted. Practically, improved tracking accuracy benefits autonomous applications such as self-driving vehicles and robotic navigation. Theoretically, it invites further investigation of GNNs for feature interaction in MOT and related tasks.

Future research might explore extending the GNN framework to incorporate additional sensor data, such as radar or infrared, to increase the robustness of object tracking under various environmental conditions. Moreover, expanding the applicability of this framework to other domains where object tracking is pivotal, such as augmented reality or surveillance systems, would be a worthwhile endeavor.

In summary, the introduction of GNNs for feature interaction alongside a joint feature extraction paradigm marks a significant step forward in addressing the 3D MOT challenge, reaping the advantages of multi-modal data fusion with advanced neural architectures.