- The paper demonstrates a track-centric approach that extends object detection both forward and backward in time for enhanced accuracy.
- It introduces a bidirectional tracking module and a Multiple-In-Multiple-Out strategy to refine proposals with improved temporal coherence.
- Empirical results on the Waymo Open Dataset show an 83.9% mAP and only 0.48% missed vehicles, surpassing human annotation performance.
This paper introduces CTRL, an offline LiDAR-based 3D object detection system that is reported to exceed human annotation accuracy and to outperform state-of-the-art methods on the Waymo Open Dataset without relying on model ensembling. The authors propose a track-centric approach, emphasizing a design philosophy of "once detected, never lost."
Methodological Innovation
The primary innovation lies in the shift from an object-centric to a track-centric perspective, which reflects the process used by human annotators. This involves a bidirectional tracking module and a track-centric learning module. The bidirectional tracking module extends the life cycle of detected objects both forward and backward in time, a technique referred to as "bidirectional extension." This is based on the assumption that once detected, objects do not disappear unless they move out of range.
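The paper's exact extension procedure isn't reproduced here, but the idea can be illustrated with a minimal sketch: propagate a track's endpoints forward and backward under a constant-velocity assumption until the extrapolated box leaves sensor range. The `Box` class, the `max_range` cutoff, and the `steps` cap are illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass

@dataclass
class Box:
    frame: int
    x: float
    y: float

def extend_track(boxes, num_frames, max_range=75.0, steps=5):
    """Extend a track forward and backward with a constant-velocity model.

    `boxes` is a time-ordered list of detected box centers; new boxes are
    extrapolated until they leave `max_range` (meters from the sensor, a
    hypothetical cutoff) or `steps` frames have been added on each side.
    """
    boxes = sorted(boxes, key=lambda b: b.frame)
    if len(boxes) < 2:
        return boxes
    # Velocity estimates at the two ends of the track.
    vf = (boxes[-1].x - boxes[-2].x, boxes[-1].y - boxes[-2].y)
    vb = (boxes[0].x - boxes[1].x, boxes[0].y - boxes[1].y)
    out = list(boxes)
    # Forward extension: propagate the last box ahead in time.
    last = boxes[-1]
    for k in range(1, steps + 1):
        f = last.frame + k
        x, y = last.x + k * vf[0], last.y + k * vf[1]
        if f >= num_frames or (x * x + y * y) ** 0.5 > max_range:
            break
        out.append(Box(f, x, y))
    # Backward extension: propagate the first box back in time.
    first = boxes[0]
    for k in range(1, steps + 1):
        f = first.frame - k
        x, y = first.x + k * vb[0], first.y + k * vb[1]
        if f < 0 or (x * x + y * y) ** 0.5 > max_range:
            break
        out.insert(0, Box(f, x, y))
    return out
```

A track detected only in frames 3-5 would thus be extended to cover earlier and later frames while the extrapolated positions stay in range, which is how occluded or sparsely observed frames get covered.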
The track-centric learning module processes entire tracks simultaneously, refining object proposals by leveraging temporal coherence. Unlike traditional Multiple-In-Single-Out (MISO) approaches, which refine one proposal per forward pass, CTRL adopts a Multiple-In-Multiple-Out (MIMO) strategy that refines all proposals within a track concurrently, improving both computational efficiency during training and the temporal coherence of the detected objects.
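The refinement network's internals are not described here; the MISO/MIMO distinction itself can be sketched with a toy "refiner" that exploits one piece of temporal coherence, namely that a rigid vehicle's length is constant across frames. The median-smoothing logic is a hypothetical stand-in for the learned module.

```python
def miso_refine(proposals, t, window=2):
    """MISO-style: refine only the proposal at frame t from a local window
    of neighboring frames. Refining a whole T-frame track needs T passes."""
    lo, hi = max(0, t - window), min(len(proposals), t + window + 1)
    neighborhood = sorted(p["length"] for p in proposals[lo:hi])
    refined = dict(proposals[t])
    refined["length"] = neighborhood[len(neighborhood) // 2]
    return refined

def mimo_refine(proposals):
    """MIMO-style: consume every proposal in the track at once and emit a
    refinement for every frame in a single pass. The rigid-size prior is
    applied by snapping each frame to the track-level median length."""
    lengths = sorted(p["length"] for p in proposals)
    track_len = lengths[len(lengths) // 2]
    return [{**p, "length": track_len} for p in proposals]
```

The MIMO variant sees the entire track, so a single outlier frame (e.g. a truncated point cloud inflating one box) is corrected by evidence from every other frame, whereas the windowed MISO refiner can still be skewed by nearby outliers and must be re-run per frame.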
Empirical Evaluation
The paper presents substantial empirical results demonstrating the efficacy of CTRL. On the Waymo Open Dataset, a notable benchmark in autonomous driving research, CTRL consistently outperforms both human annotators and other leading detection algorithms. For instance, the CTRL system achieved a mean Average Precision (mAP) of 83.9% with a high mAPH (mAP considering heading) of 82.3%, surpassing human performance reported by earlier studies such as 3DAL.
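The mAPH figure cited above weights each true positive by how accurate its predicted heading is. A short sketch of that weighting as commonly defined for Waymo's APH metric (the exact formula is an assumption here, not taken from this paper):

```python
import math

def heading_weight(theta_pred, theta_gt):
    """Heading-accuracy weight in the spirit of Waymo's APH metric:
    1.0 for a perfect heading, 0.0 for a heading off by 180 degrees."""
    d = abs(theta_pred - theta_gt) % (2 * math.pi)
    d = min(d, 2 * math.pi - d)          # wrap the angular error into [0, pi]
    return 1.0 - d / math.pi             # linearly penalize heading error
```

Under this weighting, a detector that localizes well but flips headings would score markedly lower on mAPH than on mAP, which is why the small gap between CTRL's 83.9% mAP and 82.3% mAPH indicates accurate heading estimates.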
A remarkable finding is that CTRL completely missed only 0.48% of vehicles, indicating a robust detection capability. This performance is attributed to the system's ability to infer and track object motion across frames, even for objects that become temporarily occluded or have few LiDAR points in certain frames.
Theoretical and Practical Implications
The design and results suggest significant implications for the future of autonomous driving. The track-centric approach could lead to more reliable and computationally efficient labeling in challenging environments, a major advantage for both training autonomous systems and enhancing real-time decision-making in driving scenarios. Additionally, the methods present potential for application in other domains requiring robust multi-object tracking and detection.
The paper also highlights the ability to surpass human-level accuracy in object detection tasks, raising questions about the future roles of human annotators in training data generation. As AI systems increasingly exceed human capabilities in specific tasks, the collaborative dynamics between machine learning and human expertise in system design and data interpretation will likely evolve.
Future Directions
Further research could explore integrating multi-modal inputs, such as combining LiDAR data with image data, to enhance the robustness and applicability of the system. Additionally, extending these methods to other types of perception datasets and refining the models for specific object classes (e.g., cyclists, pedestrians) might yield even more precise object tracking systems.
Conclusion
CTRL offers a compelling advancement in LiDAR-based 3D object detection, leveraging a novel track-centric method to achieve high accuracy without the need for model ensembling. Its ability to outperform both other models and human annotations makes it a significant contribution to the field, with promising prospects for practical implementation in autonomous vehicle technology.