Center-based 3D Object Detection and Tracking (2006.11275v2)

Published 19 Jun 2020 in cs.CV

Abstract: Three-dimensional objects are commonly represented as 3D boxes in a point-cloud. This representation mimics the well-studied image-based 2D bounding-box detection but comes with additional challenges. Objects in a 3D world do not follow any particular orientation, and box-based detectors have difficulties enumerating all orientations or fitting an axis-aligned bounding box to rotated objects. In this paper, we instead propose to represent, detect, and track 3D objects as points. Our framework, CenterPoint, first detects centers of objects using a keypoint detector and regresses to other attributes, including 3D size, 3D orientation, and velocity. In a second stage, it refines these estimates using additional point features on the object. In CenterPoint, 3D object tracking simplifies to greedy closest-point matching. The resulting detection and tracking algorithm is simple, efficient, and effective. CenterPoint achieved state-of-the-art performance on the nuScenes benchmark for both 3D detection and tracking, with 65.5 NDS and 63.8 AMOTA for a single model. On the Waymo Open Dataset, CenterPoint outperforms all previous single-model methods by a large margin and ranks first among all Lidar-only submissions. The code and pretrained models are available at https://github.com/tianweiy/CenterPoint.

Authors (3)

Tianwei Yin
Xingyi Zhou
Philipp Krähenbühl

Citations (1,370)

Summary

Center-based 3D Object Detection and Tracking: An Analytical Overview

In "Center-based 3D Object Detection and Tracking," the authors propose CenterPoint, a framework for 3D object detection and tracking in Lidar point clouds. It departs from conventional box-based detectors, which have difficulty enumerating all possible orientations or fitting an axis-aligned bounding box to a rotated object.

Core Contributions and Methodology

Center-based Representation

The paper's primary contribution is a center-based representation for 3D objects. Rather than detecting objects directly as oriented boxes, CenterPoint represents, detects, and tracks them as points. A keypoint detector finds object centers, and the remaining attributes (3D size, 3D orientation, and velocity) are regressed at each detected center. The process is executed in two stages:

Stage One: Detection of object centers and preliminary regression of their properties.
Stage Two: Refinement of these estimates using additional point features to enhance performance.
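The first stage can be illustrated with a small sketch of how a keypoint detector's output is commonly decoded: the network produces a per-class heatmap over the map-view grid, and local maxima above a score threshold are taken as object centers. This is a minimal NumPy illustration under stated assumptions, not the authors' code; the function name `find_center_peaks`, the 3x3 neighbourhood, the threshold, and the toy heatmap are all hypothetical.

```python
import numpy as np

def find_center_peaks(heatmap, threshold=0.3, max_peaks=100):
    """Return (row, col, score) for local maxima of a 2D heatmap, best first."""
    h, w = heatmap.shape
    # Pad with -inf so border cells are compared only against real neighbours.
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    # Stack the nine shifted views that make up each cell's 3x3 neighbourhood.
    neighbourhood = np.stack([
        padded[dr:dr + h, dc:dc + w]
        for dr in range(3) for dc in range(3)
    ])
    local_max = neighbourhood.max(axis=0)
    # A cell is a peak if it equals its neighbourhood maximum and is confident enough.
    is_peak = (heatmap >= local_max) & (heatmap >= threshold)
    rows, cols = np.nonzero(is_peak)
    scores = heatmap[rows, cols]
    order = np.argsort(-scores)[:max_peaks]
    return [(int(rows[i]), int(cols[i]), float(scores[i])) for i in order]

# Toy heatmap (hypothetical values): two objects plus two distractors.
heatmap = np.zeros((6, 6))
heatmap[1, 1] = 0.9   # strong center
heatmap[1, 2] = 0.5   # adjacent to the stronger peak, so suppressed
heatmap[4, 4] = 0.6   # second center
heatmap[3, 0] = 0.1   # isolated but below the threshold

peaks = find_center_peaks(heatmap)
print(peaks)
```

In a full detector, the size, orientation, and velocity predictions would then be read from the regression maps at each returned (row, col) location.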

The center-based representation has several advantages. Points, unlike boxes, have no intrinsic orientation, so the detector's search space shrinks: it no longer needs to enumerate orientations or fit axis-aligned boxes to rotated objects. This also makes it easier for the network to learn rotational invariance and equivariance. Moreover, it streamlines downstream tasks like tracking: if objects are points, tracks are simply paths in space and time.
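The abstract describes tracking as "greedy closest-point matching." A minimal sketch of that idea follows: each current detection is projected back by its predicted velocity, then paired with the nearest unmatched center from the previous frame. The function name `greedy_match`, the `max_dist` gate, and the choice to visit pairs in globally ascending distance order are assumptions made for illustration; the released implementation may order and gate matches differently.

```python
import numpy as np

def greedy_match(prev_centers, curr_centers, curr_velocities, dt, max_dist):
    """Greedily match current detections to previous-frame centers.

    Returns {current_index: previous_index}. Detections missing from the
    result are unmatched and would start new tracks.
    """
    prev_centers = np.asarray(prev_centers, dtype=float).reshape(-1, 2)
    curr_centers = np.asarray(curr_centers, dtype=float).reshape(-1, 2)
    curr_velocities = np.asarray(curr_velocities, dtype=float).reshape(-1, 2)

    # Move each current center back to where it should have been one frame ago.
    projected = curr_centers - curr_velocities * dt
    # Pairwise distances, shape (num_current, num_previous).
    dists = np.linalg.norm(projected[:, None, :] - prev_centers[None, :, :], axis=2)

    matches = {}
    used_prev = set()
    # Visit candidate pairs from closest to farthest (an assumed ordering).
    for flat in np.argsort(dists, axis=None):
        ci, pi = (int(i) for i in np.unravel_index(flat, dists.shape))
        if dists[ci, pi] > max_dist:
            break  # every remaining pair is even farther apart
        if ci in matches or pi in used_prev:
            continue
        matches[ci] = pi
        used_prev.add(pi)
    return matches

# Hypothetical map-view centers (metres) and velocities (metres per second).
prev = [[0.0, 0.0], [10.0, 0.0]]
curr = [[1.0, 0.0], [10.5, 0.0], [30.0, 30.0]]
vel = [[10.0, 0.0], [5.0, 0.0], [0.0, 0.0]]

matches = greedy_match(prev, curr, vel, dt=0.1, max_dist=2.0)
print(matches)
```

Here the third detection has no previous center within `max_dist`, so it stays unmatched and would open a new track.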

Numerical Results and Performance

Experiments on Benchmarks

The effectiveness of the CenterPoint framework is validated against two major datasets: the Waymo Open Dataset and the nuScenes Dataset.

Waymo Open Dataset: CenterPoint achieved state-of-the-art results with 71.8 level 2 mAPH for vehicle detection and 66.4 level 2 mAPH for pedestrian detection. Tracking improved as well, with gains of up to 18.9 MOTA over previous methods for vehicles and pedestrians.
nuScenes Dataset: CenterPoint outperformed previous single-model methods, achieving 65.5 NDS and 63.8 AMOTA. The model is also described as simpler and faster than prior approaches.

These results indicate that the center-based design improves detection and tracking accuracy on both benchmarks.
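For readers unfamiliar with the nuScenes detection score (NDS) quoted above, the benchmark defines it as a weighted combination of mean average precision and five true-positive error metrics (translation, scale, orientation, velocity, and attribute errors), each clipped to 1 and converted to a score. The sketch below implements that published definition; the function name and the input values are hypothetical and are not CenterPoint's reported per-metric results.

```python
TP_METRICS = ("mATE", "mASE", "mAOE", "mAVE", "mAAE")

def nuscenes_detection_score(mean_ap, tp_errors):
    """NDS = (5 * mAP + sum over TP metrics of (1 - min(1, error))) / 10."""
    if set(tp_errors) != set(TP_METRICS):
        raise ValueError(f"expected exactly these metrics: {TP_METRICS}")
    # Each error is clipped to 1 and turned into a score in [0, 1].
    tp_scores = [1.0 - min(1.0, tp_errors[name]) for name in TP_METRICS]
    return (5.0 * mean_ap + sum(tp_scores)) / 10.0

# Hypothetical inputs chosen only to exercise the formula.
example_errors = {"mATE": 0.3, "mASE": 0.25, "mAOE": 0.4, "mAVE": 1.2, "mAAE": 0.2}
nds = nuscenes_detection_score(mean_ap=0.6, tp_errors=example_errors)
print(f"NDS = {nds:.3f}")
```

Because mAP carries half of the total weight, a detector with accurate velocity and orientation estimates can still gain NDS over one with the same mAP; the velocity error of 1.2 in the example is clipped and contributes nothing.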

Comparative Analysis with Conventional Methods

The paper contrasts the center-based approach with anchor-based methods extensively:

Rotational Challenges: Conventional detectors struggle with rotated objects. Enumerating many possible orientations adds computational overhead and introduces numerous potential false positives. Because CenterPoint detects objects as points, which have no orientation of their own, it avoids this enumeration.
Size and Aspect Ratio Variability: The authors highlight that anchor-based methods perform poorly for objects with extreme aspect ratios or non-standard sizes. In contrast, CenterPoint shows marked improvements in these cases, as evident from the superior detection metrics.
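A short worked example shows why fitting an axis-aligned box to a rotated object is a poor match, especially for elongated shapes. For a rectangle of length l and width w rotated by yaw θ, the tightest axis-aligned box has extents l|cos θ| + w|sin θ| and l|sin θ| + w|cos θ|. The footprint of 4.5 m by 2.0 m below is a hypothetical car-sized object chosen for illustration, not a figure from the paper.

```python
import math

def axis_aligned_extent(length, width, yaw):
    """Extents (x, y) of the tightest axis-aligned box around a rotated rectangle."""
    c, s = abs(math.cos(yaw)), abs(math.sin(yaw))
    return length * c + width * s, length * s + width * c

def area_inflation(length, width, yaw):
    """Ratio of the axis-aligned box area to the true footprint area."""
    extent_x, extent_y = axis_aligned_extent(length, width, yaw)
    return (extent_x * extent_y) / (length * width)

# Hypothetical car-sized footprint in metres.
length, width = 4.5, 2.0
inflation_by_degree = {
    deg: area_inflation(length, width, math.radians(deg))
    for deg in (0, 15, 30, 45, 90)
}
for deg, ratio in inflation_by_degree.items():
    print(f"yaw {deg:2d} deg: axis-aligned area is {ratio:.2f}x the true footprint")
```

At 45 degrees the axis-aligned box covers roughly 2.35 times the object's true footprint, so most of the box is background. A center point, by contrast, is the same regardless of yaw.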

Implications and Future Directions

Practical Implications

The CenterPoint framework, due to its simplicity and efficiency, is well suited to real-time applications such as autonomous driving systems. Its ability to handle diverse object orientations and sizes should make its performance more reliable in dynamic environments. Furthermore, the end-to-end detection and tracking system runs in near real time (11 FPS on the Waymo dataset and 16 FPS on the nuScenes dataset), which brings it close to the timing requirements of real-world deployments.

Theoretical Implications and Future Work

Theoretically, the proposed center-based framework provides insights into how object representation impacts detection and tracking performance in 3D spaces. It underscores the importance of rotational invariance and spatial consistency, advocating for more robust models that can generalize across various orientations and scales.

Future research could explore:

Integration with multi-sensor data: Combining Lidar with RGB or thermal imaging to further increase reliability and accuracy.
Enhanced refinement stages: Investigating more sophisticated second-stage designs that extract richer point features and improve regression accuracy.

In conclusion, the paper presents a compelling case for the adoption of center-based representations in 3D object detection and tracking. The CenterPoint framework showcases robust numerical performance across benchmarks and addresses key limitations of anchor-based methods, making it a promising foundation for future advancements in autonomous systems and 3D object recognition technologies.
