- The paper introduces a novel targeted computation strategy that replaces global proposals with efficient, localized sampling for 3D object detection.
- It employs stacked point-featurization blocks to flexibly adjust to different computational budgets without retraining.
- The model outperforms CNN-based baselines, notably achieving over 7% higher mAP for pedestrian detection on the Waymo Open Dataset.
Overview of "StarNet: Targeted Computation for Object Detection in Point Clouds"
The paper presents a novel object detection system, StarNet, tailored for 3D LiDAR point cloud data—a critical component in autonomous vehicle perception. Unlike traditional methods that adapt 2D convolutional neural network (CNN) techniques for camera images, StarNet introduces a purely point-based approach that capitalizes on the inherent sparsity and three-dimensionality of point cloud data.
Key Innovations and Methodology
StarNet departs from preceding algorithms through a targeted computation strategy: it eschews the prevalent use of global information in favor of localized, data-dependent proposals. Rather than learning region proposals, it generates proposal centers with cheap sampling methods such as random uniform sampling and farthest point sampling, achieving high recall of object coverage at minimal computational cost.
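To make the proposal mechanism concrete, here is a minimal sketch of farthest point sampling, one of the cheap sampling schemes the paper uses in place of learned region proposals. The function name and the toy point cloud are illustrative, not from the paper; the sketch assumes the input is an (N, 3) array of LiDAR x/y/z coordinates.

```python
import numpy as np

def farthest_point_sampling(points, num_samples, seed=0):
    """Greedily pick num_samples points that are maximally spread out.

    Illustrative sketch only; names and defaults are assumptions,
    not the paper's implementation.
    """
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    chosen = [int(rng.integers(n))]          # start from a random point
    dists = np.full(n, np.inf)
    for _ in range(num_samples - 1):
        # distance from every point to the most recently chosen center
        d = np.linalg.norm(points - points[chosen[-1]], axis=1)
        # track each point's distance to its nearest chosen center
        dists = np.minimum(dists, d)
        # pick the point farthest from all chosen centers so far
        chosen.append(int(np.argmax(dists)))
    return points[np.array(chosen)]

# Usage: pick 4 well-spread proposal centers from a toy 5-point cloud
cloud = np.array([[0, 0, 0], [10, 0, 0], [0, 10, 0],
                  [10, 10, 0], [5, 5, 0]], dtype=float)
centers = farthest_point_sampling(cloud, 4)
```

Because chosen points have distance zero to themselves, the greedy `argmax` never re-selects a point, so the returned centers are distinct and spread across the cloud.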
The model's architecture utilizes stacked blocks for point cloud featurization, allowing it to adapt flexibly to varying computational constraints on-the-fly. This capability enables StarNet to operate under different computational budgets without necessitating retraining, a significant advantage over standard CNN-based detectors.
StarNet demonstrates competitive or superior performance on benchmark datasets such as the Waymo Open Dataset and KITTI. Notably, it outperforms convolutional baselines by over 7% mean Average Precision (mAP) for pedestrian detection on the Waymo Open Dataset. The results are particularly noteworthy given StarNet's computational efficiency: it often requires fewer resources for similar or better predictive accuracy.
Implications and Future Directions
The implications of StarNet are multifaceted. Practically, its ability to adapt computation enables real-time application in self-driving technologies, where computational resources may be limited, and scene dynamics require adaptive focus. Theoretically, the model challenges the conventional reliance on globally consistent proposal mechanisms, suggesting that 3D object detection can benefit substantially from localized processing that mirrors the data distribution.
Future directions anticipated from this work include exploring more sophisticated sampling methods or integrating multi-sensor data, such as combining LiDAR with camera information, to enhance detection performance. Additionally, StarNet's design lends itself well to advances in object tracking, where incremental frame-to-frame computation could further reduce operational latency and improve situational awareness for autonomous systems.
Conclusion
StarNet represents a significant advancement in LiDAR-based object detection by aligning computational strategies with the distinctive properties of point cloud data. Its flexibility and efficiency underscore the potential for such targeted computation methods in advancing the state of autonomous driving technologies. Further research inspired by this approach could lead to even more efficient, robust, and context-aware perception systems.