LaserNet: An Efficient Probabilistic 3D Object Detector for Autonomous Driving (1903.08701v1)

Published 20 Mar 2019 in cs.CV, cs.LG, and cs.RO

Abstract: In this paper, we present LaserNet, a computationally efficient method for 3D object detection from LiDAR data for autonomous driving. The efficiency results from processing LiDAR data in the native range view of the sensor, where the input data is naturally compact. Operating in the range view involves well known challenges for learning, including occlusion and scale variation, but it also provides contextual information based on how the sensor data was captured. Our approach uses a fully convolutional network to predict a multimodal distribution over 3D boxes for each point and then it efficiently fuses these distributions to generate a prediction for each object. Experiments show that modeling each detection as a distribution rather than a single deterministic box leads to better overall detection performance. Benchmark results show that this approach has significantly lower runtime than other recent detectors and that it achieves state-of-the-art performance when compared on a large dataset that has enough data to overcome the challenges of training on the range view.

Citations (326)

View on Semantic Scholar

Summary

The paper introduces a novel probabilistic detection pipeline that predicts multimodal 3D bounding boxes directly from LiDAR range view data.
It employs mean shift clustering and adaptive NMS to refine detections and dynamically manage overlapping predictions.
Experiments demonstrate state-of-the-art efficiency and enhanced pedestrian detection, highlighting its viability for real-time autonomous driving.

Overview of "LaserNet: An Efficient Probabilistic 3D Object Detector for Autonomous Driving"

LaserNet proposes a novel approach to 3D object detection using LiDAR data targeting applications in autonomous driving. Central to this method is the efficient processing of LiDAR data in its range view representation - the native output form from LiDAR sensors - rather than converting the data into 3D point clouds or bird's eye views that are typical in traditional methods. This choice significantly boosts computational efficiency by leveraging the natural compactness of the range view while addressing the typical challenges of occlusion and scale variation inherent in this format.

Technical Contributions and Methodology

LaserNet's approach revolves around a fully convolutional network designed to work with range view data. Key stages in the processing pipeline include:

End-to-End Learning: The network predicts a multimodal distribution over potential 3D bounding boxes for each LiDAR point. This probabilistic approach departs from the deterministic bounding boxes used in alternative methods, thus enabling more robust detection capabilities.
Mean Shift Clustering: Implemented to aggregate predictions across neighboring LiDAR points to refine detections. This step mitigates noise by consolidating predictions into cohesive detections.
Adaptive Non-Maximum Suppression (NMS): A novel NMS technique is employed to handle overlapping box predictions, leveraging the predicted variance to determine dynamic IoU thresholds rather than a fixed one, aligned with the inherent uncertainty in detection locations.

Experimental Findings

Experiments demonstrated that LaserNet achieves state-of-the-art 3D object detection performance in terms of both computational efficiency and accuracy. Specifically, the results on a large-scale dataset show reduced runtime compared to recent detectors without sacrificing accuracy. In LiDAR-only setups, LaserNet surpasses contemporary methods, and its performance matches or beats systems that integrate RGB data, notably excelling in pedestrian detection.

Practical and Theoretical Implications

The implications of LaserNet are twofold. Practically, its efficiency is aligned with the real-time processing needs of autonomous systems, making it a viable candidate for deployment in production environments. Theoretically, it challenges the common practice of transforming range data into alternative representations, showcasing that adequately leveraging the range view's native format can be beneficial provided a sufficiently large dataset is available. This approach suggests a paradigm shift regarding canonical data representations in LiDAR-based perception systems.

Future Work and Challenges

The research hints at avenues for further exploration, such as sensor fusion to incorporate image data, potentially refining detection capabilities. Moreover, the performance disparity between small and large datasets emphasizes a need for revisiting training strategies or architectures when dealing with limited data. Extending this approach to accommodate dynamic environments or more complex scenes also presents potential challenges and opportunities.

In conclusion, LaserNet represents a significant step towards more efficient and robust 3D object detection systems for autonomous vehicles, leveraging probabilistic modeling with LiDAR data's native range view to attain enhanced detection performance. As the field of autonomous driving continues to evolve, methods that advocate efficient use of sensor outputs and probabilistic interpretations are likely to see increased relevance and adoption.

PDF Markdown

Related Papers

Tweets

https://twitter.com/ElmoTheHokage/status/1850977218916467034