- The paper introduces RangeDet, which overcomes the challenges of the range view using a Range Conditioned Pyramid, Meta-Kernel convolution, and weighted NMS, improving vehicle detection by roughly 20 3D AP over prior range-view methods.
- It employs a Range Conditioned Pyramid to address scale variations, dynamically assigning detection tasks based on object distance in LiDAR data.
- The study demonstrates that a range-view detector can match multi-view methods while offering a compact data format free of quantization error.
Overview of "RangeDet: In Defense of Range View for LiDAR-based 3D Object Detection"
The paper proposes a novel approach to 3D object detection from LiDAR data, termed RangeDet, built entirely on the range view representation. Unlike voxelized or Bird's Eye View (BEV) representations, the range view is compact and free of quantization error, a critical advantage for detection accuracy, especially on distant objects. While the range view has been explored for semantic segmentation, its potential for 3D object detection has lagged behind voxelized methods. RangeDet addresses this gap by resolving the two primary issues that hindered prior range-view detectors: the large scale variation between nearby and distant objects, and the mismatch between the 2D range-image coordinates in which convolutions operate and the 3D Cartesian coordinates of the output space.
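To make the representation concrete, here is a minimal sketch (not from the paper) of projecting a Cartesian point cloud into a range image. The image size and uniform vertical field of view are illustrative assumptions; real sensors, such as the LiDAR in the Waymo Open Dataset, provide per-beam inclinations instead.

```python
import numpy as np

def points_to_range_image(points, h=64, w=2048,
                          fov_up=np.deg2rad(2.0), fov_down=np.deg2rad(-25.0)):
    """Project an (N, 3) Cartesian point cloud into an (h, w) range image.

    Each pixel stores the range (distance) of the point that falls into it;
    empty pixels stay 0. The vertical field of view here is an assumption,
    not the configuration used by any particular sensor.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)               # range of each point
    yaw = np.arctan2(y, x)                           # azimuth in [-pi, pi]
    pitch = np.arcsin(z / np.maximum(r, 1e-9))       # inclination

    # Normalize angles into integer pixel coordinates.
    col = ((1.0 - yaw / np.pi) / 2.0 * w).astype(np.int32) % w
    row = ((fov_up - pitch) / (fov_up - fov_down) * h).astype(np.int32)
    row = np.clip(row, 0, h - 1)

    range_image = np.zeros((h, w), dtype=np.float32)
    range_image[row, col] = r                        # last point wins per pixel
    return range_image
```

Because every pixel keeps its exact range value, no information is quantized away, which is the property the paper builds on.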
Key Innovations
- Range Conditioned Pyramid (RCP): The RCP addresses the scale variation challenge inherent in range views. It appropriately assigns detection tasks to different feature layers based on object range rather than 2D size alone, which aligns better with the perception of objects at variable distances in LiDAR data.
- Meta-Kernel Convolution: To bridge the gap between the 2D convolution in feature extraction and 3D output space, the authors introduce Meta-Kernel Convolution. This method dynamically adapts convolutional kernels based on local geometry, allowing the convolutions to respect the geometric structure of the input space, significantly enhancing the network's capacity to glean meaningful information from range images.
- Weighted Non-Maximum Suppression (WNMS): Capitalizing on the compact, full-resolution output of the range view, WNMS de-duplicates detection proposals by averaging the overlapping boxes that vote for the same object, weighted by their scores, rather than discarding all but the highest-scoring one. This reduces localization variance while remaining efficient enough for full-resolution feature maps.
- Data Augmentation in Range View: The work explores the adaptation of common data augmentation techniques, such as random flipping and rotation, into the range view framework. This transfer preserves the structural integrity and natural characteristics of range images, which is pivotal for maintaining the semantic content of the input data.
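As an illustration of the Meta-Kernel idea in the second bullet, the toy sketch below generates per-neighbor weights from relative Cartesian offsets. Here `w_mlp` is a hypothetical single-matrix stand-in for the paper's small MLP, and the final sum simplifies the paper's concatenate-then-1x1-conv aggregation; neither detail should be read as the authors' exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def meta_kernel_conv(features, coords, w_mlp):
    """Toy Meta-Kernel convolution over a range image.

    features: (H, W, C) input feature map
    coords:   (H, W, 3) Cartesian coordinates of each range-image pixel
    w_mlp:    (3, C) weight matrix standing in for the paper's MLP

    For every interior pixel and each neighbor in its 3x3 window, a weight
    vector is generated from the *relative* Cartesian offset and applied
    element-wise to the neighbor's features; results are summed.
    """
    H, W, C = features.shape
    out = np.zeros_like(features)
    for i in range(1, H - 1):
        for j in range(1, W - 1):
            acc = np.zeros(C)
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    rel = coords[i + di, j + dj] - coords[i, j]  # 3D offset
                    w = np.maximum(rel @ w_mlp, 0.0)             # "MLP" + ReLU
                    acc += w * features[i + di, j + dj]          # element-wise
            out[i, j] = acc
    return out
```

The key point the sketch captures is that two pixels adjacent in the image but far apart in 3D receive very different weights, so the kernel respects Cartesian geometry rather than image-grid adjacency.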
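The weighted NMS in the third bullet can be sketched with axis-aligned 2D boxes; RangeDet applies the same idea to rotated 3D boxes, so this simplification is ours, not the paper's.

```python
import numpy as np

def iou_xyxy(box, boxes):
    """IoU between one box and many, boxes given as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter + 1e-9)

def weighted_nms(boxes, scores, iou_thr=0.5):
    """Greedy NMS where each kept box is the score-weighted average of
    all proposals it suppresses, instead of the top proposal alone."""
    order = np.argsort(scores)[::-1]           # highest score first
    keep = []
    while order.size > 0:
        top = order[0]
        ious = iou_xyxy(boxes[top], boxes[order])
        group = order[ious >= iou_thr]          # proposals voting for this object
        w = scores[group]
        keep.append((boxes[group] * w[:, None]).sum(0) / w.sum())
        order = order[ious < iou_thr]           # drop the merged group
    return np.array(keep)
```

Compared with standard NMS, the only change is the averaging step: every suppressed proposal contributes a vote, which is where the variance reduction described above comes from.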
Results and Implications
RangeDet achieves significant performance gains, evidenced by a substantial improvement of approximately 20 3D AP in vehicle detection over previous range-view methods, bringing its performance on par with state-of-the-art multi-view-based methodologies. The results on the Waymo Open Dataset are particularly noteworthy, with performance metrics illustrating improved efficacy in detecting objects at various ranges, contradicting traditional assumptions about the limitations of range views for distant object detection.
Future Directions
The developments in RangeDet highlight the viability of range view representation as a standalone method for LiDAR-based 3D object detection. The absence of quantization errors and compact data representation provide a compelling argument for further exploration of range views in other complex LiDAR processing tasks. Future research could explore hybrid approaches, incorporating the strengths of each LiDAR representation to adapt to varying scenarios, enhance detection rates, and expand the range of detectable object scales.
Conclusion
By resolving inherent challenges of range-view representations through components like the Range Conditioned Pyramid and Meta-Kernel Convolution, RangeDet presents a robust defense of range views for LiDAR-based 3D object detection. The method paves the way for research into how range views can be leveraged in, or integrated with, multi-modal 3D detection frameworks, further enhancing the reliability and capability of autonomous vehicle perception systems.