Fully Sparse 3D Object Detection (2207.10035v2)

Published 20 Jul 2022 in cs.CV and cs.RO

Abstract: As the perception range of LiDAR increases, LiDAR-based 3D object detection becomes a dominant task in the long-range perception task of autonomous driving. The mainstream 3D object detectors usually build dense feature maps in the network backbone and prediction head. However, the computational and spatial costs on the dense feature map are quadratic to the perception range, which makes them hardly scale up to the long-range setting. To enable efficient long-range LiDAR-based object detection, we build a fully sparse 3D object detector (FSD). The computational and spatial cost of FSD is roughly linear to the number of points and independent of the perception range. FSD is built upon the general sparse voxel encoder and a novel sparse instance recognition (SIR) module. SIR first groups the points into instances and then applies instance-wise feature extraction and prediction. In this way, SIR resolves the issue of center feature missing, which hinders the design of the fully sparse architecture for all center-based or anchor-based detectors. Moreover, SIR avoids the time-consuming neighbor queries in previous point-based methods by grouping points into instances. We conduct extensive experiments on the large-scale Waymo Open Dataset to reveal the working mechanism of FSD, and state-of-the-art performance is reported. To demonstrate the superiority of FSD in long-range detection, we also conduct experiments on Argoverse 2 Dataset, which has a much larger perception range ($200m$) than Waymo Open Dataset ($75m$). On such a large perception range, FSD achieves state-of-the-art performance and is 2.4$\times$ faster than the dense counterpart. Codes will be released at https://github.com/TuSimple/SST.

Citations (74)

Summary

  • The paper introduces a Fully Sparse Detector (FSD) that uses a sparse voxel encoder and Sparse Instance Recognition to efficiently detect 3D objects in LiDAR data.
  • It achieves linear computational complexity relative to the number of points, overcoming the quadratic costs associated with dense feature maps.
  • FSD demonstrates state-of-the-art performance on the Waymo Open and Argoverse 2 datasets, improving both speed and accuracy for long-range perception in autonomous driving.

Fully Sparse 3D Object Detection

The paper "Fully Sparse 3D Object Detection" presents an efficient approach to long-range LiDAR-based 3D object detection, aiming to address the computational limitations of traditional dense feature map-based detectors. The authors introduce a novel detector, termed the Fully Sparse Detector (FSD), which leverages sparse voxel encoding and a unique Sparse Instance Recognition (SIR) module to achieve linear computational and spatial costs relative to the number of LiDAR points, independent of the perception range.

Key Contributions

The primary contribution of this paper is the development of FSD, which operates efficiently even at the extended perception ranges required in autonomous driving. Traditional detectors are limited by the quadratic growth of computational and spatial cost with the perception range caused by their dense feature maps. FSD addresses this limitation through:

  1. Sparse Voxel Encoder and SIR Module: By combining a sparse voxel encoder with the SIR module, FSD eschews dense feature maps entirely. This approach circumvents the issue of "Center Feature Missing" (CFM) and avoids the computational overhead of dense methods.
  2. Instance-Level Prediction: The SIR module performs instance-wise feature extraction and prediction without the computationally intensive neighborhood queries of previous point-based methods. By grouping points into instances, SIR also avoids aggressive downsampling and the associated information loss (a minimal sketch of this kind of instance-wise pooling follows the list).
  3. State-of-the-Art Performance: FSD demonstrates state-of-the-art results on both the Waymo Open Dataset and the Argoverse 2 Dataset, achieving superior speed and accuracy compared to dense counterparts.
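To illustrate the second contribution, the snippet below sketches the general pattern of instance-wise feature extraction: each point is assumed to already carry a feature from the sparse voxel encoder and to have been assigned an instance id by a grouping step; the point feature is then fused with a pooled descriptor of its instance instead of querying neighbors point by point. The function name, feature dimensions, and max pooling are illustrative assumptions, not the paper's exact SIR design.

```python
# Minimal sketch of instance-wise feature extraction in the spirit of SIR, using NumPy.
# Feature dimensions, the max-pooling choice, and the toy grouping are illustrative assumptions.
import numpy as np

def sir_like_pooling(point_feats: np.ndarray, instance_ids: np.ndarray) -> np.ndarray:
    """For each point, concatenate its own feature with the pooled feature of its instance.

    point_feats:  (N, C) per-point features, assumed to come from a sparse voxel encoder.
    instance_ids: (N,) integer id of the instance each point was grouped into.
    returns:      (N, 2C) point feature concatenated with its instance descriptor.
    """
    pooled = np.zeros_like(point_feats)
    for inst in np.unique(instance_ids):
        mask = instance_ids == inst
        # Pool once per instance and broadcast back to all member points,
        # replacing per-point neighbor queries.
        pooled[mask] = point_feats[mask].max(axis=0)
    return np.concatenate([point_feats, pooled], axis=1)

# Toy usage: 6 points with 4-dim features, grouped into two instances.
feats = np.random.randn(6, 4).astype(np.float32)
ids = np.array([0, 0, 0, 1, 1, 1])
print(sir_like_pooling(feats, ids).shape)  # (6, 8)
```

The key design point is that pooling happens once per instance and is broadcast back to all member points, so the cost stays linear in the number of points regardless of how far those points are from the sensor.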

Numerical Results and Implications

The experimental evaluation on the Waymo Open Dataset shows that FSD attains competitive performance across the vehicle, pedestrian, and cyclist classes. Notably, on the Argoverse 2 Dataset, with a perception range of 200 meters, FSD achieves state-of-the-art accuracy while running 2.4 times faster than its dense counterpart. This positions FSD as an effective solution for real-time autonomous driving applications, where computational efficiency is critical.

Theoretical and Practical Implications

The transition to a fully sparse architecture in FSD can be viewed as a significant advancement in the design of scalable 3D detection systems. This approach is not only computationally and spatially efficient but also theoretically intriguing as it opens avenues for further exploration in efficient instance-level recognition using sparse point cloud data.

Practical implications of this work include improved scalability of autonomous systems tasked with long-range perception, which is particularly beneficial for high-speed scenarios where timely processing of sensory data is crucial.

Future Directions

The paper lays a foundation for further research into more sophisticated sparse processing techniques that can leverage the inherent sparsity of LiDAR data. Future developments may focus on refining instance grouping strategies and exploring integration with other sensory modalities for multimodal perception.

In summary, "Fully Sparse 3D Object Detection" advances the field by presenting a novel approach that reconciles the needs for efficiency and precision in LiDAR-based object detection, establishing a promising direction for future research in long-range perception tasks.