PV-RCNN++: Point-Voxel Feature Set Abstraction With Local Vector Representation for 3D Object Detection (2102.00463v3)

Published 31 Jan 2021 in cs.CV

Abstract: 3D object detection is receiving increasing attention from both industry and academia thanks to its wide applications in various fields. In this paper, we propose Point-Voxel Region-based Convolution Neural Networks (PV-RCNNs) for 3D object detection on point clouds. First, we propose a novel 3D detector, PV-RCNN, which boosts the 3D detection performance by deeply integrating the feature learning of both point-based set abstraction and voxel-based sparse convolution through two novel steps, i.e., the voxel-to-keypoint scene encoding and the keypoint-to-grid RoI feature abstraction. Second, we propose an advanced framework, PV-RCNN++, for more efficient and accurate 3D object detection. It consists of two major improvements: sectorized proposal-centric sampling for efficiently producing more representative keypoints, and VectorPool aggregation for better aggregating local point features with much less resource consumption. With these two strategies, our PV-RCNN++ is about $3\times$ faster than PV-RCNN, while also achieving better performance. The experiments demonstrate that our proposed PV-RCNN++ framework achieves state-of-the-art 3D detection performance on the large-scale and highly-competitive Waymo Open Dataset with 10 FPS inference speed on the detection range of $150m \times 150m$.

Summary

The paper integrates voxel-CNN and point-based features through two steps: voxel-to-keypoint scene encoding and keypoint-to-grid RoI feature abstraction.
It employs sectorized proposal-centric sampling and VectorPool aggregation to reduce computation while enhancing detection accuracy.
State-of-the-art results on the Waymo Open Dataset validate its efficiency and performance for large-scale autonomous driving applications.

Overview of PV-RCNN++

The paper presents PV-RCNN++, a 3D object detection framework for point cloud data that builds on the original PV-RCNN architecture. PV-RCNN deeply integrates point-based and voxel-based feature learning; PV-RCNN++ keeps this design and improves both efficiency and accuracy with two changes: a new keypoint sampling strategy (sectorized proposal-centric sampling) and a new local feature aggregation module (VectorPool aggregation).

Key Innovations

Point-Voxel Integration: PV-RCNN combines the advantages of voxel-based CNNs, which efficiently encode multi-scale features, and point-based methods, which preserve accurate spatial information. The voxel-to-keypoint scene encoding and the keypoint-to-grid RoI feature abstraction are pivotal in marrying these approaches.
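The voxel-to-keypoint step can be pictured as a set-abstraction query run on voxel features: each keypoint gathers the sparse-conv voxel features within a radius, and the pooled results from several network stages are concatenated. Below is a minimal NumPy sketch, assuming the keypoints are already sampled and each stage is given as arrays of voxel centers and features. The function names and radii are hypothetical, plain max-pooling stands in for the learned MLP, and the raw-point and BEV features that PV-RCNN also attaches to each keypoint are omitted.

```python
import numpy as np


def ball_max_pool(queries, centers, feats, radius):
    """For each query point, max-pool the features of all centers within `radius`.

    Queries with no neighbor get a zero vector. The real module applies a learned
    MLP before pooling; this sketch pools the raw features instead.
    """
    out = np.zeros((len(queries), feats.shape[1]))
    for i, q in enumerate(queries):
        mask = np.linalg.norm(centers - q, axis=1) < radius
        if mask.any():
            out[i] = feats[mask].max(axis=0)
    return out


def voxel_to_keypoint(keypoints, voxel_stages, radii):
    """Concatenate pooled voxel features from several sparse-conv stages.

    voxel_stages: list of (centers (M_l, 3), features (M_l, C_l)), one per stage.
    radii: one query radius per stage (coarser stages would use larger radii).
    """
    pooled = [ball_max_pool(keypoints, c, f, r) for (c, f), r in zip(voxel_stages, radii)]
    return np.concatenate(pooled, axis=1)  # (num_keypoints, sum of C_l)


# Toy example: two keypoints, a 2-channel fine stage and a 1-channel coarse stage.
keypoints = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
stage1 = (np.array([[0.5, 0.0, 0.0], [9.5, 0.0, 0.0]]), np.array([[1.0, 2.0], [3.0, 4.0]]))
stage2 = (np.array([[1.0, 0.0, 0.0]]), np.array([[5.0]]))
kp_feats = voxel_to_keypoint(keypoints, [stage1, stage2], radii=[1.0, 2.0])
# Row 0 is [1, 2, 5]; row 1 is [3, 4, 0] because no coarse voxel lies within 2 m of it.
```

Keeping the per-stage results separate until the final concatenation mirrors the multi-scale idea: each keypoint carries features from every receptive-field size.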
PV-RCNN Framework:
- Voxel Set Abstraction: Summarizes multi-scale 3D voxel CNN features, together with raw point and bird's-eye-view (BEV) features, into a small set of keypoints that represent the whole scene.
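- RoI-Grid Pooling: Aggregates keypoint features onto grid points sampled inside each proposal, using multiple receptive-field radii. Because these radii can reach beyond the proposal box, the pooled features also capture surrounding context, which helps refine the proposal's box and confidence.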
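For the keypoint-to-grid step, PV-RCNN samples a regular $6 \times 6 \times 6$ grid of points inside each proposal and aggregates keypoint features around every grid point. The sketch below assumes a box parameterized as (x, y, z, dx, dy, dz, heading) with rotation about the z-axis, uses illustrative radii of 0.8 m and 1.6 m, and again replaces the learned MLP with max-pooling; the helper names are hypothetical. Since a query ball can extend past the box, grid points near the boundary also pick up context from outside the proposal.

```python
import numpy as np


def roi_grid_points(box, grid_size=6):
    """Uniform grid points inside an oriented box (x, y, z, dx, dy, dz, heading).

    Points sit at cell centers in the box frame, are rotated about z by
    `heading`, then shifted to the box center.
    """
    x, y, z, dx, dy, dz, heading = box
    t = (np.arange(grid_size) + 0.5) / grid_size - 0.5  # cell centers in [-0.5, 0.5]
    gx, gy, gz = np.meshgrid(t * dx, t * dy, t * dz, indexing="ij")
    local = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)
    c, s = np.cos(heading), np.sin(heading)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return local @ rot.T + np.array([x, y, z])


def roi_grid_pool(box, keypoints, kp_feats, radii=(0.8, 1.6), grid_size=6):
    """Max-pool keypoint features around each grid point at several radii, then flatten."""
    grid = roi_grid_points(box, grid_size)
    per_radius = []
    for r in radii:
        pooled = np.zeros((len(grid), kp_feats.shape[1]))  # empty balls stay zero
        for i, g in enumerate(grid):
            mask = np.linalg.norm(keypoints - g, axis=1) < r
            if mask.any():
                pooled[i] = kp_feats[mask].max(axis=0)
        per_radius.append(pooled)
    return np.concatenate(per_radius, axis=1).reshape(-1)  # one vector per proposal


# A 4 m x 2 m x 1.5 m box rotated by 90 degrees, so its length runs along y.
box = np.array([0.0, 0.0, 0.0, 4.0, 2.0, 1.5, np.pi / 2])
grid = roi_grid_points(box)  # 216 points
kps = np.array([[0.0, 0.0, 0.0], [0.0, 2.5, 0.0]])  # second keypoint lies outside the box
feats = np.array([[1.0], [7.0]])
vec = roi_grid_pool(box, kps, feats)  # 216 grid points x 2 radii x 1 channel
```

In this toy case the outside keypoint is only reached by the larger 1.6 m ball, which is the multi-radius context effect described above.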
PV-RCNN++ Enhancements:
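- Sectorized Proposal-Centric Sampling: Restricts keypoint candidates to points near the 3D proposals, then divides the candidates into sectors around the scene and samples keypoints in each sector in parallel. This produces more representative keypoints while significantly reducing computation.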
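- VectorPool Aggregation: Divides each local neighborhood into regular sub-voxels, encodes each sub-voxel with its own kernel weights, and concatenates the results in a fixed order. The resulting position-sensitive local vector representation captures local geometry while using less memory and computation than the set-abstraction operation it replaces.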
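Both PV-RCNN++ changes can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the authors' implementation: the fixed filtering `radius`, the sensor sitting at the origin, the per-sector keypoint quota proportional to the sector's point count, the $3 \times 3 \times 3$ sub-voxel layout, and three-nearest-neighbor inverse-distance interpolation searched over all points (rather than only the local region) are choices made here for clarity, and every function name is hypothetical.

```python
import numpy as np


def farthest_point_sampling(points, k):
    """Greedy FPS over (N, 3) points; returns up to k indices, starting from index 0."""
    k = min(k, len(points))
    chosen = np.zeros(k, dtype=int)
    dist = np.full(len(points), np.inf)
    for i in range(1, k):
        dist = np.minimum(dist, np.linalg.norm(points - points[chosen[i - 1]], axis=1))
        chosen[i] = int(np.argmax(dist))
    return chosen


def sectorized_proposal_centric_sampling(points, proposal_centers, num_keypoints,
                                         radius=2.0, num_sectors=6):
    """(1) Keep points near proposals, (2) run FPS independently inside azimuth sectors."""
    d = np.linalg.norm(points[:, None, :] - proposal_centers[None, :, :], axis=2)
    cand = points[(d < radius).any(axis=1)]  # proposal-centric filtering
    if len(cand) == 0:
        return cand
    angle = np.arctan2(cand[:, 1], cand[:, 0])  # azimuth around the assumed sensor origin
    sector = ((angle + np.pi) / (2 * np.pi) * num_sectors).astype(int) % num_sectors
    keypoints = []
    for s in range(num_sectors):  # sectors are independent, so they could run in parallel
        pts = cand[sector == s]
        if len(pts) == 0:
            continue
        quota = max(1, int(round(num_keypoints * len(pts) / len(cand))))
        keypoints.append(pts[farthest_point_sampling(pts, quota)])
    return np.concatenate(keypoints)[:num_keypoints]  # rounding can overshoot; cap the total


def vectorpool(center, points, feats, kernels, half_size=1.0, grid=3, k=3):
    """Position-sensitive local vector for one query center.

    kernels: (grid**3, C_in, C_out), one weight matrix per sub-voxel.
    """
    step = 2.0 * half_size / grid
    offsets = -half_size + step * (np.arange(grid) + 0.5)
    sub = np.stack(np.meshgrid(offsets, offsets, offsets, indexing="ij"), -1).reshape(-1, 3)
    parts = []
    for j, c in enumerate(sub + center):
        dist = np.linalg.norm(points - c, axis=1)
        nn = np.argsort(dist)[:k]
        w = 1.0 / (dist[nn] + 1e-8)
        f = (w / w.sum()) @ feats[nn]  # inverse-distance-weighted feature, (C_in,)
        parts.append(f @ kernels[j])  # separate kernel per sub-voxel
    return np.concatenate(parts)  # fixed sub-voxel order -> (grid**3 * C_out,)


rng = np.random.default_rng(0)
pts = rng.uniform([-20, -20, -1], [20, 20, 1], size=(2000, 3))
centers = np.array([[5.0, 5.0, 0.0], [-5.0, -5.0, 0.0]])
kp = sectorized_proposal_centric_sampling(pts, centers, num_keypoints=16)
kernels = rng.normal(size=(27, 4, 8))
local_vec = vectorpool(np.zeros(3), pts, rng.normal(size=(2000, 4)), kernels)  # shape (216,)
```

Concatenating sub-voxel outputs in a fixed order, rather than max-pooling them, is what keeps the representation position-sensitive: the same feature lands in a different slot of the vector depending on where it sits in the neighborhood.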

Numerical Results

PV-RCNN++ achieves state-of-the-art results on the Waymo Open Dataset while running at 10 FPS over a $150m \times 150m$ detection range, about $3\times$ faster than PV-RCNN, and it reports higher mAPH than prior methods across the evaluated object categories. These results support its practicality for large-scale applications such as autonomous driving.
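The speed figures can be restated as per-frame latency. This is only a back-of-the-envelope reading of the abstract's numbers (10 FPS and "about $3\times$ faster"), so the PV-RCNN value is approximate rather than a reported measurement:

$$
t_{\text{PV-RCNN++}} = \frac{1}{10\ \text{frames/s}} = 100\ \text{ms}, \qquad
t_{\text{PV-RCNN}} \approx 3 \times 100\ \text{ms} = 300\ \text{ms} \;\; (\approx 3.3\ \text{FPS})
$$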

Implications and Future Directions

The point-voxel feature learning paradigm holds promise for further improving 3D object detection. The sampling and aggregation techniques introduced by PV-RCNN++ may inspire future work on reducing resource consumption without sacrificing detection accuracy. Future exploration could extend these strategies to larger datasets and more complex environments, and to latency-critical deployment in autonomous navigation systems.

The PV-RCNN++ framework's demonstrated ability to balance computational efficiency and detection accuracy represents a significant progression in the field of 3D object detection, and it sets a solid benchmark for subsequent advancements. Additionally, the application of these methodologies to other domains involving point cloud data could offer new avenues for research and innovation.

GitHub

GitHub - open-mmlab/OpenPCDet: OpenPCDet Toolbox for LiDAR-based 3D Object Detection.