- The paper introduces a novel two-step feature abstraction that combines voxel CNNs and PointNet set abstraction to enhance 3D detection accuracy.
- It leverages a voxel-to-keypoint module and a unique RoI grid pooling strategy to preserve fine-grained localization and contextual semantics.
- Experimental results on KITTI and Waymo datasets show significant improvements, with mAP gains up to 7.37% in challenging detection scenarios.
PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection
In the domain of 3D object detection, the paper "PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection" presents a novel approach that significantly enhances detection accuracy by combining voxel-based convolutional neural networks (CNNs) with PointNet-based set abstraction. The authors propose a two-stage framework that learns discriminative features from point clouds efficiently, overcoming the limitations of either representation when used in isolation.
The primary contribution of PV-RCNN lies in its architecture that integrates voxel and point-based feature learning strategies through two critical modules: Voxel-to-Keypoint Scene Encoding and Keypoint-to-Grid RoI Feature Abstraction. This allows the model to preserve fine-grained localization information while providing a broader contextual understanding necessary for accurate object detection.
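To make the two modules concrete, the following shape-level walkthrough traces a point cloud through the pipeline. All tensors are random placeholders standing in for real intermediate results, and the dimensions are illustrative assumptions rather than the paper's exact configuration.

```python
# Shape-level sketch of the PV-RCNN data flow (placeholder tensors only).
import torch

points = torch.rand(16384, 4)           # raw LiDAR points: (x, y, z, intensity)

# Stage 1 -- voxelize the scene, run a 3D CNN backbone (sparse convolutions
# in the paper), and generate 3D proposals from a bird's-eye-view head.
proposals = torch.rand(100, 7)          # boxes: (cx, cy, cz, dx, dy, dz, yaw)

# Voxel-to-keypoint -- sample a few thousand keypoints with FPS, then
# aggregate multi-scale voxel features onto them (Voxel Set Abstraction).
keypoints = torch.rand(2048, 3)
keypoint_feats = torch.rand(2048, 128)  # one compact feature vector per keypoint

# Keypoint-to-grid -- pool keypoint features onto a grid inside each
# proposal, then score and refine the box with a small head.
grid_feats = torch.rand(100, 6 ** 3, 128)  # 6x6x6 grid points per RoI
```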
Technical Overview
The architecture begins with a 3D voxel CNN that uses sparse convolutions for efficient feature learning and 3D proposal generation. This stage acts as the region proposal network (RPN): it efficiently encodes multi-scale voxel-wise feature representations, but the resulting feature volumes are sparse and coarse, which makes direct region pooling inaccurate.
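As a rough sketch of this backbone, the snippet below builds a multi-scale 3D CNN with plain dense convolutions so that it stays self-contained; the paper itself uses sparse 3D convolutions over a voxelized point cloud for efficiency, and the channel sizes here are assumptions.

```python
# A minimal dense stand-in for PV-RCNN's 3D backbone. The multi-scale
# structure (1x, 2x, 4x, 8x downsampling) mirrors the paper's design;
# the real model replaces nn.Conv3d with sparse convolutions.
import torch
import torch.nn as nn

def conv_block(c_in, c_out, stride):
    return nn.Sequential(
        nn.Conv3d(c_in, c_out, kernel_size=3, stride=stride, padding=1, bias=False),
        nn.BatchNorm3d(c_out),
        nn.ReLU(inplace=True),
    )

class VoxelBackbone(nn.Module):
    def __init__(self, c_in=4):
        super().__init__()
        self.stage1 = conv_block(c_in, 16, stride=1)   # 1x resolution
        self.stage2 = conv_block(16, 32, stride=2)     # 2x downsampled
        self.stage3 = conv_block(32, 64, stride=2)     # 4x downsampled
        self.stage4 = conv_block(64, 64, stride=2)     # 8x downsampled

    def forward(self, voxels):
        # voxels: (B, C, D, H, W) dense grid of voxelized point features
        f1 = self.stage1(voxels)
        f2 = self.stage2(f1)
        f3 = self.stage3(f2)
        f4 = self.stage4(f3)
        # Stack the coarsest features along depth into a BEV map,
        # which feeds the proposal head.
        bev = f4.flatten(1, 2)  # (B, C * D/8, H/8, W/8)
        return [f1, f2, f3, f4], bev

multi_scale_feats, bev = VoxelBackbone()(torch.rand(1, 4, 40, 160, 160))
```

Keeping all four intermediate feature volumes is what later allows keypoints to draw on several receptive fields at once.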
To address this, the proposed voxel-to-keypoint encoding uses the farthest point sampling (FPS) algorithm to select a small, representative set of keypoints from the scene. The Voxel Set Abstraction (VSA) module then aggregates multi-scale semantic features from the voxel feature volumes onto these keypoints using PointNet-based set abstraction operations.
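A minimal sketch of this step appears below: a plain FPS implementation plus a simple ball-query max-pool standing in for the learned VSA aggregation. The helper names, the query radius, and the feature sizes are illustrative assumptions, and the real module gathers voxel features at multiple scales with learned MLPs.

```python
# Farthest point sampling (FPS) to pick keypoints, then a ball-query
# max-pool aggregation in the spirit of PointNet set abstraction.
import torch

def farthest_point_sampling(xyz: torch.Tensor, k: int) -> torch.Tensor:
    """Greedily select k points that are maximally spread out. xyz: (N, 3)."""
    n = xyz.shape[0]
    selected = torch.zeros(k, dtype=torch.long)
    dist = torch.full((n,), float("inf"))
    farthest = torch.randint(n, (1,)).item()   # arbitrary starting point
    for i in range(k):
        selected[i] = farthest
        # Shrink each point's distance to its nearest selected point so far.
        d = ((xyz - xyz[farthest]) ** 2).sum(dim=1)
        dist = torch.minimum(dist, d)
        farthest = int(dist.argmax())          # next: the least-covered point
    return selected

def ball_query_maxpool(keypoints, xyz, feats, radius=1.0):
    """For each keypoint, max-pool features of neighbors within `radius`."""
    out = torch.zeros(keypoints.shape[0], feats.shape[1])
    for i, kp in enumerate(keypoints):
        mask = ((xyz - kp) ** 2).sum(dim=1) < radius ** 2
        if mask.any():
            out[i] = feats[mask].max(dim=0).values
    return out

xyz = torch.rand(4096, 3) * 50.0            # toy point cloud
feats = torch.rand(4096, 32)                # per-point / voxel-center features
idx = farthest_point_sampling(xyz, 256)     # 256 keypoints
keypoint_feats = ball_query_maxpool(xyz[idx], xyz, feats)
```

FPS is chosen over random sampling precisely because the selected keypoints cover the whole scene, so no proposal region is left without nearby feature carriers.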
In the second stage, the Keypoint-to-Grid RoI Feature Abstraction module pools the keypoint features onto each proposal's region of interest (RoI). This pooling is performed with a novel RoI-grid pooling strategy that captures rich contextual information at multiple receptive fields, improving both object confidence prediction and box refinement.
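The sketch below illustrates the idea under simplifying assumptions: axis-aligned boxes, a single query radius, and plain max-pooling in place of the paper's learned multi-radius PointNet grouping over rotated proposals.

```python
# Simplified RoI-grid pooling: sample a uniform G x G x G grid of points
# inside a proposal box, then aggregate nearby keypoint features at each
# grid point.
import torch

def roi_grid_points(box, grid_size=6):
    """Uniform grid inside an axis-aligned box (cx, cy, cz, dx, dy, dz)."""
    center, dims = box[:3], box[3:6]
    # Cell-centered offsets in [-0.5, 0.5) along each axis.
    steps = (torch.arange(grid_size) + 0.5) / grid_size - 0.5
    gx, gy, gz = torch.meshgrid(steps, steps, steps, indexing="ij")
    offsets = torch.stack([gx, gy, gz], dim=-1).reshape(-1, 3)  # (G^3, 3)
    return center + offsets * dims

def pool_to_grid(grid_pts, keypoints, keypoint_feats, radius=1.0):
    """Max-pool keypoint features within `radius` of each grid point."""
    out = torch.zeros(grid_pts.shape[0], keypoint_feats.shape[1])
    for i, gp in enumerate(grid_pts):
        mask = ((keypoints - gp) ** 2).sum(dim=1) < radius ** 2
        if mask.any():
            out[i] = keypoint_feats[mask].max(dim=0).values
    return out

box = torch.tensor([10.0, 2.0, -1.0, 4.0, 1.8, 1.6])   # a toy car-sized RoI
grid = roi_grid_points(box)                             # (216, 3) grid points
pooled = pool_to_grid(grid, torch.rand(256, 3) * 50, torch.rand(256, 64))
```

Probing many interior grid points, rather than pooling one feature per box, is what gives the refinement head fine-grained localization cues across the whole proposal.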
Experimental Results
PV-RCNN demonstrates superior performance across benchmark datasets:
- KITTI dataset: The model achieves 3D detection average precision (AP) of 90.25% (easy), 81.43% (moderate), and 76.82% (hard) on the car class, improving over prior state-of-the-art methods by margins of up to 1.73%.
- Waymo Open Dataset: PV-RCNN delivers strong mAP and mAPH results, including a gain of 7.37% mAP at LEVEL 1 difficulty and consistent improvements across distance ranges (0-30m, 30-50m, 50m-inf). The method generalizes well to this large-scale setting, demonstrating its robustness and efficacy.
Implications and Future Directions
The PV-RCNN framework's integration of voxel-based and PointNet-based methods addresses a key challenge in 3D object detection: efficiently encoding and accurately pooling features from sparse and irregular point clouds. This dual approach captures both localized and contextual information, setting a new standard in the field.
Practical implications of this research are profound, particularly in autonomous driving and robotics, where accurate 3D understanding is critical. Future work could explore:
- Real-time Performance and Efficiency: Optimizing PV-RCNN for real-time applications without compromising accuracy could make it suitable for deployment in autonomous vehicles.
- Enhanced Segmentation and Object Tracking: Integrating improved segmentation techniques and object tracking methodologies could further enhance detection reliability.
- Cross-Domain Adaptation: Adapting the model for various environmental conditions and diverse datasets would ensure broader applicability and robustness.
Conclusion
PV-RCNN offers a comprehensive approach to 3D object detection, leveraging the strengths of voxel-based and point-based neural networks. Its robust performance across benchmark datasets underscores its potential as a leading solution in the field, with significant applications in autonomous driving and beyond. The framework's innovative methodology sets a strong foundation for future enhancements and real-world deployments in AI-driven perception systems.