FastPillars: A Deployment-friendly Pillar-based 3D Detector (2302.02367v6)
Abstract: Deploying 3D detectors is one of the major challenges in real-world self-driving scenarios. Existing BEV-based (i.e., Bird's Eye View) detectors favor sparse convolutions (known as SPConv) to speed up training and inference, which poses a hard barrier to deployment, especially for on-device applications. In this paper, to tackle the challenge of efficient 3D object detection from an industry perspective, we devise a deployment-friendly pillar-based 3D detector, termed FastPillars. First, we introduce a novel lightweight Max-and-Attention Pillar Encoding (MAPE) module designed specifically to improve detection of small 3D objects. Second, we propose a simple yet effective principle for designing a backbone in pillar-based 3D detection. We construct FastPillars based on these designs, achieving high performance and low latency without SPConv. Extensive experiments on two large-scale datasets demonstrate the effectiveness and efficiency of FastPillars for on-device 3D detection in terms of both accuracy and speed. Specifically, FastPillars delivers state-of-the-art accuracy on the Waymo Open Dataset, with a 1.8× speedup and a 3.8 mAPH/L2 improvement over the SPConv-based CenterPoint. Our code is publicly available at: https://github.com/StiphyJay/FastPillars.
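As a rough illustration of what "max-and-attention" pillar encoding could look like, the sketch below pools the point features of a single pillar in two ways (a channel-wise maximum and a softmax-weighted sum) and then merges the results. The abstract does not describe how MAPE is built, so everything here is an assumption made for illustration: the linear scoring weights, the softmax attention, the averaging of the two pooled vectors, and all function names. It is not the authors' implementation.

```python
import math
from typing import List

Point = List[float]  # one point's feature vector (hypothetical layout)


def max_pool(points: List[Point]) -> List[float]:
    """Channel-wise maximum over all points in one pillar."""
    return [max(channel) for channel in zip(*points)]


def attention_pool(points: List[Point], weights: List[float]) -> List[float]:
    """Softmax-weighted sum of point features.

    `weights` is a hypothetical linear scorer: each point's attention
    score is the dot product of its features with `weights`.
    """
    scores = [sum(w * x for w, x in zip(weights, p)) for p in points]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]
    channels = len(points[0])
    return [sum(a * p[c] for a, p in zip(attn, points)) for c in range(channels)]


def max_and_attention(points: List[Point], weights: List[float]) -> List[float]:
    """Merge both pooled vectors by averaging (an assumed fusion rule)."""
    pooled_max = max_pool(points)
    pooled_attn = attention_pool(points, weights)
    return [(m + a) / 2.0 for m, a in zip(pooled_max, pooled_attn)]


if __name__ == "__main__":
    pillar = [[1.0, 0.0], [3.0, 2.0]]  # two points, two feature channels
    print(max_and_attention(pillar, weights=[0.0, 0.0]))
```

With zero scoring weights the attention is uniform, so the attention branch reduces to a mean over the points. Max pooling keeps only the strongest response per channel, while the weighted sum retains a contribution from every point, which is one plausible reason to combine the two for pillars that contain few points.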