Small, Versatile and Mighty: A Range-View Perception Framework (2403.00325v1)
Abstract: Despite its compactness and information integrity, the range-view representation of LiDAR data is rarely the first choice for 3D perception tasks. In this work, we push the envelope of the range-view representation further with a novel multi-task framework that achieves unprecedented 3D detection performance. Our proposed Small, Versatile, and Mighty (SVM) network uses a purely convolutional architecture to fully exploit the efficiency and multi-tasking potential of the range-view representation. To boost detection performance, we first propose a range-view-specific Perspective Centric Label Assignment (PCLA) strategy and a novel View Adaptive Regression (VAR) module that further refines hard-to-predict box properties. In addition, our framework seamlessly integrates semantic segmentation and panoptic segmentation of the LiDAR point cloud without extra modules. Among range-view-based methods, our model achieves new state-of-the-art detection performance on the Waymo Open Dataset. In particular, it improves on convolutional counterparts by more than 10 mAP on the vehicle class. Our results on the other tasks further demonstrate the multi-task capabilities of the proposed small but mighty framework.
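For readers unfamiliar with the range-view representation mentioned in the abstract, the following is a minimal sketch of the commonly used spherical projection that turns a LiDAR point cloud into a dense 2D range image. It is not the paper's implementation: the function name `project_to_range_view`, the image size, and the vertical field of view are all illustrative assumptions.

```python
import math


def project_to_range_view(points, height=4, width=8,
                          fov_up_deg=15.0, fov_down_deg=-15.0):
    """Project (x, y, z) points onto a height x width range image.

    Each pixel stores the range (distance to the sensor) of the closest
    point that falls into it; empty pixels keep 0.0. The sizes and angles
    are illustrative assumptions, not values taken from the paper.
    """
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    fov = fov_up - fov_down
    image = [[0.0] * width for _ in range(height)]

    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue  # a point at the sensor origin has no direction
        azimuth = math.atan2(y, x)      # horizontal angle in [-pi, pi]
        inclination = math.asin(z / r)  # vertical angle in [-pi/2, pi/2]
        if not (fov_down <= inclination <= fov_up):
            continue  # outside the assumed vertical field of view

        # Map angles to pixel indices: row 0 is the top of the field of
        # view, and columns sweep across the full 360 degrees of azimuth.
        col = int(0.5 * (1.0 - azimuth / math.pi) * width)
        row = int((fov_up - inclination) / fov * height)
        col = min(max(col, 0), width - 1)
        row = min(max(row, 0), height - 1)

        # When several points share a pixel, keep the closest one.
        if image[row][col] == 0.0 or r < image[row][col]:
            image[row][col] = r
    return image


# Two points along the same ray and one point straight above the sensor.
range_image = project_to_range_view(
    [(10.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.0, 0.0, 10.0)]
)
```

Because every pixel corresponds to one viewing direction of the sensor, the result is a compact dense grid that ordinary 2D convolutions can process, which is the property the abstract refers to.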