A Point-Based Approach to Efficient LiDAR Multi-Task Perception (2404.12798v1)
Abstract: Multi-task networks can improve both performance and computational efficiency over single-task networks, facilitating online deployment. However, current multi-task architectures for point cloud perception combine multiple task-specific point cloud representations, each requiring a separate feature encoder, which makes the networks bulky and slow. We propose PAttFormer, an efficient multi-task architecture for joint semantic segmentation and object detection in point clouds that relies solely on a point-based representation. The network builds on transformer-based feature encoders using neighborhood attention and grid pooling, and a query-based detection decoder with a novel 3D deformable-attention detection head. Unlike other LiDAR-based multi-task architectures, the proposed PAttFormer does not require separate feature encoders for multiple task-specific point cloud representations, resulting in a network that is 3x smaller and 1.4x faster while achieving competitive performance on the nuScenes and KITTI benchmarks for autonomous driving perception. Our extensive evaluations show substantial gains from multi-task learning, improving LiDAR semantic segmentation by +1.7% in mIoU and 3D object detection by +1.7% in mAP on the nuScenes benchmark compared to single-task models.
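The overall design described in the abstract (a shared point-based encoder with neighborhood attention and grid pooling, a per-point segmentation head, and a query-based detection decoder) can be sketched in a few lines of PyTorch. The sketch below is an illustrative assumption of how such a multi-task network fits together, not the authors' implementation: all module names and hyperparameters are hypothetical, and the decoder uses standard cross-attention in place of the paper's 3D deformable-attention detection head.

```python
# Minimal sketch of a point-based multi-task network: shared encoder with
# k-NN neighborhood attention and grid pooling, a per-point semantic
# segmentation head, and a query-based detection decoder. Illustrative only;
# names, sizes, and the decoder attention are assumptions.
import torch
import torch.nn as nn


class NeighborhoodAttention(nn.Module):
    """Self-attention restricted to the k nearest neighbors of each point."""

    def __init__(self, dim: int, k: int = 16, heads: int = 4):
        super().__init__()
        self.k = k
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, xyz: torch.Tensor, feats: torch.Tensor) -> torch.Tensor:
        # xyz: (N, 3) coordinates, feats: (N, C) features.
        # Brute-force k-NN for clarity; real systems use spatial indexing.
        idx = torch.cdist(xyz, xyz).topk(self.k, largest=False).indices  # (N, k)
        neigh = feats[idx]                        # (N, k, C) neighbor features
        query = feats.unsqueeze(1)                # (N, 1, C) one query per point
        out, _ = self.attn(query, neigh, neigh)   # attend only over neighbors
        return feats + out.squeeze(1)             # residual update


def grid_pool(xyz, feats, cell):
    """Average coordinates and features of points sharing a grid cell."""
    coords = torch.floor(xyz / cell).long()
    _, inv = torch.unique(coords, dim=0, return_inverse=True)
    n = int(inv.max()) + 1
    pooled_xyz = torch.zeros(n, xyz.shape[1]).index_reduce_(
        0, inv, xyz, "mean", include_self=False)
    pooled_f = torch.zeros(n, feats.shape[1]).index_reduce_(
        0, inv, feats, "mean", include_self=False)
    return pooled_xyz, pooled_f


class PointMultiTaskNet(nn.Module):
    """Shared point encoder feeding a segmentation head and a query decoder."""

    def __init__(self, in_dim=4, dim=64, num_classes=16, num_queries=100):
        super().__init__()
        self.embed = nn.Linear(in_dim, dim)
        self.enc_fine = NeighborhoodAttention(dim)
        self.enc_coarse = NeighborhoodAttention(dim)
        self.seg_head = nn.Linear(dim, num_classes)       # per-point logits
        self.queries = nn.Embedding(num_queries, dim)     # detection queries
        self.dec_attn = nn.MultiheadAttention(dim, 4, batch_first=True)
        self.box_head = nn.Linear(dim, 7 + num_classes)   # box params + class

    def forward(self, points):
        xyz, feats = points[:, :3], self.embed(points)
        feats = self.enc_fine(xyz, feats)                  # full resolution
        p_xyz, p_feats = grid_pool(xyz, feats, cell=0.5)   # coarser grid
        p_feats = self.enc_coarse(p_xyz, p_feats)
        sem_logits = self.seg_head(feats)                  # (N, num_classes)
        q = self.queries.weight.unsqueeze(0)               # (1, Q, C)
        q, _ = self.dec_attn(q, p_feats.unsqueeze(0), p_feats.unsqueeze(0))
        boxes = self.box_head(q.squeeze(0))                # (Q, 7 + classes)
        return sem_logits, boxes


if __name__ == "__main__":
    pts = torch.randn(1024, 4)                             # x, y, z, intensity
    sem, boxes = PointMultiTaskNet()(pts)
    print(sem.shape, boxes.shape)                          # (1024, 16) (100, 23)
```

Because both heads share the same point-based encoder, only one feature extraction pass is needed per scan, which is the source of the parameter and latency savings the abstract reports.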
Authors: Christopher Lang, Alexander Braun, Lars Schillingmann, Abhinav Valada