A Point-Based Approach to Efficient LiDAR Multi-Task Perception (2404.12798v1)

Published 19 Apr 2024 in cs.CV

Abstract: Multi-task networks can potentially improve performance and computational efficiency compared to single-task networks, facilitating online deployment. However, current multi-task architectures in point cloud perception combine multiple task-specific point cloud representations, each requiring a separate feature encoder and making the network structures bulky and slow. We propose PAttFormer, an efficient multi-task architecture for joint semantic segmentation and object detection in point clouds that relies solely on a point-based representation. The network builds on transformer-based feature encoders using neighborhood attention and grid pooling, and on a query-based detection decoder with a novel 3D deformable-attention detection head. Unlike other LiDAR-based multi-task architectures, PAttFormer does not require separate feature encoders for multiple task-specific point cloud representations, resulting in a network that is 3x smaller and 1.4x faster while achieving competitive performance on the nuScenes and KITTI benchmarks for autonomous driving perception. Our extensive evaluations show substantial gains from multi-task learning, improving LiDAR semantic segmentation by +1.7% mIoU and 3D object detection by +1.7% mAP on the nuScenes benchmark compared to the single-task models.
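To make the encoder side of the abstract concrete, the PyTorch sketch below illustrates the two point-based building blocks it names: k-nearest-neighbor ("neighborhood") attention, where each point attends only to its local neighbors, and grid pooling, which coarsens the point set by averaging features per voxel cell. This is a minimal sketch under assumed shapes and hyperparameters; the module names, the brute-force kNN, and the cell size are illustrative choices, not the paper's actual implementation.

```python
# Hedged sketch of a point-based encoder stage: neighborhood attention
# followed by grid pooling. All names and sizes are illustrative assumptions.
import torch
import torch.nn as nn


class NeighborhoodAttention(nn.Module):
    """Each point attends only to its k nearest neighbors (self included)."""

    def __init__(self, dim: int, k: int = 16):
        super().__init__()
        self.k = k
        self.to_qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, xyz: torch.Tensor, feats: torch.Tensor) -> torch.Tensor:
        # xyz: (N, 3) point coordinates, feats: (N, C) point features
        q, k, v = self.to_qkv(feats).chunk(3, dim=-1)                 # (N, C) each
        # Brute-force kNN via pairwise distances (fine for a sketch).
        idx = torch.cdist(xyz, xyz).topk(self.k, largest=False).indices  # (N, k)
        k_nb, v_nb = k[idx], v[idx]                                   # (N, k, C)
        attn = (q.unsqueeze(1) * k_nb).sum(-1) * self.scale           # (N, k)
        attn = attn.softmax(dim=-1)
        out = (attn.unsqueeze(-1) * v_nb).sum(dim=1)                  # (N, C)
        return self.proj(out)


def grid_pool(xyz: torch.Tensor, feats: torch.Tensor, cell: float = 0.5):
    """Average-pool point features into voxel cells of edge length `cell`."""
    coords = torch.floor(xyz / cell).long()                           # (N, 3)
    uniq, inv = torch.unique(coords, dim=0, return_inverse=True)
    pooled = feats.new_zeros(uniq.shape[0], feats.shape[1])
    pooled.index_add_(0, inv, feats)                                  # sum per cell
    counts = torch.bincount(inv, minlength=uniq.shape[0]).clamp(min=1)
    pooled = pooled / counts.unsqueeze(-1)                            # mean per cell
    centers = (uniq.float() + 0.5) * cell                             # voxel centers
    return centers, pooled


if __name__ == "__main__":
    pts = torch.randn(1024, 3) * 10.0            # toy LiDAR-like point cloud
    f = torch.randn(1024, 64)
    f = NeighborhoodAttention(dim=64)(pts, f)
    centers, pooled = grid_pool(pts, f)          # coarser point set for the next stage
    print(centers.shape, pooled.shape)
```

Stacking such attention blocks with progressively coarser grid pooling yields a hierarchical point-based encoder, avoiding the separate voxel or range-view encoders that the abstract identifies as the bottleneck of prior multi-task designs.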

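The decoder side of the abstract, a query-based detection head with 3D deformable attention, can be read analogously to 2D deformable DETR: each object query predicts a few 3D offsets around its reference point and aggregates encoder features sampled there. The sketch below shows one plausible reading under that assumption; the nearest-neighbor feature lookup stands in for whatever interpolation the paper actually uses, and all names and sizes are hypothetical.

```python
# Hedged sketch of a 3D deformable-attention detection head, assuming a
# deformable-DETR-style design. Names, sampling scheme, and sizes are
# illustrative assumptions, not the paper's implementation.
import torch
import torch.nn as nn


class Deformable3DAttention(nn.Module):
    """Each object query samples features at learned 3D offsets around its
    reference point and fuses them with learned per-sample weights."""

    def __init__(self, dim: int, num_points: int = 4):
        super().__init__()
        self.num_points = num_points
        self.offset_head = nn.Linear(dim, 3 * num_points)  # xyz offset per sample
        self.weight_head = nn.Linear(dim, num_points)      # fusion weight per sample
        self.proj = nn.Linear(dim, dim)

    def forward(self, query, ref_xyz, xyz, feats):
        # query: (Q, C) object queries, ref_xyz: (Q, 3) reference points,
        # xyz: (N, 3) encoder point locations, feats: (N, C) their features
        Q, C = query.shape
        offsets = self.offset_head(query).view(Q, self.num_points, 3)
        sample_xyz = ref_xyz.unsqueeze(1) + offsets         # (Q, P, 3)
        # Nearest-neighbor "sampling": take the feature of the closest encoder
        # point to each sampling location (a stand-in for smoother interpolation).
        d = torch.cdist(sample_xyz.view(-1, 3), xyz)        # (Q*P, N)
        nn_feats = feats[d.argmin(dim=-1)].view(Q, self.num_points, C)
        w = self.weight_head(query).softmax(dim=-1)         # (Q, P)
        out = (w.unsqueeze(-1) * nn_feats).sum(dim=1)       # (Q, C)
        return self.proj(out)


if __name__ == "__main__":
    queries, refs = torch.randn(100, 64), torch.randn(100, 3) * 10.0
    pts, f = torch.randn(2048, 3) * 10.0, torch.randn(2048, 64)
    out = Deformable3DAttention(dim=64)(queries, refs, pts, f)
    print(out.shape)  # (100, 64): one refined embedding per object query
```

The appeal of this design is that queries attend to a handful of learned locations rather than the full point set, which is consistent with the efficiency claims in the abstract.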
Authors (4)
  1. Christopher Lang
  2. Alexander Braun
  3. Lars Schillingmann
  4. Abhinav Valada