AutoInst: Automatic Instance-Based Segmentation of LiDAR 3D Scans (2403.16318v2)

Published 24 Mar 2024 in cs.CV

Abstract: Recent progress in acquisition equipment such as LiDAR sensors has enabled the capture of increasingly expansive outdoor 3D environments. Making sense of such acquisitions requires fine-grained scene understanding, such as constructing instance-based 3D scene segmentations. Commonly, a neural network is trained for this task; however, this requires access to a large, densely annotated dataset, which is widely known to be challenging to obtain. To address this issue, we propose to predict instance segmentations for 3D scenes in an unsupervised way, without relying on ground-truth annotations. To this end, we construct a learning framework with two components: (1) a pseudo-annotation scheme for generating initial unsupervised pseudo-labels, and (2) a self-training algorithm for instance segmentation that fits robust, accurate instances to the initial noisy proposals. To generate 3D instance mask proposals, we construct a weighted proxy-graph by connecting 3D points with edges that integrate multi-modal image- and point-based self-supervised features, and perform graph cuts to isolate individual pseudo-instances. We then build on a state-of-the-art point-based architecture and train a 3D instance segmentation model, significantly refining the initial proposals. To scale to 3D scenes of arbitrary complexity, our algorithm operates on local 3D point chunks, with a merging step that assembles scene-level instance segmentations. Experiments on the challenging SemanticKITTI benchmark demonstrate the potential of our approach: it attains 13.3% higher Average Precision and 9.1% higher F1 score than the best-performing baseline. The code will be made publicly available at https://github.com/artonson/autoinst.
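The pseudo-annotation stage lends itself to a compact illustration. Below is a minimal sketch (not the authors' implementation; see the linked repository for that) of the core idea: connect 3D points into a weighted k-NN proxy-graph whose edge weights fuse image-based and point-based self-supervised features, then recursively bipartition the graph with normalized cuts until the cut becomes too expensive. All function names, the fusion weight alpha, and the thresholds here are illustrative assumptions; features are assumed to be unit-normalized.

```python
# Minimal sketch of graph-cut pseudo-instance generation over a point chunk.
# Assumptions (not from the paper): k=16 neighbors, alpha=0.5 feature fusion,
# median split on the Fiedler vector, ncut_thresh/min_size stopping criteria.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import laplacian
from scipy.sparse.linalg import eigsh
from sklearn.neighbors import NearestNeighbors


def build_proxy_graph(points, img_feats, pt_feats, k=16, alpha=0.5):
    """Weighted k-NN graph; edge weights fuse image- and point-based features."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(points)
    _, idx = nn.kneighbors(points)
    rows, cols, vals = [], [], []
    for i, neighbors in enumerate(idx):
        for j in neighbors[1:]:  # skip self-edge
            sim_img = float(img_feats[i] @ img_feats[j])  # cosine sim (unit-norm)
            sim_pt = float(pt_feats[i] @ pt_feats[j])
            w = max(alpha * sim_img + (1.0 - alpha) * sim_pt, 1e-6)
            rows.append(i); cols.append(j); vals.append(w)
    n = len(points)
    W = csr_matrix((vals, (rows, cols)), shape=(n, n))
    return ((W + W.T) * 0.5).tocsr()  # symmetrize


def normalized_cut(W, indices, labels, next_label, min_size=50, ncut_thresh=0.9):
    """Recursively bipartition via the Fiedler vector of the normalized Laplacian."""
    if len(indices) < 2 * min_size:
        labels[indices] = next_label[0]; next_label[0] += 1
        return
    sub = W[indices][:, indices]
    L = laplacian(sub, normed=True)
    # The second-smallest eigenvector splits the graph into two loosely
    # coupled parts (Shi & Malik's normalized cut relaxation).
    _, vecs = eigsh(L, k=2, which="SM")
    fiedler = vecs[:, 1]
    mask = fiedler >= np.median(fiedler)
    a, b = indices[mask], indices[~mask]
    # Stop splitting when the cut is expensive relative to the associations.
    cut = sub[mask][:, ~mask].sum()
    ncut = cut / max(sub[mask].sum(), 1e-9) + cut / max(sub[~mask].sum(), 1e-9)
    if ncut > ncut_thresh or len(a) < min_size or len(b) < min_size:
        labels[indices] = next_label[0]; next_label[0] += 1
        return
    normalized_cut(W, a, labels, next_label, min_size, ncut_thresh)
    normalized_cut(W, b, labels, next_label, min_size, ncut_thresh)


def pseudo_instances(points, img_feats, pt_feats):
    """Return a pseudo-instance label per point for one local chunk."""
    W = build_proxy_graph(points, img_feats, pt_feats)
    labels = np.full(len(points), -1, dtype=np.int64)
    normalized_cut(W, np.arange(len(points)), labels, next_label=[0])
    return labels
```

In the full method, a step like this runs per local 3D point chunk; a merging stage then stitches chunk-level pseudo-instances into scene-level segmentations, and a point-based network trained on these noisy proposals refines them via self-training.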

Authors (6)
  1. Cedric Perauer
  2. Laurenz Adrian Heidrich
  3. Haifan Zhang
  4. Matthias Nießner
  5. Anastasiia Kornilova
  6. Alexey Artemov