Scalable SoftGroup for 3D Instance Segmentation on Point Clouds (2209.08263v3)

Published 17 Sep 2022 in cs.CV

Abstract: This paper considers a network referred to as SoftGroup for accurate and scalable 3D instance segmentation. Existing state-of-the-art methods produce hard semantic predictions followed by grouping instance segmentation results. Unfortunately, errors stemming from hard decisions propagate into the grouping, resulting in poor overlap between predicted instances and ground truth and substantial false positives. To address the abovementioned problems, SoftGroup allows each point to be associated with multiple classes to mitigate the uncertainty stemming from semantic prediction. It also suppresses false positive instances by learning to categorize them as background. Regarding scalability, the existing fast methods require computational time on the order of tens of seconds on large-scale scenes, which is unsatisfactory and far from applicable for real-time. Our finding is that the $k$-Nearest Neighbor ($k$-NN) module, which serves as the prerequisite of grouping, introduces a computational bottleneck. SoftGroup is extended to resolve this computational bottleneck, referred to as SoftGroup++. The proposed SoftGroup++ reduces time complexity with octree $k$-NN and reduces search space with class-aware pyramid scaling and late devoxelization. Experimental results on various indoor and outdoor datasets demonstrate the efficacy and generality of the proposed SoftGroup and SoftGroup++. Their performances surpass the best-performing baseline by a large margin (6\% $\sim$ 16\%) in terms of AP$_{50}$. On datasets with large-scale scenes, SoftGroup++ achieves a 6$\times$ speed boost on average compared to SoftGroup. Furthermore, SoftGroup can be extended to perform object detection and panoptic segmentation with nontrivial improvements over existing methods. The source code and trained models are available at \url{https://github.com/thangvubk/SoftGroup}.

Summary

  • The paper introduces a novel soft grouping mechanism that uses soft semantic scores to reduce error propagation in 3D instance segmentation.
  • The paper presents SoftGroup++, which employs octree k-NN and pyramid scaling to lower computational complexity and achieve a 6× speed boost on large scenes.
  • The paper validates its approach with improved accuracy and versatility across datasets like ScanNet v2 and SemanticKITTI, highlighting broad applicability in 3D vision tasks.

Scalable SoftGroup for 3D Instance Segmentation on Point Clouds

The paper "Scalable SoftGroup for 3D Instance Segmentation on Point Clouds" introduces SoftGroup, a network for accurate and scalable instance segmentation of 3D point clouds. Existing state-of-the-art methods make hard semantic predictions before grouping, so semantic errors propagate into the instances; SoftGroup instead groups points on soft semantic scores. An extension, SoftGroup++, removes the $k$-NN bottleneck that limits scalability on large-scale scenes.

Key Contributions

The work presents several noteworthy contributions:

  1. SoftGroup Architecture: The authors tackle the problem of error propagation due to hard semantic predictions by introducing SoftGroup. The network associates each point with multiple classes using soft semantic scores, which mitigates the uncertainty in semantic prediction. It also suppresses false positive instances by learning to categorize them as background. Together, these steps improve the overlap between predicted instances and ground truth compared with hard-decision pipelines.
  2. Scalability – SoftGroup++: Addressing the computational bottlenecks typical of large-scale point cloud data, the researchers introduce SoftGroup++. This version integrates octree $k$-NN to reduce time complexity from $\mathcal{O}(n^2)$ to $\mathcal{O}(n \log n)$. Furthermore, it employs class-aware pyramid scaling and late devoxelization to diminish search spaces during processing, leading to a significant speed boost.
  3. Performance and Versatility: Experimental results across multiple datasets illustrate that SoftGroup and SoftGroup++ outperform their predecessors. They show substantial improvements in AP$_{50}$ scores, with SoftGroup++ attaining a 6$\times$ speed gain over SoftGroup on large-scale scenes. Remarkably, these methods can extend to object detection and panoptic segmentation, increasing their utility across different 3D vision tasks.
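
The difference between hard and soft semantic association in the first contribution can be illustrated with a small sketch. This is not the authors' implementation: the function names, the threshold `tau`, and the toy scores are assumptions chosen for illustration. It only shows how a point with an ambiguous semantic score is lost to one class under an argmax decision but remains available to several classes under a score threshold.

```python
def hard_assign(scores):
    """Hard decision: keep only the highest-scoring class for each point."""
    return [max(range(len(s)), key=s.__getitem__) for s in scores]


def soft_assign(scores, tau=0.2):
    """Soft decision: keep every class whose score exceeds tau (assumed threshold)."""
    return [[c for c, p in enumerate(s) if p > tau] for s in scores]


# Toy per-point class scores over 3 classes (illustrative values, not from the paper).
scores = [
    [0.70, 0.25, 0.05],  # confident in class 0
    [0.45, 0.50, 0.05],  # ambiguous between classes 0 and 1
    [0.10, 0.15, 0.75],  # confident in class 2
]

hard = hard_assign(scores)
soft = soft_assign(scores, tau=0.2)

# Points that a grouping stage for class 0 would be allowed to use.
hard_class0 = [i for i, c in enumerate(hard) if c == 0]
soft_class0 = [i for i, cs in enumerate(soft) if 0 in cs]

print(hard_class0, soft_class0)
```

Under the hard decision the ambiguous second point is assigned to class 1 only, so a class-0 instance that contains it can never recover it. Under the soft decision it stays a candidate for both classes, and a later stage can decide which grouping to keep.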

Experimental Insights

The experimental evaluation demonstrates the strengths of these methods:

  • The SoftGroup architecture achieves substantial gains in accuracy, surpassing the best-performing baseline by 6% to 16% in AP$_{50}$ on datasets such as ScanNet v2.
  • SoftGroup++ substantially reduces inference time on large-scale scenes, averaging a 6$\times$ speedup over SoftGroup without sacrificing accuracy.
  • The proposed methods demonstrate flexibility and superior performance in different 3D instance segmentation and detection contexts, as evidenced by their success across varied datasets like S3DIS, STPLS3D, and SemanticKITTI.
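
The inference-time reductions above come from cheaper neighbor search. As a rough illustration of why spatial partitioning helps, the sketch below compares a brute-force $k$-NN scan with a search that only visits nearby cells. It uses a uniform grid as a simple stand-in and is not the paper's octree $k$-NN; the function names, cell size, and random points are all assumptions, and it makes no attempt to reproduce the reported complexity or timings.

```python
import math
import random
from collections import defaultdict


def knn_brute(points, q, k):
    """Indices of the k nearest points to q, found by scanning every point."""
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], q))
    return order[:k]


def build_grid(points, cell):
    """Bucket point indices by integer cell coordinates (flat stand-in for an octree)."""
    grid = defaultdict(list)
    for i, p in enumerate(points):
        grid[tuple(math.floor(c / cell) for c in p)].append(i)
    return grid


def knn_grid(points, grid, cell, q, k):
    """Exact k-NN that visits only cells in a growing cube around the query."""
    qc = tuple(math.floor(c / cell) for c in q)
    r = 0
    while True:
        cand = []
        for dx in range(-r, r + 1):
            for dy in range(-r, r + 1):
                for dz in range(-r, r + 1):
                    cand.extend(grid.get((qc[0] + dx, qc[1] + dy, qc[2] + dz), ()))
        cand.sort(key=lambda i: math.dist(points[i], q))
        # Every point outside the visited cube is at least r * cell away from q,
        # so the result is final once the k-th candidate is within that distance.
        if len(cand) >= k and math.dist(points[cand[k - 1]], q) <= r * cell:
            return cand[:k]
        if len(cand) == len(points):  # all points visited; nothing left to find
            return cand[:k]
        r += 1


random.seed(0)
pts = [(random.random(), random.random(), random.random()) for _ in range(500)]
grid = build_grid(pts, cell=0.1)
query = (0.5, 0.5, 0.5)

slow = knn_brute(pts, query, 8)
fast = knn_grid(pts, grid, 0.1, query, 8)
```

Both functions return the same neighbors, but the grid version inspects only the points in a few cells near the query rather than the whole scene. Shrinking the set of points that must be searched is the same goal the summary attributes to class-aware pyramid scaling and late devoxelization.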

Theoretical and Practical Implications

The paper sheds light on the interplay between uncertainty in semantic predictions and instance grouping, and it makes the case for deferring hard decisions in 3D point cloud pipelines. Practically, the reported speedups bring the processing of large-scale point cloud scenes closer to real time, which matters for applications in autonomous driving, robotics, and AR/VR where time-critical performance is essential.

Future Directions

Future research could further tune the hyperparameters of SoftGroup++, potentially incorporating more adaptive mechanisms for real-time applications. Additionally, extending the principles underlying SoftGroup to other forms of hierarchical and non-hierarchical data could broaden the applicability of these insights within the artificial intelligence research community.

In conclusion, this paper delivers a robust framework enhancing both the scalability and accuracy in 3D instance segmentation, effectively bridging theoretical advancements with practical applications in AI-driven 3D perception tasks.
