Scalable SoftGroup for 3D Instance Segmentation on Point Clouds (2209.08263v3)
Abstract: This paper presents SoftGroup, a network for accurate and scalable 3D instance segmentation. Existing state-of-the-art methods make hard semantic predictions and then group points into instances. Unfortunately, errors stemming from these hard decisions propagate into the grouping, which leads to poor overlap between predicted instances and the ground truth and to substantial false positives. To address these problems, SoftGroup allows each point to be associated with multiple classes, which mitigates the uncertainty of semantic prediction. It also suppresses false positive instances by learning to categorize them as background. Regarding scalability, existing fast methods need tens of seconds of computation on large-scale scenes, which is far from real-time. We find that the $k$-Nearest Neighbor ($k$-NN) module, a prerequisite for grouping, introduces a computational bottleneck. We extend SoftGroup to resolve this bottleneck and refer to the extension as SoftGroup++. SoftGroup++ reduces time complexity with octree $k$-NN and reduces the search space with class-aware pyramid scaling and late devoxelization. Experimental results on various indoor and outdoor datasets demonstrate the efficacy and generality of SoftGroup and SoftGroup++, which surpass the best-performing baseline by a large margin (6\% to 16\%) in terms of AP$_{50}$. On datasets with large-scale scenes, SoftGroup++ is on average 6$\times$ faster than SoftGroup. Furthermore, SoftGroup can be extended to object detection and panoptic segmentation, with nontrivial improvements over existing methods. The source code and trained models are available at \url{https://github.com/thangvubk/SoftGroup}.
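The soft grouping idea can be illustrated with a minimal Python sketch. It assumes per-point class scores, a score threshold `score_thr`, a grouping radius `radius`, and a minimum cluster size `min_points`; these names and default values are hypothetical choices for illustration, not taken from the released code. It also assumes `coords` already holds the coordinates to be clustered (for example, points shifted toward their instance centers). Each point enters the candidate set of every class whose score exceeds the threshold, rather than only its argmax class, and the candidates of each class are then grouped by breadth-first search over neighbors within the radius.

```python
from collections import deque
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def soft_group(
    coords: Sequence[Point],
    scores: Sequence[Sequence[float]],
    score_thr: float = 0.2,   # hypothetical threshold for soft class assignment
    radius: float = 0.04,     # hypothetical neighbor radius for grouping
    min_points: int = 2,      # hypothetical minimum cluster size
) -> List[Tuple[int, List[int]]]:
    """Return (class_id, sorted point indices) for every kept cluster."""
    num_classes = len(scores[0])
    r2 = radius * radius
    clusters = []
    for c in range(num_classes):
        # Soft assignment: every point scoring above score_thr for class c
        # is a candidate, so one point may appear under several classes.
        cand = [i for i, s in enumerate(scores) if s[c] > score_thr]
        visited = set()
        for seed in cand:
            if seed in visited:
                continue
            # Breadth-first search over the radius-neighbor graph of candidates.
            visited.add(seed)
            queue = deque([seed])
            members = []
            while queue:
                i = queue.popleft()
                members.append(i)
                xi, yi, zi = coords[i]
                # Brute-force neighbor query: linear in the candidate count per
                # point, so quadratic overall on large scenes.
                for j in cand:
                    if j in visited:
                        continue
                    xj, yj, zj = coords[j]
                    if (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2 <= r2:
                        visited.add(j)
                        queue.append(j)
            # Discard clusters too small to be instances.
            if len(members) >= min_points:
                clusters.append((c, sorted(members)))
    return clusters
```

For example, a point scored 0.55 for class 0 and 0.45 for class 1 joins clusters of both classes with `score_thr=0.2`, whereas with `score_thr=0.5` it supports only class 0, which mimics a hard decision. The neighbor query here is brute force; this is the kind of search cost that SoftGroup++ targets with octree $k$-NN. The sketch also omits the learned step that categorizes false positive proposals as background.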