Volumetric Semantically Consistent 3D Panoptic Mapping (2309.14737v3)
Abstract: We introduce an online 2D-to-3D semantic instance mapping algorithm aimed at generating comprehensive, accurate, and efficient semantic 3D maps suitable for autonomous agents in unstructured environments. The proposed approach is based on the Voxel-TSDF representation used in recent algorithms. It introduces novel ways of integrating semantic prediction confidence during mapping, producing semantically consistent and instance-consistent 3D regions. Further improvements are achieved by graph optimization-based semantic labeling and instance refinement. The proposed method achieves accuracy superior to the state of the art on public large-scale datasets, improving on a number of widely used metrics. We also highlight a shortcoming in the evaluation of recent studies: using the ground truth trajectory as input instead of a SLAM-estimated one substantially affects the accuracy, creating a large gap between the reported results and the actual performance on real-world data.