Semantic Layering in Room Segmentation via LLMs (2403.12920v1)
Abstract: In this paper, we introduce Semantic Layering in Room Segmentation via LLMs (SeLRoS), an advanced method for semantic room segmentation by integrating LLMs with traditional 2D map-based segmentation. Unlike previous approaches that solely focus on the geometric segmentation of indoor environments, our work enriches segmented maps with semantic data, including object identification and spatial relationships, to enhance robotic navigation. By leveraging LLMs, we provide a novel framework that interprets and organizes complex information about each segmented area, thereby improving the accuracy and contextual relevance of room segmentation. Furthermore, SeLRoS overcomes the limitations of existing algorithms by using a semantic evaluation method to accurately distinguish true room divisions from those erroneously generated by furniture and segmentation inaccuracies. The effectiveness of SeLRoS is verified through its application across 30 different 3D environments. Source code and experiment videos for this work are available at: https://sites.google.com/view/selros.
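The semantic evaluation step the abstract describes, deciding whether a boundary produced by geometric segmentation marks a true room division or an artifact caused by furniture, can be pictured with a minimal sketch. Everything below is an illustrative assumption rather than the paper's actual interface: the segment fields (`objects`, `area_m2`), the prompt wording, the JSON reply schema, and the `llm` callable are all hypothetical.

```python
# Hypothetical sketch of an LLM-based merge check over two adjacent
# segments from a geometric room segmenter. The prompt text, segment
# schema, and reply format are illustrative, not the paper's prompts.
import json
from typing import Callable, Dict

def build_merge_prompt(seg_a: Dict, seg_b: Dict) -> str:
    """Serialize two adjacent segments (objects, area) into a prompt."""
    return (
        "Two adjacent regions were produced by a geometric room segmenter.\n"
        f"Region A (area {seg_a['area_m2']} m^2) contains: {', '.join(seg_a['objects'])}.\n"
        f"Region B (area {seg_b['area_m2']} m^2) contains: {', '.join(seg_b['objects'])}.\n"
        "Are these the same room (the boundary is an artifact of furniture\n"
        "or segmentation noise) or different rooms?\n"
        'Answer with JSON: {"same_room": true|false, "reason": "..."}'
    )

def should_merge(seg_a: Dict, seg_b: Dict, llm: Callable[[str], str]) -> bool:
    """Return True if the LLM judges the two segments to be one room."""
    reply = llm(build_merge_prompt(seg_a, seg_b))
    try:
        return bool(json.loads(reply)["same_room"])
    except (json.JSONDecodeError, KeyError):
        return False  # on unparseable output, keep the geometric split

# Stand-in "LLM" so the sketch runs end to end:
fake_llm = lambda p: '{"same_room": true, "reason": "one living room split by a sofa"}'
seg_a = {"id": 3, "area_m2": 9.5, "objects": ["sofa", "coffee table"]}
seg_b = {"id": 4, "area_m2": 8.0, "objects": ["tv stand", "television"]}
print(should_merge(seg_a, seg_b, fake_llm))  # True -> merge the segments
```

Abstracting the model behind a plain callable keeps the sketch independent of any particular LLM provider, and falling back to the geometric split on unparseable output is a conservative default chosen here for illustration, not a claim about SeLRoS's behavior.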