Mapping High-level Semantic Regions in Indoor Environments without Object Recognition (2403.07076v1)
Abstract: Robots require a semantic understanding of their surroundings to operate in an efficient and explainable way in human environments. In the literature, there has been an extensive focus on object labeling and exhaustive scene graph generation; less effort has been focused on the task of purely identifying and mapping large semantic regions. The present work proposes a method for semantic region mapping via embodied navigation in indoor environments, generating a high-level representation of the knowledge of the agent. To enable region identification, the method uses a vision-to-language model to provide scene information for mapping. By projecting egocentric scene understanding into the global frame, the proposed method generates a semantic map as a distribution over possible region labels at each location. This mapping procedure is paired with a trained navigation policy to enable autonomous map generation. The proposed method significantly outperforms a variety of baselines, including an object-based system and a pretrained scene classifier, in experiments in a photorealistic simulator.
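The abstract's idea of a map that stores a distribution over region labels at each location can be illustrated with a small sketch. This is not the paper's implementation: the 2D grid, the cell size, the label set, the independent Bayesian fusion rule, and every name below (`RegionMap`, `camera_to_global`, `cell_of`) are assumptions made for illustration, and the per-frame probabilities are hand-written stand-ins for whatever a vision-to-language model would output.

```python
import math

# Hypothetical label set; the paper's actual region labels are not given here.
REGION_LABELS = ["kitchen", "bedroom", "bathroom", "hallway"]


def camera_to_global(x_cam, z_cam, pose):
    """Map a point from the agent frame (x right, z forward) to the global frame.

    pose is (x, y, theta), where theta is the heading in radians.
    """
    px, py, theta = pose
    gx = px + z_cam * math.cos(theta) + x_cam * math.sin(theta)
    gy = py + z_cam * math.sin(theta) - x_cam * math.cos(theta)
    return gx, gy


class RegionMap:
    """Grid map holding a categorical belief over region labels per cell."""

    def __init__(self, labels, cell_size=0.5):
        self.labels = list(labels)
        self.cell_size = cell_size
        self.log_belief = {}  # (i, j) -> unnormalized log-probabilities

    def cell_of(self, gx, gy):
        """Return the grid index containing a global-frame point."""
        return (math.floor(gx / self.cell_size), math.floor(gy / self.cell_size))

    def update(self, points_cam, pose, frame_probs, eps=1e-6):
        """Fuse one frame's label probabilities into every cell it observes.

        Assumed fusion rule: treat frames as independent evidence and add
        log-probabilities, starting from a uniform prior.
        """
        cells = {self.cell_of(*camera_to_global(x, z, pose)) for x, z in points_cam}
        for cell in cells:
            belief = self.log_belief.setdefault(cell, [0.0] * len(self.labels))
            for k, p in enumerate(frame_probs):
                belief[k] += math.log(p + eps)

    def distribution(self, cell):
        """Return the normalized label distribution for a cell."""
        belief = self.log_belief.get(cell)
        if belief is None:
            # Unobserved cells keep the uniform prior.
            return [1.0 / len(self.labels)] * len(self.labels)
        m = max(belief)
        weights = [math.exp(b - m) for b in belief]
        total = sum(weights)
        return [w / total for w in weights]


# Two frames taken from the same pose, both leaning towards "kitchen".
region_map = RegionMap(REGION_LABELS)
pose = (0.0, 0.0, 0.0)
points = [(0.0, 1.0), (0.2, 1.1)]  # (x right, z forward) in metres
region_map.update(points, pose, [0.7, 0.1, 0.1, 0.1])
region_map.update(points, pose, [0.6, 0.2, 0.1, 0.1])

observed = region_map.cell_of(*camera_to_global(0.0, 1.0, pose))
dist = region_map.distribution(observed)
best_label = REGION_LABELS[dist.index(max(dist))]
```

Keeping a full distribution per cell, rather than a single label, lets repeated observations sharpen or revise the belief, and leaves unobserved cells at the uniform prior.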