VoroNav: Voronoi-based Zero-shot Object Navigation with Large Language Model (2401.02695v2)
Abstract: In the realm of household robotics, the Zero-Shot Object Navigation (ZSON) task empowers agents to adeptly traverse unfamiliar environments and locate objects from novel categories without prior explicit training. This paper introduces VoroNav, a novel semantic exploration framework that proposes the Reduced Voronoi Graph to extract exploratory paths and planning nodes from a semantic map constructed in real time. By harnessing topological and semantic information, VoroNav designs text-based descriptions of paths and images that are readily interpretable by an LLM. In particular, our approach presents a synergy of path and farsight descriptions to represent the environmental context, enabling the LLM to apply commonsense reasoning to ascertain waypoints for navigation. Extensive evaluation on HM3D and HSSD validates that VoroNav surpasses existing baselines in both success rate and exploration efficiency (absolute improvement: +2.8% Success and +3.7% SPL on HM3D, +2.6% Success and +3.8% SPL on HSSD). Additional metrics introduced to evaluate obstacle-avoidance proficiency and perceptual efficiency further corroborate the improvements achieved by our method in ZSON planning. Project page: https://voro-nav.github.io
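The core idea of extracting exploratory paths and planning nodes from a map skeleton can be illustrated with standard skeletonization of a 2D free-space grid. The sketch below is a minimal, hedged approximation only: the function name `reduced_voronoi_graph`, the branch-pruning heuristic, and the junction-node criterion are illustrative assumptions, not the paper's actual implementation, which additionally fuses semantic labels and updates the map online.

```python
import numpy as np
from skimage.morphology import skeletonize
from scipy.ndimage import convolve

def reduced_voronoi_graph(free_space: np.ndarray, max_prune_iters: int = 10):
    """Approximate a reduced Voronoi-like graph from a boolean free-space map.

    free_space: 2D bool array, True where the map is traversable.
    Returns (skeleton, node_coords): the pruned skeleton and the (row, col)
    coordinates of junction pixels used as candidate planning nodes.
    """
    # Medial-axis style skeleton of the traversable region.
    skel = skeletonize(free_space)

    # Count 8-connected skeleton neighbours of every pixel.
    kernel = np.ones((3, 3), dtype=int)
    kernel[1, 1] = 0
    neighbours = convolve(skel.astype(int), kernel, mode="constant")

    # Iteratively prune dangling endpoints (pixels with a single neighbour),
    # one simple way to "reduce" the raw skeleton to its main corridors.
    for _ in range(max_prune_iters):
        endpoints = skel & (neighbours == 1)
        if not endpoints.any():
            break
        skel = skel & ~endpoints
        neighbours = convolve(skel.astype(int), kernel, mode="constant")

    # Junction pixels (3+ neighbours) serve as candidate planning nodes.
    node_coords = np.argwhere(skel & (neighbours >= 3))
    return skel, node_coords
```

In the framework described by the abstract, such planning nodes and the paths linking them would then be paired with text-based path and farsight descriptions and handed to the LLM, which selects the next waypoint via commonsense reasoning.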
- Zero experience required: Plug & play modular transfer learning for semantic visual navigation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17031–17041, 2022.
- ETPNav: Evolving topological planning for vision-language navigation in continuous environments. arXiv preprint arXiv:2304.03047, 2023.
- On evaluation of embodied navigation agents. arXiv preprint arXiv:1807.06757, 2018.
- Bridging zero-shot object navigation and foundation models through pixel-guided navigation skill. arXiv preprint arXiv:2309.10309, 2023.
- Zero-shot object searching using large-scale object relationship prior, 2023a.
- How to not train your dragon: Training-free embodied object goal navigation with semantic frontiers, 2023b.
- CoWs on Pasture: Baselines and benchmarks for language-driven zero-shot object navigation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 23171–23181, 2023.
- Navigating to objects in the real world. Science Robotics, 8(79):eadf6991, 2023.
- Robot learning in homes: Improving generalization and reducing dataset bias. Advances in Neural Information Processing Systems, 31, 2018.
- 3D-LLM: Injecting the 3D world into large language models. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
- Incremental reconstruction of generalized Voronoi diagrams on grids. Robotics and Autonomous Systems, 57(2):123–128, 2009.
- Habitat Synthetic Scenes Dataset (HSSD-200): An analysis of 3D scene scale and realism tradeoffs for ObjectGoal navigation, 2023.
- Segment Anything. arXiv preprint arXiv:2304.02643, 2023.
- Beyond the nav-graph: Vision-and-language navigation in continuous environments. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXVIII 16, pages 104–120. Springer, 2020.
- Waypoint models for instruction-guided navigation in continuous environments. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 15162–15171, 2021.
- Renderable neural radiance map for visual navigation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9099–9108, 2023.
- BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In International Conference on Machine Learning, pages 12888–12900. PMLR, 2022.
- Improving autonomous exploration using reduced approximated generalized Voronoi graphs. Journal of Intelligent & Robotic Systems, 99:91–113, 2020.
- ReVoLT: Relational reasoning and Voronoi local graph planning for target-driven navigation. arXiv preprint arXiv:2301.02382, 2023a.
- Grounding DINO: Marrying DINO with grounded pre-training for open-set object detection. arXiv preprint arXiv:2303.05499, 2023b.
- ZSON: Zero-shot object-goal navigation using multimodal goal embeddings. In Advances in Neural Information Processing Systems, 2022.
- OpenAI. GPT-4 technical report, 2023.
- Training language models to follow instructions with human feedback, 2022.
- Zero-shot active visual search (ZAVIS): Intelligent object search for robotic assistants. In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 2004–2010, 2023.
- Language models are unsupervised multitask learners. OpenAI blog, 1(8):9, 2019.
- Habitat-Matterport 3D Dataset (HM3D): 1000 large-scale 3D environments for embodied AI. arXiv preprint arXiv:2109.08238, 2021.
- PONI: Potential functions for ObjectGoal navigation with interaction-free learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022.
- James A Sethian. A fast marching level set method for monotonically advancing fronts. Proceedings of the National Academy of Sciences, 93(4):1591–1595, 1996.
- Navigation with large language models: Semantic guesswork as a heuristic for planning. In 7th Annual Conference on Robot Learning, 2023.
- scikit-image: Image processing in Python. PeerJ, 2:e453, 2014.
- Habitat challenge 2022. https://aihabitat.org/challenge/2022/, 2022.
- Brian Yamauchi. A frontier-based approach for autonomous exploration. In Proceedings 1997 IEEE International Symposium on Computational Intelligence in Robotics and Automation (CIRA'97): Towards New Computational Principles for Robotics and Automation, pages 146–151. IEEE, 1997.
- Co-NavGPT: Multi-robot cooperative visual semantic navigation using large language models, 2023a.
- L3MVN: Leveraging large language models for visual target navigation. arXiv preprint arXiv:2304.05501, 2023b.
- Zero-shot object goal visual navigation. In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 2025–2031, 2023.
- ESC: Exploration with soft commonsense constraints for zero-shot object navigation. arXiv preprint arXiv:2301.13166, 2023.