TriHelper: Zero-Shot Object Navigation with Dynamic Assistance (2403.15223v1)
Abstract: Navigating toward specified objects in unknown environments without additional training, known as zero-shot object navigation, is a significant challenge in robotics that demands rich auxiliary information and careful strategic planning. Prior work has focused on holistic solutions, overlooking the specific difficulties agents encounter during navigation, such as collisions, low exploration efficiency, and misidentification of targets. To address these challenges, we propose TriHelper, a novel framework that dynamically assists agents with three primary navigation challenges: collision, exploration, and detection. Specifically, the framework consists of three components: (i) a Collision Helper, (ii) an Exploration Helper, and (iii) a Detection Helper, which work together throughout the navigation process. Experiments on the Habitat-Matterport 3D (HM3D) and Gibson datasets show that TriHelper significantly outperforms all existing baseline methods in zero-shot object navigation, achieving superior success rates and exploration efficiency. Our ablation studies further confirm the effectiveness of each helper in addressing its respective challenge. By proposing TriHelper, we offer a fresh perspective on the object navigation task, paving the way for future research in embodied AI and vision-based navigation.
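The abstract describes an architecture in which three specialized helpers intervene when the agent hits a specific failure mode. The paper does not publish pseudocode here, so the sketch below is purely illustrative: the class names, trigger conditions (`collided`, `candidate_detected`, a stall counter), and returned action strings are all hypothetical, chosen only to show how a per-step dispatcher among three helpers might be organized.

```python
from dataclasses import dataclass

@dataclass
class AgentState:
    """Minimal per-step state flags an observer of the agent might track (hypothetical)."""
    collided: bool = False              # did the last action hit an obstacle?
    steps_without_new_area: int = 0     # how long since the map last grew
    candidate_detected: bool = False    # has a possible target been seen?

class CollisionHelper:
    def assist(self, state: AgentState) -> str:
        # Illustrative: escape the collision by replanning a local path.
        return "replan_local_path"

class ExplorationHelper:
    def assist(self, state: AgentState) -> str:
        # Illustrative: break out of low-efficiency exploration by
        # committing to a new frontier.
        return "select_new_frontier"

class DetectionHelper:
    def assist(self, state: AgentState) -> str:
        # Illustrative: double-check a detection before declaring success,
        # guarding against target misidentification.
        return "verify_candidate_target"

def dispatch(state: AgentState, stall_limit: int = 10) -> str:
    """Decide which helper (if any) intervenes this step; otherwise
    fall through to the agent's base navigation policy."""
    if state.collided:
        return CollisionHelper().assist(state)
    if state.candidate_detected:
        return DetectionHelper().assist(state)
    if state.steps_without_new_area >= stall_limit:
        return ExplorationHelper().assist(state)
    return "follow_base_policy"
```

The point of the sketch is only the control structure: each challenge named in the abstract (collision, exploration, detection) maps to one helper, and the helpers are consulted dynamically per step rather than being a single monolithic policy.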
Authors: Lingfeng Zhang, Qiang Zhang, Hao Wang, Erjia Xiao, Zixuan Jiang, Honglei Chen, Renjing Xu