
TriHelper: Zero-Shot Object Navigation with Dynamic Assistance (2403.15223v1)

Published 22 Mar 2024 in cs.RO

Abstract: Navigating toward specific objects in unknown environments without additional training, known as Zero-Shot object navigation, poses a significant challenge in robotics, demanding rich auxiliary information and strategic planning. Traditional works have focused on holistic solutions and overlooked the specific challenges agents encounter during navigation, such as collisions, low exploration efficiency, and misidentification of targets. To address these challenges, our work proposes TriHelper, a novel framework designed to assist agents dynamically with three primary navigation challenges: collision, exploration, and detection. Specifically, our framework consists of three innovative components: (i) Collision Helper, (ii) Exploration Helper, and (iii) Detection Helper. These components work collaboratively to resolve these challenges throughout the navigation process. Experiments on the Habitat-Matterport 3D (HM3D) and Gibson datasets demonstrate that TriHelper significantly outperforms all existing baseline methods in Zero-Shot object navigation, showing superior success rates and exploration efficiency. Our ablation studies further underscore the effectiveness of each helper in addressing its respective challenge, notably enhancing the agent's navigation capabilities. By proposing TriHelper, we offer a fresh perspective on advancing the object navigation task, paving the way for future research in the domain of Embodied AI and visual-based navigation.
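The abstract does not include an implementation, but the dynamic-assistance idea it describes can be illustrated with a short sketch. The following is a minimal, hypothetical Python skeleton of a navigation loop that dispatches to three helpers; the class names, trigger thresholds, and method signatures are illustrative assumptions, not the authors' actual TriHelper code.

```python
# Minimal, hypothetical sketch of a TriHelper-style navigation loop.
# All names, thresholds, and interfaces below are assumptions for illustration,
# not the authors' implementation.

from dataclasses import dataclass


@dataclass
class Observation:
    rgb: object = None       # current RGB frame
    depth: object = None     # current depth frame
    collided: bool = False   # whether the last action caused a collision


class CollisionHelper:
    """Suggests a recovery goal when the agent keeps colliding (assumed interface)."""
    def assist(self, obs, goal):
        return goal  # placeholder: e.g. sample a nearby free-space waypoint


class ExplorationHelper:
    """Re-selects an exploration frontier when progress stalls (assumed interface)."""
    def assist(self, obs, goal):
        return goal  # placeholder: e.g. choose a new promising frontier


class DetectionHelper:
    """Verifies a candidate target detection before the agent commits to stopping."""
    def verify(self, obs, target_name):
        return True  # placeholder: e.g. re-check the detection with a second model


@dataclass
class AgentState:
    collisions: int = 0
    steps_without_progress: int = 0
    candidate_detected: bool = False


def navigation_step(obs, target_name, goal, helpers, state):
    """One step of a zero-shot object-navigation loop with dynamic assistance."""
    collision_helper, exploration_helper, detection_helper = helpers

    # Collision assistance: after repeated collisions, ask for a recovery goal.
    if obs.collided:
        state.collisions += 1
        if state.collisions > 3:            # threshold is an illustrative assumption
            goal = collision_helper.assist(obs, goal)
            state.collisions = 0

    # Exploration assistance: if the agent has stalled, pick a new frontier.
    state.steps_without_progress += 1
    if state.steps_without_progress > 25:   # threshold is an illustrative assumption
        goal = exploration_helper.assist(obs, goal)
        state.steps_without_progress = 0

    # Detection assistance: only stop once a candidate detection is verified.
    if state.candidate_detected and detection_helper.verify(obs, target_name):
        return "STOP", goal

    return "MOVE_TOWARD_GOAL", goal
```

In the paper itself each helper would presumably be backed by mapping, a language or vision-language model, or a detector; the skeleton only shows one plausible way the three forms of assistance could be triggered dynamically within a single navigation loop.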

Authors (7)
  1. Lingfeng Zhang (24 papers)
  2. Qiang Zhang (466 papers)
  3. Hao Wang (1120 papers)
  4. Erjia Xiao (13 papers)
  5. Zixuan Jiang (16 papers)
  6. Honglei Chen (8 papers)
  7. Renjing Xu (72 papers)
Citations (3)
