
Reasoning about the Unseen for Efficient Outdoor Object Navigation (2309.10103v2)

Published 18 Sep 2023 in cs.RO and cs.AI

Abstract: Robots should exist anywhere humans do: indoors, outdoors, and even in unmapped environments. In contrast, recent advances in Object Goal Navigation (OGN) have targeted indoor environments, leveraging spatial and semantic cues that do not generalize outdoors. While these contributions provide valuable insights into indoor scenarios, the broader spectrum of real-world robotic applications often extends to outdoor settings. As we transition to the vast and complex terrains of outdoor environments, new challenges emerge. Unlike the structured layouts found indoors, outdoor environments lack clear spatial delineations and are riddled with inherent semantic ambiguities. Despite this, humans navigate with ease because we can reason about the unseen. We introduce OUTDOOR, a new task; a new mechanism for LLMs to accurately hallucinate possible futures; and a new computationally aware success metric for pushing research forward in this more complex domain. Additionally, we show impressive results on both a simulated drone and a physical quadruped in outdoor environments. Our agent requires no premapping, and our formalism outperforms naive LLM-based approaches.
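The abstract mentions a "computationally aware success metric" without defining it. As an illustrative sketch only (the paper's actual definition is not given here), a metric in the spirit of SPL-style weighted success could discount a successful episode by the compute it consumed relative to a reference budget. The function name and the budget parameter below are hypothetical, not from the paper:

```python
def compute_aware_success(success: bool, compute_used: float, compute_budget: float) -> float:
    """Hypothetical sketch of a compute-weighted success score.

    Returns 0 on failure; on success, returns 1 when the episode stays
    within the reference compute budget, and decays proportionally when
    it exceeds it (analogous to how SPL discounts success by path length).
    """
    if not success:
        return 0.0
    # Staying within budget scores 1.0; overruns shrink the score.
    return compute_budget / max(compute_used, compute_budget)


# Example: an agent that succeeds using half the budget scores 1.0,
# while one that needs double the budget scores 0.5.
print(compute_aware_success(True, 50.0, 100.0))
print(compute_aware_success(True, 200.0, 100.0))
```

Averaged over episodes, such a score would reward agents that succeed while spending less inference-time compute, which matches the paper's stated motivation for efficiency in outdoor navigation.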
