Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
41 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
41 tokens/sec
o3 Pro
7 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

LangNav: Language as a Perceptual Representation for Navigation (2310.07889v2)

Published 11 Oct 2023 in cs.CV, cs.AI, cs.CL, and cs.RO

Abstract: We explore the use of language as a perceptual representation for vision-and-language navigation (VLN), with a focus on low-data settings. Our approach uses off-the-shelf vision systems for image captioning and object detection to convert an agent's egocentric panoramic view at each time step into natural language descriptions. We then finetune a pretrained LLM to select an action, based on the current view and the trajectory history, that would best fulfill the navigation instructions. In contrast to the standard setup which adapts a pretrained LLM to work directly with continuous visual features from pretrained vision models, our approach instead uses (discrete) language as the perceptual representation. We explore several use cases of our language-based navigation (LangNav) approach on the R2R VLN benchmark: generating synthetic trajectories from a prompted LLM (GPT-4) with which to finetune a smaller LLM; domain transfer where we transfer a policy learned on one simulated environment (ALFRED) to another (more realistic) environment (R2R); and combining both vision- and language-based representations for VLN. Our approach is found to improve upon baselines that rely on visual features in settings where only a few expert trajectories (10-100) are available, demonstrating the potential of language as a perceptual representation for navigation.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (7)
  1. Bowen Pan (16 papers)
  2. Rameswar Panda (79 papers)
  3. SouYoung Jin (11 papers)
  4. Rogerio Feris (105 papers)
  5. Aude Oliva (42 papers)
  6. Phillip Isola (84 papers)
  7. Yoon Kim (92 papers)
Citations (12)
X Twitter Logo Streamline Icon: https://streamlinehq.com
Youtube Logo Streamline Icon: https://streamlinehq.com