
Robust Navigation with Language Pretraining and Stochastic Sampling (1909.02244v1)

Published 5 Sep 2019 in cs.CL, cs.CV, and cs.LG

Abstract: Core to the vision-and-language navigation (VLN) challenge is building robust instruction representations and action decoding schemes that generalize well to previously unseen instructions and environments. In this paper, we report two simple but highly effective methods that address these challenges and lead to new state-of-the-art performance. First, we adapt large-scale pretrained language models to learn text representations that generalize better to previously unseen instructions. Second, we propose a stochastic sampling scheme to reduce the considerable gap between the expert actions used in training and the sampled actions used at test time, so that the agent can learn to correct its own mistakes during long sequential action decoding. Combining the two techniques, we achieve a new state of the art on the Room-to-Room benchmark, with a 6% absolute gain over the previous best result (47% -> 53%) on the Success Rate weighted by Path Length (SPL) metric.
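
The abstract does not spell out the sampling rule, so the following is only a minimal sketch of the general idea, in the spirit of scheduled sampling: at each training step the agent follows the expert action with some probability and otherwise executes an action sampled from its own predicted distribution. The function name `choose_action` and the parameter `expert_prob` are hypothetical and are not taken from the paper.

```python
import random


def choose_action(expert_action, action_probs, expert_prob, rng=random):
    """Pick the action to execute at one training step (illustrative only).

    expert_action: index of the expert (teacher) action.
    action_probs:  the agent's predicted probabilities, one per action.
    expert_prob:   hypothetical mixing probability; 1.0 means always follow
                   the expert, 0.0 means always sample from the agent.
    """
    # With probability expert_prob, follow the expert demonstration.
    if rng.random() < expert_prob:
        return expert_action
    # Otherwise sample from the agent's own distribution, so training
    # visits states the agent would actually reach at test time.
    actions = list(range(len(action_probs)))
    return rng.choices(actions, weights=action_probs, k=1)[0]
```

Under this assumption, setting `expert_prob` below 1.0 exposes the agent to its own mistakes during training, which is the gap between training and test conditions that the abstract says the method aims to reduce.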

Authors (8)
  1. Xiujun Li (37 papers)
  2. Chunyuan Li (122 papers)
  3. Qiaolin Xia (7 papers)
  4. Yonatan Bisk (91 papers)
  5. Asli Celikyilmaz (80 papers)
  6. Jianfeng Gao (344 papers)
  7. Noah Smith (10 papers)
  8. Yejin Choi (287 papers)
Citations (105)