Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
97 tokens/sec
GPT-4o
53 tokens/sec
Gemini 2.5 Pro Pro
44 tokens/sec
o3 Pro
5 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Ground then Navigate: Language-guided Navigation in Dynamic Scenes (2209.11972v1)

Published 24 Sep 2022 in cs.CV

Abstract: We investigate the Vision-and-Language Navigation (VLN) problem in the context of autonomous driving in outdoor settings. We solve the problem by explicitly grounding the navigable regions corresponding to the textual command. At each timestamp, the model predicts a segmentation mask corresponding to the intermediate or the final navigable region. Our work contrasts with existing efforts in VLN, which pose this task as a node selection problem, given a discrete connected graph corresponding to the environment. We do not assume the availability of such a discretised map. Our work moves towards continuity in action space, provides interpretability through visual feedback and allows VLN on commands requiring finer manoeuvres like "park between the two cars". Furthermore, we propose a novel meta-dataset CARLA-NAV to allow efficient training and validation. The dataset comprises pre-recorded training sequences and a live environment for validation and testing. We provide extensive qualitative and quantitive empirical results to validate the efficacy of the proposed approach.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (5)
  1. Kanishk Jain (9 papers)
  2. Varun Chhangani (1 paper)
  3. Amogh Tiwari (2 papers)
  4. K. Madhava Krishna (124 papers)
  5. Vineet Gandhi (41 papers)
Citations (25)

Summary

We haven't generated a summary for this paper yet.