
Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation (2404.01943v1)

Published 2 Apr 2024 in cs.CV and cs.RO

Abstract: Vision-and-language navigation (VLN) enables an agent to navigate to a remote location in a 3D environment by following natural language instructions. At each navigation step, the agent selects from possible candidate locations and then moves there. For better navigation planning, the lookahead exploration strategy aims to evaluate the agent's next action effectively by accurately anticipating the future environment at each candidate location. To this end, some existing works predict RGB images of future environments, but this strategy suffers from image distortion and high computational cost. To address these issues, we propose the pre-trained hierarchical neural radiance representation model (HNR), which produces multi-level semantic features for future environments that are more robust and efficient than pixel-wise RGB reconstruction. Furthermore, with the predicted future environmental representations, our lookahead VLN model constructs a navigable future path tree and selects the optimal path via efficient parallel evaluation. Extensive experiments on the VLN-CE datasets confirm the effectiveness of our method.
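To make the lookahead idea concrete, below is a minimal sketch of the path-tree construction and parallel path scoring the abstract describes. Everything here is an assumption for illustration: predict_future_features stands in for the paper's pre-trained HNR model, score_path uses a toy cosine-similarity scorer rather than the paper's learned cross-modal evaluation, and all names and shapes are hypothetical.

```python
# Hypothetical sketch of lookahead path-tree evaluation; none of these
# names come from the paper's released code.
from dataclasses import dataclass, field
from typing import Callable, List

import numpy as np


@dataclass
class PathNode:
    position: np.ndarray                      # 3D location of this candidate
    features: np.ndarray                      # predicted semantic features
    children: List["PathNode"] = field(default_factory=list)


def predict_future_features(position: np.ndarray) -> np.ndarray:
    """Stand-in for the pre-trained HNR model: map a candidate location to
    semantic features directly instead of rendering RGB pixels."""
    seed = abs(hash(position.tobytes())) % (2**32)
    return np.random.default_rng(seed).standard_normal(512)


def build_path_tree(pos: np.ndarray,
                    candidates_fn: Callable[[np.ndarray], List[np.ndarray]],
                    depth: int) -> PathNode:
    """Recursively expand candidate locations into a future path tree."""
    node = PathNode(pos, predict_future_features(pos))
    if depth > 0:
        for cand in candidates_fn(pos):
            node.children.append(build_path_tree(cand, candidates_fn, depth - 1))
    return node


def enumerate_paths(node: PathNode, prefix=()):
    """Yield every root-to-leaf path so all paths can be scored together."""
    prefix = prefix + (node,)
    if not node.children:
        yield prefix
    for child in node.children:
        yield from enumerate_paths(child, prefix)


def score_path(path, instruction_emb: np.ndarray) -> float:
    """Toy scorer: mean cosine similarity between the instruction embedding
    and the predicted features along the path (the actual model would use a
    learned cross-modal evaluator)."""
    feats = np.stack([n.features for n in path])
    feats /= np.linalg.norm(feats, axis=1, keepdims=True)
    inst = instruction_emb / np.linalg.norm(instruction_emb)
    return float((feats @ inst).mean())


def select_best_path(root: PathNode, instruction_emb: np.ndarray):
    """Score all candidate paths in one pass and keep the best one."""
    paths = list(enumerate_paths(root))
    scores = [score_path(p, instruction_emb) for p in paths]
    return paths[int(np.argmax(scores))]


# Example with a stub candidate generator (two neighbors per location):
def neighbors(pos: np.ndarray) -> List[np.ndarray]:
    return [pos + np.array([1.0, 0.0, 0.0]), pos + np.array([0.0, 1.0, 0.0])]


root = build_path_tree(np.zeros(3), neighbors, depth=2)
best = select_best_path(root, np.random.default_rng(0).standard_normal(512))
print([tuple(n.position.round(1)) for n in best])
```

Because every root-to-leaf path is enumerated up front, the scoring step is embarrassingly parallel; in a real system the per-path scores would be computed in one batched forward pass rather than the Python loop used here.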

Authors (7)
  1. Zihan Wang (181 papers)
  2. Xiangyang Li (58 papers)
  3. Jiahao Yang (25 papers)
  4. Yeqi Liu (5 papers)
  5. Junjie Hu (111 papers)
  6. Ming Jiang (59 papers)
  7. Shuqiang Jiang (30 papers)
Citations (8)