Instance-aware Exploration-Verification-Exploitation for Instance ImageGoal Navigation (2402.17587v3)

Published 25 Feb 2024 in cs.CV and cs.RO

Abstract: As a new embodied vision task, Instance ImageGoal Navigation (IIN) aims to navigate to a specified object depicted by a goal image in an unexplored environment. The main challenge of this task lies in identifying the target object from different viewpoints while rejecting similar distractors. Existing ImageGoal Navigation methods usually adopt a simple Exploration-Exploitation framework and ignore the identification of the specific instance during navigation. In this work, we propose to imitate the human behaviour of "getting closer to confirm" when distinguishing objects from a distance. Specifically, we design a new modular navigation framework named Instance-aware Exploration-Verification-Exploitation (IEVE) for instance-level image-goal navigation. Our method allows for active switching among exploration, verification, and exploitation actions, thereby helping the agent make reasonable decisions in different situations. On the challenging Habitat-Matterport 3D Semantics (HM3D-SEM) dataset, our method surpasses previous state-of-the-art work, whether using a classical segmentation model (0.684 vs. 0.561 success) or a robust one (0.702 vs. 0.561 success).
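The core idea of IEVE is an active switch among three behaviours depending on how confident the agent is that a detected candidate matches the goal image. As an illustration only, the toy Python sketch below shows one plausible way to express such switching as a confidence-thresholded state machine; the mode names mirror the paper, but the function, thresholds, and confidence signal (select_mode, VERIFY_T, EXPLOIT_T, match_confidence) are hypothetical and not taken from the paper's implementation.

    # Illustrative sketch only: IEVE itself is a learned modular framework.
    # This toy state machine just shows the idea of switching among
    # exploration, verification, and exploitation based on how confidently
    # the current observation matches the goal image. All names and
    # thresholds here are hypothetical.

    from enum import Enum, auto

    class Mode(Enum):
        EXPLORE = auto()   # search unvisited areas for candidate objects
        VERIFY = auto()    # move closer to a candidate to confirm its identity
        EXPLOIT = auto()   # navigate directly to a confirmed target

    VERIFY_T = 0.4   # confidence above which a candidate is worth approaching
    EXPLOIT_T = 0.8  # confidence above which the candidate is treated as the goal

    def select_mode(match_confidence: float) -> Mode:
        """Pick the behaviour for the current step from a goal-match confidence
        in [0, 1], mimicking the human strategy of 'getting closer to confirm'."""
        if match_confidence >= EXPLOIT_T:
            return Mode.EXPLOIT
        if match_confidence >= VERIFY_T:
            return Mode.VERIFY
        return Mode.EXPLORE

    # Example: a weak match triggers verification rather than committing.
    print(select_mode(0.55))  # Mode.VERIFY

In this reading, verification plays the role of the intermediate "get closer to confirm" step that the simple Exploration-Exploitation framework lacks.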
