
A Landmark-Aware Visual Navigation Dataset (2402.14281v2)

Published 22 Feb 2024 in cs.CV

Abstract: Map representations learned by expert demonstrations have shown promising research value. However, the field of visual navigation still faces challenges due to the lack of real-world human-navigation datasets that can support efficient, supervised, representation learning of environments. We present a Landmark-Aware Visual Navigation (LAVN) dataset to allow for supervised learning of human-centric exploration policies and map building. We collect RGBD observation and human point-click pairs as a human annotator explores virtual and real-world environments with the goal of full coverage exploration of the space. The human annotators also provide distinct landmark examples along each trajectory, which we intuit will simplify the task of map or graph building and localization. These human point-clicks serve as direct supervision for waypoint prediction when learning to explore in environments. Our dataset covers a wide spectrum of scenes, including rooms in indoor environments, as well as walkways outdoors. We release our dataset with detailed documentation at https://huggingface.co/datasets/visnavdataset/lavn (DOI: 10.57967/hf/2386) and a plan for long-term preservation.


Summary

  • The paper presents a novel dataset that integrates human point-click pairs and landmark annotations to train effective exploration policies.
  • It leverages RGBD image observations and detailed waypoint data from both indoor and outdoor scenes to simplify map-building and localization.
  • The comprehensive design of the dataset encourages the development of innovative navigation strategies that closely mimic human exploration cues.

A Comprehensive Overview of the Landmark-Aware Visual Navigation Dataset

Introduction to the LAVN Dataset

The Landmark-Aware Visual Navigation (LAVN) dataset marks a significant step toward bridging the gap between human exploration strategies and their robotic counterparts. Spanning both virtual and real-world environments, it is designed to support supervised learning of exploration policies and map building from human-centric annotations. By pairing human point-clicks with landmark identifications, it gives autonomous agents a structured source of supervision for addressing long-standing challenges in visual navigation.

Dataset Characteristics

The LAVN dataset is distinct from its predecessors in several key aspects. Principally, it:

  • Incorporates human point-click pairs as direct supervision for waypoint prediction, guiding agents through environments with the goal of full-coverage exploration (a minimal training sketch follows this list).
  • Introduces landmark annotations within each trajectory, anticipated to simplify the tasks of map or graph building and localization significantly.
  • Covers a broad spectrum of both indoor and outdoor scenes, providing a rich variety of scenarios for model training and evaluation.
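
Because the human point-clicks act as direct supervision for waypoint prediction, one natural way to consume them is as regression targets for an observation-conditioned model. The following is a minimal PyTorch sketch; the network architecture, tensor shapes, and loss are illustrative assumptions, not the authors' training setup.

```python
# Minimal sketch of click-supervised waypoint prediction.
# The model, loss, and shapes below are assumptions, not the paper's method.
import torch
import torch.nn as nn

class WaypointPredictor(nn.Module):
    """Regress a normalized (x, y) point-click from an RGB-D observation."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 2), nn.Sigmoid()
        )

    def forward(self, rgbd):
        # rgbd: (B, 4, H, W) tensor with RGB plus a depth channel
        return self.head(self.encoder(rgbd))  # (B, 2) normalized click coordinates

model = WaypointPredictor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def training_step(rgbd_batch, click_batch):
    """One supervised step: human point-clicks serve as waypoint labels in [0, 1]."""
    pred = model(rgbd_batch)
    loss = nn.functional.mse_loss(pred, click_batch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In practice, a stronger visual backbone and the dataset's own click-coordinate conventions would replace these placeholders.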

In contrast to existing datasets that focus on a single aspect of visual navigation, such as semantics or large-scale environment capture, LAVN's emphasis on landmark-aware exploration strategies opens a distinct line of research in this domain.

Methodological Approach

The dataset generation involved meticulously structured human navigation through both simulated and real-world environments, capturing RGBD image observations alongside human point-click pairs. This method allowed for a detailed collection of navigation waypoints and distinctive landmarks within every captured trajectory. Furthermore, the dataset encapsulates a variety of environments, ranging from indoor settings like rooms and hallways to outdoor scenarios including walkways and campuses, thereby ensuring a comprehensive scope for model training.
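
To make this organization concrete, the sketch below models a per-frame record and groups frames into ordered trajectories. The field names (rgb, depth, click_xy, is_landmark) are hypothetical stand-ins for whatever schema the released files actually use; consult the dataset documentation at the Hugging Face link above for the real layout.

```python
# Illustrative per-frame record for a LAVN-style trajectory.
# Field names and grouping are hypothetical, not the released file layout.
from dataclasses import dataclass
from typing import Optional, Tuple
import numpy as np

@dataclass
class FrameRecord:
    trajectory_id: str
    frame_index: int
    rgb: np.ndarray                      # (H, W, 3) color observation
    depth: np.ndarray                    # (H, W) depth map
    click_xy: Optional[Tuple[int, int]]  # human point-click in pixel coords, if any
    is_landmark: bool = False            # frame annotated as a distinct landmark example

def group_by_trajectory(records):
    """Group frame records into ordered trajectories for sequence-level training."""
    trajectories = {}
    for rec in sorted(records, key=lambda r: (r.trajectory_id, r.frame_index)):
        trajectories.setdefault(rec.trajectory_id, []).append(rec)
    return trajectories
```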

Implications and Potential Applications

The LAVN dataset is poised to significantly impact several fronts in AI and robotics:

  • Model Training: It offers unprecedented resources for training visual navigation models under supervised conditions, potentially enhancing exploration efficiency and accuracy.
  • Exploration Strategies: The inclusion of human annotator-provided landmarks and waypoints could inspire novel exploration algorithm development, leveraging human-like navigation principles.
  • Map and Graph Building: By facilitating landmark-aware map representation learning, it addresses existing challenges in spatial understanding and localization that are crucial for autonomous navigation (see the sketch after this list).
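
As an illustration of the last point, the sketch below builds a simple topological graph whose nodes are landmark-annotated frames and whose edges connect landmarks that occur consecutively along a trajectory. This construction rule is an assumption for illustration, not the paper's map-building method, and it reuses the hypothetical FrameRecord fields from the earlier sketch.

```python
# Sketch of landmark-aware topological map building from an annotated trajectory.
# The construction rule is assumed for illustration; it is not the paper's method.
import networkx as nx

def build_landmark_graph(trajectory):
    """Add a node per landmark frame; connect consecutive landmarks along the path."""
    graph = nx.Graph()
    previous = None
    for rec in trajectory:  # trajectory: ordered list of FrameRecord
        if not rec.is_landmark:
            continue
        node = (rec.trajectory_id, rec.frame_index)
        graph.add_node(node, rgb=rec.rgb, click=rec.click_xy)
        if previous is not None:
            # Edge weight: number of frames traversed between the two landmarks.
            graph.add_edge(previous, node, steps=rec.frame_index - previous[1])
        previous = node
    return graph
```

A downstream localizer could then match a new observation against the stored landmark images to place the agent on this graph.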

Concluding Remarks

In summary, the LAVN dataset provides a valuable resource for advancing visual navigation research, particularly for models that emulate human exploration strategies through landmark recognition and use. Its coverage of both virtual and real-world environments, together with its landmark annotations, distinguishes it from prior navigation datasets. Beyond its immediate use for training navigation models, it lays the groundwork for future work on more effective and efficient autonomous exploration.
