A Landmark-Aware Visual Navigation Dataset (2402.14281v2)
Abstract: Map representations learned from expert demonstrations have shown promising research value. However, visual navigation research still faces challenges because of the lack of real-world human-navigation datasets that support efficient, supervised representation learning of environments. We present the Landmark-Aware Visual Navigation (LAVN) dataset to enable supervised learning of human-centric exploration policies and map building. We collect pairs of RGB-D observations and human point-clicks as a human annotator explores virtual and real-world environments with the goal of full-coverage exploration of the space. The human annotators also provide distinct landmark examples along each trajectory, which we expect will simplify map or graph building and localization. These human point-clicks serve as direct supervision for waypoint prediction when learning to explore environments. Our dataset covers a wide spectrum of scenes, including indoor rooms and outdoor walkways. We release our dataset with detailed documentation at https://huggingface.co/datasets/visnavdataset/lavn (DOI: 10.57967/hf/2386), along with a plan for long-term preservation.
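To make "point-clicks as direct supervision for waypoint prediction" concrete, here is a minimal sketch. It assumes that each click is stored as a pixel coordinate on the current RGB-D frame and is used as a regression target. The names `ClickSample`, `waypoint_target`, `waypoint_loss`, and the `is_landmark` flag are hypothetical illustrations. They are not the released LAVN schema or the authors' training loss.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ClickSample:
    """Hypothetical training pair: one RGB-D frame plus the annotator's click.

    Field names and layout are assumptions for illustration, not the
    dataset's actual file format.
    """
    rgb: List[List[Tuple[int, int, int]]]  # H rows of W RGB triples
    depth: List[List[float]]               # H x W depth values (units assumed)
    click_uv: Tuple[int, int]              # (u, v) = (column, row) of the click
    is_landmark: bool = False              # hypothetical landmark-frame flag

    @property
    def size(self) -> Tuple[int, int]:
        """Return (width, height) of the frame."""
        return len(self.rgb[0]), len(self.rgb)


def waypoint_target(sample: ClickSample) -> Tuple[float, float]:
    """Map the clicked pixel's center into [0, 1] x [0, 1].

    Normalizing lets frames of different resolutions share one target space.
    """
    w, h = sample.size
    u, v = sample.click_uv
    if not (0 <= u < w and 0 <= v < h):
        raise ValueError("click lies outside the image")
    return (u + 0.5) / w, (v + 0.5) / h


def waypoint_loss(pred: Tuple[float, float], target: Tuple[float, float]) -> float:
    """Squared Euclidean distance between predicted and clicked waypoints."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))
```

For example, consider a 4×2 frame with a click at pixel (1, 0). The normalized target is (0.375, 0.25). A model that predicts (0.5, 0.25) would incur a loss of 0.015625. A predicted-pixel regression like this is only one plausible way to consume the clicks. A classification over image cells or a back-projection of the click into 3D using the depth channel would be equally reasonable choices.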