Visual Semantic Navigation with Real Robots (2311.16623v2)
Abstract: Visual Semantic Navigation (VSN) is the ability of a robot to learn visual semantic information in order to navigate unseen environments. VSN models, mostly trained with reinforcement learning, are typically evaluated in the same virtual environments in which they were trained, so we still lack an in-depth analysis of how these models behave in the real world. In this work, we propose a new solution for integrating VSN models into real robots, yielding true embodied agents. We also release a novel ROS-based framework for VSN, ROS4VSN, so that any VSN model can be easily deployed on any ROS-compatible robot and tested in a real setting. Our experiments with two different robots, each embedding a state-of-the-art VSN agent, confirm that these VSN solutions perform noticeably differently in real-world and simulation environments. We hope that this research provides a foundation for addressing this consequential issue, with the ultimate aim of improving the performance and efficiency of embodied agents in authentic real-world scenarios. Code to reproduce all our experiments is available at https://github.com/gramuah/ros4vsn.
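To make the deployment idea concrete, below is a minimal sketch of how a VSN policy could be wrapped as a ROS node that consumes RGB observations and emits velocity commands. All names here (VSNAgentNode, DummyAgent, the topic names) are illustrative assumptions, not the actual ROS4VSN interface; see the repository at https://github.com/gramuah/ros4vsn for the real API.

```python
#!/usr/bin/env python
# Hypothetical sketch: a VSN agent as a ROS (1) node. The agent maps an RGB
# frame to a discrete navigation action; the node translates that action
# into a Twist command. Topic names and class names are assumptions.
import rospy
from sensor_msgs.msg import Image
from geometry_msgs.msg import Twist
from cv_bridge import CvBridge


class DummyAgent:
    """Placeholder for a trained VSN policy: maps an RGB frame to an
    action in {"forward", "turn_left", "turn_right", "stop"}."""
    def act(self, rgb):
        return "forward"  # a real agent would run its policy network here


class VSNAgentNode:
    def __init__(self, agent):
        self.agent = agent
        self.bridge = CvBridge()
        # Topic names are assumptions; remap them to your robot's topics.
        self.cmd_pub = rospy.Publisher("/cmd_vel", Twist, queue_size=1)
        rospy.Subscriber("/camera/rgb/image_raw", Image, self.on_image,
                         queue_size=1, buff_size=2**24)

    def on_image(self, msg):
        # Convert the ROS Image message to a NumPy array, query the
        # policy, and publish the resulting velocity command.
        rgb = self.bridge.imgmsg_to_cv2(msg, desired_encoding="rgb8")
        action = self.agent.act(rgb)
        self.cmd_pub.publish(self.to_twist(action))

    @staticmethod
    def to_twist(action):
        """Translate a discrete action into a velocity command."""
        cmd = Twist()
        if action == "forward":
            cmd.linear.x = 0.2    # m/s
        elif action == "turn_left":
            cmd.angular.z = 0.5   # rad/s
        elif action == "turn_right":
            cmd.angular.z = -0.5
        # "stop" leaves the zero-initialized Twist untouched
        return cmd


if __name__ == "__main__":
    rospy.init_node("vsn_agent")
    VSNAgentNode(DummyAgent())
    rospy.spin()
```

Because the agent only sees standard sensor_msgs/Image messages and emits standard geometry_msgs/Twist commands, the same wrapper pattern works on any ROS-compatible robot; swapping robots reduces to remapping topics.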