HomeRobot: Open-Vocabulary Mobile Manipulation (2306.11565v2)
Abstract: HomeRobot (noun): An affordable compliant robot that navigates homes and manipulates a wide range of objects in order to complete everyday tasks. Open-Vocabulary Mobile Manipulation (OVMM) is the problem of picking any object in any unseen environment, and placing it in a commanded location. This is a foundational challenge for robots to be useful assistants in human environments, because it involves tackling sub-problems from across robotics: perception, language understanding, navigation, and manipulation are all essential to OVMM. In addition, integration of the solutions to these sub-problems poses its own substantial challenges. To drive research in this area, we introduce the HomeRobot OVMM benchmark, where an agent navigates household environments to grasp novel objects and place them on target receptacles. HomeRobot has two components: a simulation component, which uses a large and diverse curated object set in new, high-quality multi-room home environments; and a real-world component, providing a software stack for the low-cost Hello Robot Stretch to encourage replication of real-world experiments across labs. We implement both reinforcement learning and heuristic (model-based) baselines and show evidence of sim-to-real transfer. Our baselines achieve a 20% success rate in the real world; our experiments identify ways in which future research can improve performance. See videos on our website: https://ovmm.github.io/.
Authors:
- Sriram Yenamandra
- Arun Ramachandran
- Karmesh Yadav
- Austin Wang
- Mukul Khanna
- Theophile Gervet
- Tsung-Yen Yang
- Vidhi Jain
- Alexander William Clegg
- John Turner
- Zsolt Kira
- Manolis Savva
- Angel Chang
- Devendra Singh Chaplot
- Dhruv Batra
- Roozbeh Mottaghi
- Yonatan Bisk
- Chris Paxton