HomeRobot: Open-Vocabulary Mobile Manipulation (2306.11565v2)

Published 20 Jun 2023 in cs.RO, cs.AI, and cs.CV

Abstract: HomeRobot (noun): An affordable compliant robot that navigates homes and manipulates a wide range of objects in order to complete everyday tasks. Open-Vocabulary Mobile Manipulation (OVMM) is the problem of picking any object in any unseen environment, and placing it in a commanded location. This is a foundational challenge for robots to be useful assistants in human environments, because it involves tackling sub-problems from across robotics: perception, language understanding, navigation, and manipulation are all essential to OVMM. In addition, integration of the solutions to these sub-problems poses its own substantial challenges. To drive research in this area, we introduce the HomeRobot OVMM benchmark, where an agent navigates household environments to grasp novel objects and place them on target receptacles. HomeRobot has two components: a simulation component, which uses a large and diverse curated object set in new, high-quality multi-room home environments; and a real-world component, providing a software stack for the low-cost Hello Robot Stretch to encourage replication of real-world experiments across labs. We implement both reinforcement learning and heuristic (model-based) baselines and show evidence of sim-to-real transfer. Our baselines achieve a 20% success rate in the real world; our experiments identify ways future research work can improve performance. See videos on our website: https://ovmm.github.io/.

References (104)

Authors (18)

Sriram Yenamandra (9 papers)
Arun Ramachandran (4 papers)
Karmesh Yadav (16 papers)
Austin Wang (15 papers)
Mukul Khanna (8 papers)
Theophile Gervet (13 papers)
Tsung-Yen Yang (13 papers)
Vidhi Jain (12 papers)
Alexander William Clegg (3 papers)
John Turner (7 papers)
Zsolt Kira (110 papers)
Manolis Savva (64 papers)
Angel Chang (5 papers)
Devendra Singh Chaplot (37 papers)
Dhruv Batra (160 papers)
Roozbeh Mottaghi (66 papers)
Yonatan Bisk (91 papers)
Chris Paxton (59 papers)

Citations (61)

Summary

Open-Vocabulary Mobile Manipulation: A Comprehensive Exploration

The paper "HomeRobot: Open-Vocabulary Mobile Manipulation" addresses Open-Vocabulary Mobile Manipulation (OVMM): picking any object in an unseen environment and placing it in a commanded location. Solving it requires integrating perception, language understanding, navigation, and manipulation, all of which are essential for effective household robotic assistants. The paper introduces the HomeRobot OVMM benchmark, a platform for evaluating mobile manipulation in both simulated and real-world environments.

Benchmark Design and Components

The HomeRobot OVMM benchmark has two components: a simulation component and a real-world component. The simulation draws on a dataset of 200 human-authored 3D scenes within AI Habitat, providing diverse multi-room home environments populated with a wide range of objects. OVMM episodes are generated in these environments, with the aim of supporting sim-to-real transfer.

The real-world component provides a software stack for the low-cost Hello Robot Stretch platform, intended to make real-world experiments reproducible across labs. The baselines achieve a 20% success rate in real-world tests.

Methodology and Baseline Implementations

The paper provides both heuristic and reinforcement learning (RL) methods as baseline agents. The heuristic approach uses a motion planner integrated with a vision-based object detector, DETIC. This method excels in long-horizon navigation tasks. Conversely, the RL approach demonstrates superior navigation efficiency when visible objects are present. The integration tests reveal a significant performance drop when switching from ground-truth perception to DETIC-based perception, underlining the importance of integrated learning systems for improving home assistant functionality.
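The effect of swapping perception modules can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the stage names, the `run_episode` function, and the toy detector are hypothetical and are not part of the HomeRobot software stack. The sketch only shows why a missed detection early in a staged pipeline prevents every later stage from running.

```python
from typing import Callable, List, Set

# Hypothetical stage names, chosen for illustration only.
STAGES = ("find_object", "pick", "find_receptacle", "place")

# A perception module maps the set of truly visible labels to the labels it reports.
Perception = Callable[[Set[str]], Set[str]]


def ground_truth_perception(visible: Set[str]) -> Set[str]:
    """Oracle perception: reports exactly what is visible."""
    return set(visible)


def make_lossy_detector(missed: Set[str]) -> Perception:
    """Toy stand-in for a learned detector that fails on some labels."""
    def detect(visible: Set[str]) -> Set[str]:
        return set(visible) - missed
    return detect


def run_episode(perceive: Perception, visible: Set[str],
                target: str, receptacle: str) -> List[str]:
    """Run the stages in order and return the ones completed.

    Simplifying assumption: picking and placing always succeed once the
    relevant item has been perceived, so only perception can fail.
    """
    completed: List[str] = []
    seen = perceive(visible)
    if target not in seen:
        return completed  # nothing downstream can run
    completed += ["find_object", "pick"]
    if receptacle not in seen:
        return completed
    completed += ["find_receptacle", "place"]
    return completed


scene = {"cup", "table"}
oracle_run = run_episode(ground_truth_perception, scene, "cup", "table")
lossy_run = run_episode(make_lossy_detector({"cup"}), scene, "cup", "table")
```

In this toy setup the oracle run completes all four stages, while the lossy detector completes none, because the failure occurs at the first stage.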

Numerical Results and Task Performance

The experiments report success rates for the individual sub-tasks within the OVMM framework. The baselines show promise, but the results also expose the cost of perception errors, particularly in DETIC predictions. The RL methods outperformed the heuristic methods on specific tasks, yet all systems showed marked performance declines when moving from simulation to real-world conditions.
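As a rough sketch of how per-sub-task success rates of this kind can be computed, the following assumes each episode is summarized by the number of consecutive stages it completed. The stage names, the `stage_success_rates` function, and the toy counts are hypothetical; the numbers are made up for illustration and are not results from the paper.

```python
from typing import Dict, Sequence

# Hypothetical stage names, chosen for illustration only.
STAGES = ("find_object", "pick", "find_receptacle", "place")


def stage_success_rates(stages_completed: Sequence[int]) -> Dict[str, float]:
    """Fraction of episodes that completed each stage.

    stages_completed[i] is how many consecutive stages episode i finished
    (0 means it failed at the first stage, len(STAGES) means full success).
    """
    if not stages_completed:
        raise ValueError("need at least one episode")
    total = len(stages_completed)
    rates: Dict[str, float] = {}
    for index, stage in enumerate(STAGES, start=1):
        reached = sum(1 for count in stages_completed if count >= index)
        rates[stage] = reached / total
    # Overall success means the final stage was completed.
    rates["overall"] = rates[STAGES[-1]]
    return rates


# Toy data only: four made-up episodes.
toy_counts = [4, 2, 1, 0]
toy_rates = stage_success_rates(toy_counts)
```

Because each stage depends on the previous one, the rates can only stay flat or fall from one stage to the next, which makes it easy to see where a pipeline loses most of its episodes.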

Implications and Future Directions

This research has both practical and theoretical implications for home robotics. By standardizing OVMM as a benchmark, the work gives researchers a common basis for studying integrated multi-task systems. The paper suggests that large pretrained vision-language models, combined with models tailored to specific robotics tasks, could be crucial for improving OVMM task performance.

Looking forward, extending the tasks with more intricate language and multi-step commands, and deploying end-to-end learning models, are likely to be important directions for future research. Both would move household robots toward more human-like interaction and assistance in real-world environments.

In conclusion, the paper contributes a benchmark for mobile manipulation and marks a step toward more autonomous, efficient home robotics systems. The HomeRobot platform provides a foundation for future work on open-vocabulary tasks and for understanding how robots can adapt to and function within complex human environments.