Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots (2402.10329v3)

Published 15 Feb 2024 in cs.RO

Abstract: We present Universal Manipulation Interface (UMI) -- a data collection and policy learning framework that allows direct skill transfer from in-the-wild human demonstrations to deployable robot policies. UMI employs hand-held grippers coupled with careful interface design to enable portable, low-cost, and information-rich data collection for challenging bimanual and dynamic manipulation demonstrations. To facilitate deployable policy learning, UMI incorporates a carefully designed policy interface with inference-time latency matching and a relative-trajectory action representation. The resulting learned policies are hardware-agnostic and deployable across multiple robot platforms. Equipped with these features, UMI framework unlocks new robot manipulation capabilities, allowing zero-shot generalizable dynamic, bimanual, precise, and long-horizon behaviors, by only changing the training data for each task. We demonstrate UMI's versatility and efficacy with comprehensive real-world experiments, where policies learned via UMI zero-shot generalize to novel environments and objects when trained on diverse human demonstrations. UMI's hardware and software system is open-sourced at https://umi-gripper.github.io.

Summary

  • The paper introduces UMI, a novel framework that transfers complex human manipulation skills to robots without requiring real-world robots during training.
  • It combines GoPro cameras, fisheye lenses, and onboard IMU data to capture rich visuomotor data for precise, rapid movements.
  • Experimental results show a 70% zero-shot success rate in dynamic tasks, underscoring UMI's robust generalization across diverse environments.

Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots

The paper discusses the Universal Manipulation Interface (UMI), a novel framework that enables the transfer of complex human manipulation skills to robotic systems without requiring real-world robotic counterparts during training. The research addresses the main challenge of skill transfer from human demonstrations to robotic systems, which is pivotal for enhancing robot dexterity in dynamic and unstructured environments.

Summary of Findings

The UMI framework is a sophisticated yet portable, low-cost system that circumvents the limitations of traditional teleoperation and passive video-based demonstrations. It employs hand-held grippers, strategically augmented with fisheye lenses and side mirrors, for in-the-wild data collection. This setup captures a comprehensive suite of sensory data essential for learning robust visuomotor policies. The key advancements by UMI include:

  1. Intuitive and Rich Data Collection: Fisheye lenses provide wide field-of-view observations, and side mirrors on the gripper add implicit stereo depth cues from a single camera, expanding visual context and depth information without the complexity of a multi-camera rig (see the mirror-crop sketch after this list).
  2. Precision in Rapid Movements: The GoPro camera's onboard IMU data, recorded alongside the video, keeps tracking accurate during fast, dynamic motions and allows recovery of precise, metric-scale actions.
  3. Latency Matching: UMI's policy interface synchronizes observation and action timelines to each platform's measured hardware latencies, so fast, dynamic manipulations remain effective during real-time deployment (see the latency-matching sketch after this list).
  4. Hardware-Agnostic Policy Representation: Actions are expressed as trajectories relative to the current gripper pose rather than in a robot-specific frame, so policies trained on data from the hand-held gripper can be deployed across diverse robot platforms (see the relative-trajectory sketch after this list).
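
The mirror-based depth cue in item 1 can be illustrated with a small sketch: crop the pixel regions covered by the two side mirrors and flip them horizontally so their contents match the orientation of the main view. The ROI coordinates and function name below are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Hypothetical pixel regions of the two side mirrors in the fisheye frame;
# real coordinates depend on the mirror mounting and camera calibration.
LEFT_MIRROR_ROI = (slice(40, 260), slice(0, 180))       # (rows, cols)
RIGHT_MIRROR_ROI = (slice(40, 260), slice(1740, 1920))

def split_fisheye_observation(frame: np.ndarray) -> dict:
    """Return the main fisheye view plus horizontally flipped mirror crops.

    Flipping the mirror crops restores their left/right orientation, so the
    crops act as extra viewpoints (implicit stereo) from a single camera.
    """
    left = frame[LEFT_MIRROR_ROI][:, ::-1]
    right = frame[RIGHT_MIRROR_ROI][:, ::-1]
    return {"main": frame, "mirror_left": left, "mirror_right": right}
```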
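
The latency matching in item 3 has two parts: aligning observation streams that arrive with different delays, and skipping predicted actions that would already be in the past once the robot can execute them. Below is a rough sketch of both, assuming illustrative latency values and a simple (receive_time, data) buffer format; the released code may structure this differently.

```python
import time

# Illustrative latencies (seconds); real values are measured per camera,
# robot arm, and gripper, and differ across hardware platforms.
CAMERA_LATENCY = 0.10      # image capture -> frame available on host
ROBOT_OBS_LATENCY = 0.01   # robot state capture -> state available on host
EXECUTION_LATENCY = 0.10   # command sent -> robot actually starts moving

def synchronized_observation(image_buffer, state_buffer):
    """Pair the newest image with the robot state captured at the same instant.

    Both buffers hold (receive_time, data) tuples; subtracting each stream's
    latency recovers the physical capture time used for alignment.
    """
    recv_time, image = image_buffer[-1]
    capture_time = recv_time - CAMERA_LATENCY
    _, state = min(
        state_buffer,
        key=lambda item: abs((item[0] - ROBOT_OBS_LATENCY) - capture_time),
    )
    return image, state, capture_time

def executable_actions(action_traj, obs_time, dt=0.1):
    """Drop predicted actions that would land in the past by execution time.

    action_traj[i] is intended to execute at obs_time + i * dt; anything
    scheduled before now + EXECUTION_LATENCY is skipped so that fast,
    dynamic motions stay on schedule despite inference delay.
    """
    now = time.time()
    return [
        (obs_time + i * dt, action)
        for i, action in enumerate(action_traj)
        if obs_time + i * dt >= now + EXECUTION_LATENCY
    ]
```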
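
The hardware-agnostic representation in item 4 amounts to expressing predicted gripper poses relative to the current gripper pose instead of a robot-specific base frame. A minimal sketch with homogeneous transforms (function names are assumptions for illustration):

```python
import numpy as np

def absolute_to_relative(current_pose, future_poses):
    """Re-express future end-effector poses relative to the current pose.

    current_pose and each entry of future_poses are 4x4 homogeneous
    transforms in the same fixed frame (e.g. the SLAM map frame during
    data collection). The returned transforms satisfy
    future = current @ T_rel, so the action no longer depends on where
    that frame happens to be.
    """
    inv_current = np.linalg.inv(current_pose)
    return [inv_current @ T for T in future_poses]

def relative_to_absolute(current_pose, relative_actions):
    """Invert the mapping at deployment time, in any robot's own frame."""
    return [current_pose @ T_rel for T_rel in relative_actions]
```

At training time the relative targets come from the tracked trajectory of the hand-held gripper; at deployment the same mapping is applied around the robot's own end-effector pose, which is what lets one policy run on multiple arms.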

Numerical Outcomes and Experimental Results

The experimental evaluations highlighted several tasks, including complex bimanual manipulations and dynamic object sorting, demonstrating UMI's capability to execute diverse action modalities not feasible with existing systems. Notably, the framework exhibited a 70% success rate in zero-shot generalization across novel environments and objects, showcasing impressive generalization capabilities seldom observed in standard behavior cloning. This outcome underscores the effectiveness of UMI in capturing and deploying skills without additional fine-tuning on target robot platforms.

Implications and Future Directions

UMI's open-sourced hardware and software design broadens the accessibility of robotics research and facilitates collaborative progress in robot skill acquisition. Its demonstrated ability to generalize across environments suggests potential applications in household robotics, autonomous manipulation tasks, and unstructured outdoor settings.

The findings evoke several pathways for future research within AI and robotics:

  • Scalability and Ergonomic Improvements: Refining the gripper's ergonomics and supporting end-effectors with more degrees of freedom could widen the range of human skills the interface can capture.
  • Enhanced Collaborative Data Collection: By fostering distributed data collection from non-expert users globally, UMI can assist in collating vast datasets essential for training comprehensive, adaptable robotic systems.
  • Integration with Advanced Learning Models: Combining UMI data with multi-task and continual learning methods could further improve the adaptability and robustness of robotic systems.

In conclusion, the Universal Manipulation Interface represents a significant stride towards democratizing robotic skill acquisition, leveraging diverse human demonstrations for robust policy training without real-world robot dependencies. The broad, transferable nature of the collected data positions UMI as a key player in revolutionizing how robots learn and interact in dynamic human environments.
