UMI on Legs: Making Manipulation Policies Mobile with Manipulation-Centric Whole-body Controllers (2407.10353v1)

Published 14 Jul 2024 in cs.RO

Abstract: We introduce UMI-on-Legs, a new framework that combines real-world and simulation data for quadruped manipulation systems. We scale task-centric data collection in the real world using a hand-held gripper (UMI), providing a cheap way to demonstrate task-relevant manipulation skills without a robot. Simultaneously, we scale robot-centric data in simulation by training a whole-body controller for task-tracking without task simulation setups. The interface between these two policies is end-effector trajectories in the task frame, inferred by the manipulation policy and passed to the whole-body controller for tracking. We evaluate UMI-on-Legs on prehensile, non-prehensile, and dynamic manipulation tasks, and report over 70% success rate on all tasks. Lastly, we demonstrate the zero-shot cross-embodiment deployment of a pre-trained manipulation policy checkpoint from prior work, originally intended for a fixed-base robot arm, on our quadruped system. We believe this framework provides a scalable path towards learning expressive manipulation skills on dynamic robot embodiments. Please check out our website for robot videos, code, and data: https://umi-on-legs.github.io

Summary

The paper presents a framework that integrates handheld demonstrations and simulation-based reinforcement learning to achieve robust mobile manipulation.
A novel task-frame end-effector trajectory interface decouples high-level manipulation policies from robot-specific control, enabling zero-shot cross-embodiment transfer.
Experimental results show over 70% success on all evaluated tasks on a quadruped robot, spanning prehensile, non-prehensile (weight pushing), and dynamic (tossing) manipulation.

An Expert Review of "UMI on Legs: Making Manipulation Policies Mobile with Manipulation-Centric Whole-body Controllers"

The paper, "UMI on Legs: Making Manipulation Policies Mobile with Manipulation-Centric Whole-body Controllers," presents a framework that combines real-world and simulation data to enable manipulation on legged robots. It learns expressive manipulation skills for quadrupeds by using task-frame end-effector trajectories as the interface between high-level diffusion policies and a low-level whole-body controller (WBC).
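To make the trajectory interface concrete, here is a minimal sketch of how a fast low-level controller could read a target from a slower policy's timestamped end-effector trajectory. It is an illustration under stated assumptions, not the authors' implementation: the function name `sample_trajectory`, the position-only waypoints, and the use of linear interpolation are all hypothetical choices, and a real system would also need to handle orientation.

```python
from bisect import bisect_right


def sample_trajectory(times, positions, t):
    """Linearly interpolate a timestamped end-effector position trajectory.

    Hypothetical helper: `times` is a strictly increasing list of seconds,
    `positions` is a list of (x, y, z) tuples expressed in the task frame.
    Queries outside the time range are clamped to the first or last waypoint.
    """
    if not times or len(times) != len(positions):
        raise ValueError("times and positions must be non-empty and equal length")
    if t <= times[0]:
        return tuple(positions[0])
    if t >= times[-1]:
        return tuple(positions[-1])
    # Find i such that times[i - 1] <= t < times[i].
    i = bisect_right(times, t)
    t0, t1 = times[i - 1], times[i]
    alpha = (t - t0) / (t1 - t0)
    return tuple(a + alpha * (b - a) for a, b in zip(positions[i - 1], positions[i]))


# Assumed example: three waypoints predicted by the manipulation policy.
waypoint_times = [0.0, 1.0, 2.0]
waypoint_positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 2.0, 0.0)]

# The controller queries the trajectory at its own, faster rate.
target_mid = sample_trajectory(waypoint_times, waypoint_positions, 0.5)
target_late = sample_trajectory(waypoint_times, waypoint_positions, 1.5)
```

The point of the sketch is the separation of rates: the policy only has to publish waypoints occasionally, and the controller can query a target at whatever time step it runs.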

The core contribution is the combination of real-world demonstrations collected with a handheld gripper (UMI) and reinforcement learning in simulation. Data collection requires no physical robot, which reduces the cost and complexity of robot-specific teleoperation. The interface design also decouples the manipulation policy from the robot embodiment: the policy focuses on task progress, while the whole-body controller handles robot-specific tracking. This separation is what supports cross-embodiment transfer of manipulation skills.

The evaluations cover prehensile, non-prehensile, and dynamic manipulation tasks, including weight pushing and tossing, and the authors report success rates above 70% on all of them. A standout result is tossing, a dynamic task in which the robot must exploit whole-body dynamics, which indicates that the controller coordinates the whole body effectively. The WBC also completed tasks involving unforeseen resistance and dynamic object interactions, which points to robustness and to potential real-world use.

To assess scalability, the authors demonstrate zero-shot cross-embodiment deployment: a pre-trained manipulation policy checkpoint from prior work, originally intended for a fixed-base robot arm, runs on the mobile quadruped system. This is possible because the policy outputs end-effector trajectories in the task frame, which the WBC can track regardless of the robot's configuration.
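The following sketch illustrates why a task-frame target is independent of the robot's base. It is a simplified planar example under assumed conventions, not the paper's code: the function `task_to_base`, the `(x, y, yaw)` base pose, and the 2D setting are hypothetical. The same task-frame target is converted into the base frame for two different base poses, so only the controller side depends on where the robot stands.

```python
import math


def task_to_base(target_xy, base_pose):
    """Express a task-frame 2D target in the robot base frame.

    Hypothetical helper: `base_pose` is (x, y, yaw) of the base in the task
    frame, with yaw in radians. The result is R(yaw)^T * (target - base_xy).
    """
    base_x, base_y, yaw = base_pose
    dx = target_xy[0] - base_x
    dy = target_xy[1] - base_y
    cos_yaw, sin_yaw = math.cos(yaw), math.sin(yaw)
    # Rotate the offset by -yaw to move from task-frame axes to base-frame axes.
    return (cos_yaw * dx + sin_yaw * dy, -sin_yaw * dx + cos_yaw * dy)


# One target from the manipulation policy, fixed in the task frame.
target_in_task = (1.0, 2.0)

# Two assumed base poses: the base has moved and turned between them.
command_a = task_to_base(target_in_task, (0.0, 0.0, 0.0))
command_b = task_to_base(target_in_task, (1.0, 0.0, math.pi / 2))
```

With the base at the origin and unrotated, the command equals the task-frame target. After the base moves to (1, 0) and turns 90 degrees, the same target lies about 2 m straight ahead of the base. The policy output never changed; only the controller's view of it did.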

The authors also note several limitations. Not all types of manipulation are addressed, in particular those requiring whole-body integration, which fall outside the scope of the current handheld-input setup. In addition, incorporating embodiment-specific constraints into high-level policies remains an open challenge on the way to a universally robust mobile manipulation solution.

Overall, UMI-on-Legs is a substantial advance in autonomous mobile manipulation, and its results suggest applications in fields that require mobile manipulation. Future research could refine the integration of simulation and real-world data and examine the practical constraints imposed by different robot morphologies and environments. Such work could extend the flexibility and capability of robots in dynamically complex tasks and environments, and it underlines the value of scalable designs that combine diverse data sources with capable control architectures.

