Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots (2402.10329v3)
Abstract: We present Universal Manipulation Interface (UMI) -- a data collection and policy learning framework that allows direct skill transfer from in-the-wild human demonstrations to deployable robot policies. UMI employs hand-held grippers coupled with careful interface design to enable portable, low-cost, and information-rich data collection for challenging bimanual and dynamic manipulation demonstrations. To facilitate deployable policy learning, UMI incorporates a carefully designed policy interface with inference-time latency matching and a relative-trajectory action representation. The resulting learned policies are hardware-agnostic and deployable across multiple robot platforms. Equipped with these features, the UMI framework unlocks new robot manipulation capabilities, enabling zero-shot generalizable dynamic, bimanual, precise, and long-horizon behaviors by changing only the training data for each task. We demonstrate UMI's versatility and efficacy with comprehensive real-world experiments, where policies learned via UMI zero-shot generalize to novel environments and objects when trained on diverse human demonstrations. UMI's hardware and software system is open-sourced at https://umi-gripper.github.io.
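The two policy-interface ideas named in the abstract -- a relative-trajectory action representation and inference-time latency matching -- can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the homogeneous-transform formulation, and the timestamp-based latency lookup are all assumptions chosen for illustration; the paper's exact formulation may differ.

```python
import numpy as np

def pose_to_matrix(pos, rotmat):
    """Build a 4x4 homogeneous transform from a position vector and a 3x3 rotation matrix."""
    T = np.eye(4)
    T[:3, :3] = rotmat
    T[:3, 3] = pos
    return T

def relative_trajectory(current_T, future_Ts):
    """Express future end-effector poses relative to the current gripper pose.

    Expressing actions in the current gripper frame (rather than a robot base
    frame) is one way an action representation becomes hardware-agnostic:
    the same relative motion can be executed by different arms. Illustrative
    sketch only; not the paper's exact formulation.
    """
    inv = np.linalg.inv(current_T)
    return [inv @ T for T in future_Ts]

def latency_matched_index(action_timestamps, obs_time, latency):
    """Pick the first predicted action whose timestamp is at least
    obs_time + latency, so that commands account for the known
    observation-to-actuation delay (hypothetical helper)."""
    return int(np.searchsorted(action_timestamps, obs_time + latency))
```

For example, a future pose 0.5 m above the current one maps to the relative action (0, 0, 0.5) regardless of where the gripper sits in the world frame, and `latency_matched_index` simply skips the predicted actions that would already be stale by the time they reach the robot.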