
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation (2401.02117v1)

Published 4 Jan 2024 in cs.RO, cs.AI, cs.CV, cs.LG, cs.SY, and eess.SY

Abstract: Imitation learning from human demonstrations has shown impressive performance in robotics. However, most results focus on table-top manipulation, lacking the mobility and dexterity necessary for generally useful tasks. In this work, we develop a system for imitating mobile manipulation tasks that are bimanual and require whole-body control. We first present Mobile ALOHA, a low-cost and whole-body teleoperation system for data collection. It augments the ALOHA system with a mobile base, and a whole-body teleoperation interface. Using data collected with Mobile ALOHA, we then perform supervised behavior cloning and find that co-training with existing static ALOHA datasets boosts performance on mobile manipulation tasks. With 50 demonstrations for each task, co-training can increase success rates by up to 90%, allowing Mobile ALOHA to autonomously complete complex mobile manipulation tasks such as sauteing and serving a piece of shrimp, opening a two-door wall cabinet to store heavy cooking pots, calling and entering an elevator, and lightly rinsing a used pan using a kitchen faucet. Project website: https://mobile-aloha.github.io


Summary

  • The paper introduces a cost-effective teleoperation system with a mobile base that enables sophisticated bimanual manipulation tasks.
  • It employs an imitation learning framework that co-trains on static and mobile ALOHA datasets, increasing task success rates by up to 90% with as few as 50 demonstrations per task.
  • The results demonstrate practical whole-body control for complex tasks like cooking and cleaning, paving the way for advanced domestic and industrial robotics.

Overview of "Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation"

The paper presents Mobile ALOHA, a system aimed at advancing bimanual mobile manipulation through imitation learning and teleoperation. The goal is to enable robots to perform complex bimanual manipulation tasks with whole-body control. The system extends the ALOHA framework by adding a mobile base and a portable, low-cost whole-body teleoperation interface for collecting demonstration data. This allows robots to undertake tasks that require both dexterity and mobility, such as cooking and cleaning, which are out of reach for static table-top systems.

Contributions

  1. Hardware Development:
    • Mobile ALOHA integrates a low-cost teleoperation system with a mobile base. It supports simultaneous control of the base and both arms, enabling intricate maneuvers such as opening cabinet doors or cooking, where mobility and manual dexterity are both essential.
    • The hardware is economical, with a total cost of approximately $32,000, and the design is open-sourced, making it accessible and replicable across research environments.
  2. Imitation Learning Approach:
    • The paper uses an imitation learning framework that trains policies from human demonstrations via supervised behavior cloning. Co-training with existing static ALOHA datasets substantially boosts learning on mobile manipulation tasks (see the data-mixing sketch after this list).
    • Remarkably, co-training with static data yields up to a 90% increase in task success rates with as few as 50 mobile demonstrations per task, showcasing the data efficiency of this approach.
  3. Complex Task Execution:
    • Mobile ALOHA autonomously accomplishes a variety of sophisticated tasks. These include culinary actions like sautéing and serving food, household chores like cleaning and organizing, and navigation tasks like using elevators. This highlights the practical applicability of the system in dynamic and real-world environments.
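To make the co-training recipe of contribution 2 concrete, here is a minimal sketch of how training batches might be mixed from the two data sources. The 50/50 sampling ratio and the zero-padding of static episodes up to the mobile action space are illustrative assumptions, not the authors' exact implementation; `mobile_dataset` and `static_dataset` are hypothetical stand-ins for loaded demonstration data.

```python
import random

# Illustrative co-training sampler (assumed details, not the authors' code).
# Static ALOHA episodes have 14-dim arm actions; mobile episodes add two
# base-velocity dimensions, so static actions are zero-padded to match.

def pad_static_actions(action_chunk):
    """Append zero (linear, angular) base velocity to each action step."""
    return [list(a) + [0.0, 0.0] for a in action_chunk]

def cotraining_sampler(mobile_dataset, static_dataset, mobile_ratio=0.5):
    """Yield (observation, action_chunk) pairs from both datasets.

    mobile_ratio: probability of drawing from the mobile dataset;
    0.5 is an assumed value, treated here as a tunable hyperparameter.
    """
    while True:
        if random.random() < mobile_ratio:
            yield random.choice(mobile_dataset)
        else:
            obs, actions = random.choice(static_dataset)
            yield obs, pad_static_actions(actions)
```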

Strong Numerical Results

The paper reports strong success rates across tasks, achieving over 80% success on most tasks with just 50 demonstrations. This attests to the robustness and efficacy of the co-training strategy, which combines static and mobile manipulation datasets. The approach is evaluated with advanced imitation learning methods such as ACT and Diffusion Policy, both of which benefit from the additional data that co-training provides.
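As context for the ACT results: ACT (Action Chunking with Transformers) predicts a chunk of future actions at every timestep and smooths overlapping chunks at inference time with temporal ensembling. Below is a minimal sketch of that inference loop, assuming a trained `policy` callable and a generic `env`; the chunk size and weighting temperature are typical values from the ACT paper, not necessarily Mobile ALOHA's exact settings.

```python
import numpy as np

K = 100          # action chunk size (a typical ACT setting)
ACTION_DIM = 16  # e.g. 14 arm joints + 2 base velocities for Mobile ALOHA
M = 0.01         # exponential weighting temperature

def rollout_with_temporal_ensembling(policy, env, horizon):
    """Run a policy that outputs a (K, ACTION_DIM) chunk each step.

    At timestep t we average every prediction ever made for t, with
    exponential weights exp(-M * i) where i = 0 is the oldest prediction,
    so earlier chunks dominate and the executed trajectory stays smooth.
    """
    # buffer[t] accumulates all predictions targeting timestep t
    buffer = [[] for _ in range(horizon + K)]
    obs = env.reset()
    for t in range(horizon):
        chunk = policy(obs)                 # shape (K, ACTION_DIM)
        for i in range(K):
            buffer[t + i].append(chunk[i])
        preds = np.stack(buffer[t])         # all predictions for step t
        w = np.exp(-M * np.arange(len(preds)))
        action = (w[:, None] * preds).sum(axis=0) / w.sum()
        obs = env.step(action)
```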

Implications and Future Work

Practical Implications:

  • Affordability and Accessibility: By reducing the cost of teleoperation systems, Mobile ALOHA democratizes access to sophisticated robotics research, enabling more institutions to participate in advancing mobile robot capabilities.
  • Broadening Use Cases: The versatility demonstrated suggests potential applications in domestic service robotics, aiding elderly care, hospitality, and maintenance tasks, thereby increasing the societal impact of robotics.

Theoretical Implications:

  • Imitation Learning Advances: The results contribute to the growing literature on imitation learning in robotics, particularly demonstrating the effectiveness of co-training on diverse datasets.
  • Action Coordination and Control: The research motivates further exploration of coordination mechanisms for mobile manipulators, emphasizing whole-body control, in which base and arm actions are predicted jointly by a single policy, as a vehicle for nuanced task execution (sketched below).
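One concrete reading of "whole-body control" here is that the policy's action space simply concatenates both arms' joint targets with the base's linear and angular velocity, so one network commands the full 16-dimensional action. The sketch below illustrates that formulation; the `WholeBodyAction` container and its field layout are illustrative, not the authors' code.

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class WholeBodyAction:
    """Illustrative container for a 16-dim whole-body action."""
    arm_joints: np.ndarray  # 14 targets: two 6-DoF arms plus two grippers
    base_linear: float      # base linear velocity (m/s)
    base_angular: float     # base angular velocity (rad/s)

    def to_vector(self) -> np.ndarray:
        """Flatten into the 16-dim vector a whole-body policy predicts."""
        return np.concatenate(
            [self.arm_joints, [self.base_linear, self.base_angular]])

    @classmethod
    def from_vector(cls, v: np.ndarray) -> "WholeBodyAction":
        assert v.shape == (16,)
        return cls(arm_joints=v[:14],
                   base_linear=float(v[14]),
                   base_angular=float(v[15]))
```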

Speculation on Future Developments

Future iterations of mobile manipulation systems could gain real-time adaptability through reinforcement learning, reducing reliance on pre-collected human demonstrations. Integrating richer sensor fusion and augmented-reality teleoperation interfaces may further improve teleoperation efficacy, aligning robot actions more closely with human intuition. Extending this research to multi-robot systems could also enable collaborative tasks, expanding the horizon for industrial and commercial robotics applications.

In conclusion, "Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation" provides a substantial leap forward in both hardware and software dimensions of mobile manipulation. Its blend of cost-effective design and robust learning methodology makes it a notable contribution to the field of robotics, offering a blueprint for developing more adaptable, skillful robotic systems.
