Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation (2401.02117v1)
Abstract: Imitation learning from human demonstrations has shown impressive performance in robotics. However, most results focus on table-top manipulation, lacking the mobility and dexterity necessary for generally useful tasks. In this work, we develop a system for imitating mobile manipulation tasks that are bimanual and require whole-body control. We first present Mobile ALOHA, a low-cost and whole-body teleoperation system for data collection. It augments the ALOHA system with a mobile base and a whole-body teleoperation interface. Using data collected with Mobile ALOHA, we then perform supervised behavior cloning and find that co-training with existing static ALOHA datasets boosts performance on mobile manipulation tasks. With 50 demonstrations for each task, co-training can increase success rates by up to 90%, allowing Mobile ALOHA to autonomously complete complex mobile manipulation tasks such as sauteing and serving a piece of shrimp, opening a two-door wall cabinet to store heavy cooking pots, calling and entering an elevator, and lightly rinsing a used pan using a kitchen faucet. Project website: https://mobile-aloha.github.io
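The co-training idea mentioned in the abstract can be illustrated with a minimal sketch of how a training batch might mix mobile and static demonstrations. This is not the authors' implementation: the function name `sample_cotraining_batch`, the `static_prob` parameter, and its default of 0.5 are assumptions made for illustration, and the abstract does not state the mixing ratio.

```python
import random


def sample_cotraining_batch(mobile_data, static_data, batch_size,
                            static_prob=0.5, rng=None):
    """Draw one training batch that mixes two demonstration datasets.

    Each element is taken from static_data with probability static_prob,
    otherwise from mobile_data. The 0.5 default is an illustrative
    assumption, not a value given in the abstract.
    """
    rng = rng or random.Random()
    batch = []
    for _ in range(batch_size):
        # Pick the source dataset for this element, then one sample from it.
        source = static_data if rng.random() < static_prob else mobile_data
        batch.append(rng.choice(source))
    return batch


# Hypothetical stand-ins for (observation, action) pairs from each dataset.
mobile_demos = [("mobile", i) for i in range(50)]
static_demos = [("static", i) for i in range(200)]

batch = sample_cotraining_batch(mobile_demos, static_demos, batch_size=8,
                                rng=random.Random(0))
```

A behavior cloning loop would then compute a supervised loss on each such batch. Setting `static_prob=0.0` recovers training on mobile data only, which gives a baseline for measuring the effect of co-training.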