TeleMoMa: A Modular and Versatile Teleoperation System for Mobile Manipulation (2403.07869v2)

Published 12 Mar 2024 in cs.RO, cs.AI, and cs.LG

Abstract: A critical bottleneck limiting imitation learning in robotics is the lack of data. This problem is more severe in mobile manipulation, where collecting demonstrations is harder than in stationary manipulation due to the lack of available and easy-to-use teleoperation interfaces. In this work, we demonstrate TeleMoMa, a general and modular interface for whole-body teleoperation of mobile manipulators. TeleMoMa unifies multiple human interfaces including RGB and depth cameras, virtual reality controllers, keyboards, joysticks, etc., and any combination thereof. In its most accessible version, TeleMoMa works using vision alone (e.g., an RGB-D camera), lowering the entry barrier for humans to provide mobile manipulation demonstrations. We demonstrate the versatility of TeleMoMa by teleoperating several existing mobile manipulators - PAL Tiago++, Toyota HSR, and Fetch - in simulation and the real world. We demonstrate the quality of the demonstrations collected with TeleMoMa by training imitation learning policies for mobile manipulation tasks involving synchronized whole-body motion. Finally, we also show that TeleMoMa's teleoperation channel enables teleoperation on site, looking at the robot, or remote, sending commands and observations through a computer network, and we perform user studies to evaluate how easy it is for novice users to learn to collect demonstrations with different combinations of human interfaces enabled by our system. We hope TeleMoMa becomes a helpful tool for the community, enabling researchers to collect whole-body mobile manipulation demonstrations. For more information and video results, see https://robin-lab.cs.utexas.edu/telemoma-web.

Summary

  • The paper introduces TeleMoMa as a modular and versatile teleoperation framework that enables whole-body control for mobile manipulators using multiple input devices.
  • Its tripartite architecture, comprising a Human Interface, Teleoperation Channel, and Robot Interface, supports flexible integration across varied teleoperation scenarios.
  • User studies validate TeleMoMa’s efficacy in data collection for imitation learning, paving the way for practical deployments in household and industrial robotics.

Unveiling TeleMoMa: A Pathway to Enhanced Imitation Learning through Advanced Teleoperation

Introduction to Teleoperation in Mobile Manipulation

Mobile manipulation, a cornerstone of robotics, aims to expand what robots can do, allowing them to perform a wide range of tasks alongside humans in diverse environments. A pivotal approach to advancing these robots is learning from human demonstrations, a method that benefits greatly from large-scale datasets. However, a persistent challenge in mobile manipulation is acquiring these demonstrations, primarily due to the absence of intuitive and versatile teleoperation systems. Unlike stationary manipulation, where datasets are plentiful owing to accessible teleoperation frameworks, mobile manipulation demands a more sophisticated level of interaction, combining mobility and manipulation, often bimanually.

The Emergence of TeleMoMa

In this landscape, TeleMoMa stands out as a significant contribution. TeleMoMa, short for Teleoperation for Mobile Manipulation, is a general, modular interface for whole-body teleoperation of mobile manipulators. It integrates a variety of human input interfaces, from RGB and depth cameras to virtual reality (VR) controllers, keyboards, and joysticks, offering considerable flexibility in how operators provide commands. Notably, TeleMoMa can operate with vision input alone, such as an RGB-D camera, dramatically lowering the barrier for individuals to provide mobile manipulation demonstrations.

Technical Details and Innovations

TeleMoMa's architecture is a tripartite system comprising a Human Interface, a Teleoperation Channel, and a Robot Interface. Its modular design allows multiple input devices to be combined to suit different teleoperation needs. This makes TeleMoMa adaptable to a wide range of robot platforms and scenarios, from on-site operation, observing the robot directly, to remote operation over a computer network. In demonstrations with the PAL Tiago++, Toyota HSR, and Fetch robots, in both simulation and the real world, TeleMoMa has shown its versatility and efficacy, underscoring its potential as a vital tool for mobile manipulation research.
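
To make the tripartite design concrete, the sketch below shows one way such a pipeline could be organized: each input device implements a common interface that emits partial whole-body commands, the teleoperation channel fuses them into one command, and a platform-specific robot interface executes the result. This is a minimal illustrative sketch only; all class and function names (WholeBodyCommand, HumanInterface, fuse, RobotInterface) are hypothetical and do not reflect TeleMoMa's actual API.

```python
# Illustrative sketch of a tripartite teleoperation pipeline in the spirit of
# TeleMoMa's Human Interface -> Teleoperation Channel -> Robot Interface.
# All names here are hypothetical, not TeleMoMa's actual API.
from dataclasses import dataclass
from typing import Protocol


@dataclass
class WholeBodyCommand:
    """A unified command covering base, arm, torso, and gripper."""
    base_vel: tuple[float, float, float] = (0.0, 0.0, 0.0)  # (vx, vy, wz)
    arm_delta: tuple[float, ...] = ()                       # end-effector pose delta
    torso_height: float | None = None
    gripper: float | None = None                            # 0 = closed, 1 = open


class HumanInterface(Protocol):
    """Any input device: keyboard, joystick, VR controller, RGB-D tracker, ..."""
    def get_command(self) -> WholeBodyCommand: ...


class KeyboardInterface:
    def get_command(self) -> WholeBodyCommand:
        # A real implementation would poll key state; this is a stub.
        return WholeBodyCommand(base_vel=(0.1, 0.0, 0.0))


class VisionInterface:
    def get_command(self) -> WholeBodyCommand:
        # A real implementation would run body-pose tracking on RGB-D frames.
        return WholeBodyCommand(arm_delta=(0.0, 0.01, 0.0), gripper=1.0)


def fuse(commands: list[WholeBodyCommand]) -> WholeBodyCommand:
    """Teleoperation channel: merge partial commands; later devices win per field."""
    out = WholeBodyCommand()
    for c in commands:
        if any(c.base_vel):
            out.base_vel = c.base_vel
        if c.arm_delta:
            out.arm_delta = c.arm_delta
        if c.torso_height is not None:
            out.torso_height = c.torso_height
        if c.gripper is not None:
            out.gripper = c.gripper
    return out


class RobotInterface:
    """Maps the unified command onto a specific platform (Tiago++, HSR, Fetch)."""
    def execute(self, cmd: WholeBodyCommand) -> None:
        print(f"base={cmd.base_vel} arm={cmd.arm_delta} gripper={cmd.gripper}")


if __name__ == "__main__":
    devices: list[HumanInterface] = [KeyboardInterface(), VisionInterface()]
    robot = RobotInterface()
    # One control tick: read every device, fuse into one command, execute.
    robot.execute(fuse([d.get_command() for d in devices]))
```

The per-field fusion is what lets heterogeneous devices cover complementary parts of the body, for example a keyboard driving the base while a vision tracker drives the arm.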

Evaluation and Implications

TeleMoMa is evaluated through user studies and by training imitation learning policies for tasks requiring synchronized whole-body motion. These studies validate TeleMoMa's usability and its effectiveness for collecting imitation learning data, enabling the training of capable policies for complex mobile manipulation tasks. Such advances are not merely academic; they carry substantial implications for real-world applications, from automating household chores to executing tasks in industrial settings, expanding the reach of robots in human environments.
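
Conceptually, the imitation pipeline regresses demonstrated whole-body actions from observations. The sketch below is a minimal behavior-cloning loop on synthetic stand-in data; the paper's actual policy architectures and training setup are more sophisticated, and all dimensions, data, and hyperparameters here are assumptions for illustration.

```python
# Minimal behavior-cloning sketch: regress demonstrated whole-body actions
# from observations. All dimensions and hyperparameters are illustrative
# stand-ins, not the paper's actual training pipeline.
import torch
import torch.nn as nn

obs_dim, act_dim = 64, 10  # e.g., proprioception + task features -> whole-body action
policy = nn.Sequential(
    nn.Linear(obs_dim, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, act_dim),
)
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

# Stand-in for (observation, action) pairs collected via teleoperation.
obs = torch.randn(1024, obs_dim)
act = torch.randn(1024, act_dim)

for epoch in range(10):
    for i in range(0, len(obs), 256):
        pred = policy(obs[i:i + 256])
        loss = nn.functional.mse_loss(pred, act[i:i + 256])  # match demonstrated actions
        opt.zero_grad()
        loss.backward()
        opt.step()
```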

The Road Ahead

While TeleMoMa addresses a critical bottleneck in mobile manipulation research, the acquisition of demonstration data, it also opens avenues for future work. Its modular design invites extensions, such as support for additional input devices covering an even broader range of teleoperation scenarios. Improving teleoperation accuracy, especially for vision-based interfaces, also remains a promising direction, with direct payoff for robot learning from human demonstrations.

Conclusion

In conclusion, TeleMoMa is a versatile, modular teleoperation framework for mobile manipulation. By closing the gap in data collection for imitation learning, it moves the field toward more capable robots that can operate alongside humans across many settings. As TeleMoMa continues to evolve, it promises to unlock new possibilities in robotics and to make human-robot collaboration more dynamic and productive.
