Hierarchical World Models as Visual Whole-Body Humanoid Controllers (2405.18418v2)

Published 28 May 2024 in cs.LG, cs.CV, and cs.RO

Abstract: Whole-body control for humanoids is challenging due to the high-dimensional nature of the problem, coupled with the inherent instability of a bipedal morphology. Learning from visual observations further exacerbates this difficulty. In this work, we explore highly data-driven approaches to visual whole-body humanoid control based on reinforcement learning, without any simplifying assumptions, reward design, or skill primitives. Specifically, we propose a hierarchical world model in which a high-level agent generates commands based on visual observations for a low-level agent to execute, both of which are trained with rewards. Our approach produces highly performant control policies in 8 tasks with a simulated 56-DoF humanoid, while synthesizing motions that are broadly preferred by humans. Code and videos: https://nicklashansen.com/rlpuppeteer

Authors (6)
  1. Nicklas Hansen (22 papers)
  2. Jyothir S V (6 papers)
  3. Vlad Sobal (8 papers)
  4. Yann LeCun (173 papers)
  5. Xiaolong Wang (243 papers)
  6. Hao Su (218 papers)

Summary

  • The paper presents Puppeteer, a hierarchical world model employing dual-level model-based RL to generate natural, human-like motions in high-dimensional humanoid control.
  • It leverages a low-level tracking agent pretrained on MoCap data and a high-level visual puppeteering agent to efficiently coordinate joint-level and task-specific actions.
  • Experimental results show over 95% user preference for its natural motions and robust performance across 8 diverse whole-body control tasks.

Puppeteer: A Hierarchical World Model for Visual Whole-Body Humanoid Control

Overview

The paper presents Puppeteer, a hierarchical world model for visual whole-body humanoid control, a problem made difficult by its high dimensionality and the inherent instability of a bipedal morphology. The framework uses data-driven reinforcement learning (RL) to generate natural, human-like motion without relying on manual reward engineering or pre-defined skill primitives. Puppeteer consists of two hierarchically organized agents: a low-level proprioceptive agent and a high-level visual puppeteering agent. Both agents are trained with model-based RL, enabling the system to accomplish a diverse set of tasks with a simulated 56-DoF humanoid.

Methodology

Hierarchical World Model

Puppeteer's core architecture is a hierarchical world model wherein:

  • Low-Level Tracking Agent: Trained on human MoCap data to track reference motions. This agent receives the proprioceptive state $\mathbf{q}_t$ and a command $\mathbf{c}_t$ as input and synthesizes a sequence of actions that follow these commands.
  • High-Level Puppeteering Agent: Uses visual observations to generate commands for the low-level agent based on the downstream task's requirements. This agent processes both the proprioceptive state $\mathbf{q}_t$ and the visual input $\mathbf{v}_t$ to produce reference commands.

The agents operate on different levels of abstraction, with the low-level agent focusing on joint-level physics and the high-level agent on end-effector positions, making the entire system computationally efficient and generalizable across tasks.
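To make the division of labor concrete, the following minimal Python sketch shows one way the two agents could be composed at control time. The class name, the `plan` interface, and the fixed command interval are illustrative assumptions, not the authors' API; the released code at https://nicklashansen.com/rlpuppeteer is the authoritative reference.

```python
class HierarchicalController:
    """Sketch of a two-level control loop in the spirit of Puppeteer.

    `high_agent` and `low_agent` stand in for the two TD-MPC2 agents;
    the `plan` method names and the fixed command interval are
    illustrative assumptions.
    """

    def __init__(self, high_agent, low_agent, command_interval=5):
        self.high = high_agent        # visual puppeteering agent
        self.low = low_agent          # proprioceptive tracking agent
        self.k = command_interval     # steps between high-level commands
        self.command = None

    def act(self, step, proprio, visual):
        # High level: map (q_t, v_t) to a reference command c_t,
        # re-planned every k environment steps in this sketch.
        if step % self.k == 0 or self.command is None:
            self.command = self.high.plan(proprio, visual)
        # Low level: map (q_t, c_t) to joint-level actions a_t.
        return self.low.plan(proprio, self.command)
```

In this sketch the visual agent re-plans at a coarser timescale than the tracker; whether and how the paper decouples the two timescales is a detail best checked against the released code.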

Key Features

  • Model-Based RL: The method utilizes TD-MPC2 for both agents, enabling efficient planning and policy optimization through a learned world model without decoding raw observations.
  • Two-Stage Training: The low-level agent is pretrained on MoCap data and can track various human motions when re-targeted to the humanoid embodiment. The high-level agent is subsequently trained on specific downstream tasks, using the pretrained low-level agent.
  • Termination Handling: Incorporates a termination prediction head to handle episode termination conditions, which is particularly important for stability in high-dimensional control tasks (see the sketch after this list).
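Since a fallen humanoid should not bootstrap value from beyond the end of an episode, a termination prediction naturally enters the TD target as a gate on the bootstrap term. Below is a minimal PyTorch sketch of this idea; the head architecture and the exact gating are standard-practice assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class TerminationHead(nn.Module):
    """Predicts the probability that a latent state is terminal.

    A small MLP on top of the world model's latent state z_t;
    the width and depth are illustrative choices.
    """

    def __init__(self, latent_dim, hidden_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, z):
        return torch.sigmoid(self.net(z)).squeeze(-1)


def td_target(reward, next_q, term_prob, gamma=0.99):
    # TD(0) target with bootstrapping masked by the predicted
    # termination probability: no future value flows past a terminal state.
    return reward + gamma * (1.0 - term_prob) * next_q
```

Such a head is typically trained with a binary cross-entropy loss against observed episode terminations.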

Experimental Evaluation

Task Suite

An 8-task suite was curated to evaluate Puppeteer, comprising a mix of visual and non-visual whole-body humanoid control tasks. The tasks ranged from straightforward locomotion like walking and running to more complex activities such as jumping over hurdles and navigating stairs.

Performance and Naturalness

Puppeteer demonstrated highly performant control policies, competitive with state-of-the-art methods like TD-MPC2, while significantly outperforming them at producing natural, human-like motions. This was quantitatively supported by user studies in which over 95% of participants preferred motions generated by Puppeteer over those from TD-MPC2. The method also scored favorably on proxy metrics such as average episode length and mean torso height, reflecting more realistic humanoid behavior.

Ablation Studies

The paper provides thorough ablation studies highlighting the importance of:

  • Mixed offline and online data during the low-level agent's pretraining for enhanced robustness.
  • Planning over model-free policies, showing that planning at both hierarchical levels is critical for high-dimensional control (a simplified planner sketch follows this list).
  • Zero-shot generalization, where Puppeteer successfully handled larger gap lengths in the gaps task than it encountered during training.
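The planning ablation is easier to interpret with the procedure in view: TD-MPC2 selects actions with an MPPI-style sampling update rolled out entirely in the learned latent space. The sketch below shows a simplified single iteration; the model interface (`next`, `reward`, `value`) and all hyperparameters are assumptions, and the actual planner runs several iterations and mixes in samples from a learned policy prior.

```python
import torch

@torch.no_grad()
def mppi_plan(model, z0, act_dim, horizon=3, num_samples=512, temperature=0.5):
    """Single simplified MPPI iteration over a learned latent dynamics model.

    `model` is assumed to expose next(z, a), reward(z, a), and value(z);
    these names and all hyperparameters are illustrative.
    """
    # Sample candidate action sequences from a broad Gaussian prior.
    actions = torch.randn(num_samples, horizon, act_dim).clamp(-1, 1)

    # Score every sequence by unrolling it entirely in latent space.
    z = z0.unsqueeze(0).expand(num_samples, -1)
    returns = torch.zeros(num_samples)
    discount = 1.0
    for t in range(horizon):
        returns = returns + discount * model.reward(z, actions[:, t])
        z = model.next(z, actions[:, t])
        discount *= 0.99
    returns = returns + discount * model.value(z)  # terminal value bootstrap

    # MPPI update: exponentially weight sequences by return and average.
    weights = torch.softmax(returns / temperature, dim=0)
    plan = (weights[:, None, None] * actions).sum(dim=0)
    return plan[0]  # receding horizon: execute only the first action
```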

Implications and Future Work

Practical Implications

This research offers significant practical potential for humanoid robotics, particularly where human-like motion and real-time decision-making are critical, such as service robotics, search-and-rescue operations, and the entertainment industry.

Theoretical Implications

From a theoretical perspective, Puppeteer advances the understanding of hierarchical RL and model-based planning in high-dimensional spaces. It demonstrates the efficacy of combining data-driven approaches with hierarchical planning, thus paving the way for more generalized and adaptable robotic systems.

Future Developments

Future research could explore extending this hierarchical framework to more complex, real-world scenarios, incorporating richer sensory inputs and more dynamic tasks. Further investigation of its generalization capabilities will also be crucial to improving robustness across varying environmental conditions and task requirements.

Conclusion

Puppeteer represents a significant step forward in humanoid control, offering a robust, data-driven approach to achieving natural, human-like motion through a hierarchical world model. Its ability to handle a wide range of tasks with minimal assumptions marks it as a versatile framework that holds promise for both academic research and practical applications in robotics.