Hierarchical World Models as Visual Whole-Body Humanoid Controllers (2405.18418v2)
Abstract: Whole-body control for humanoids is challenging due to the high-dimensional nature of the problem, coupled with the inherent instability of a bipedal morphology. Learning from visual observations further exacerbates this difficulty. In this work, we explore highly data-driven approaches to visual whole-body humanoid control based on reinforcement learning, without any simplifying assumptions, reward design, or skill primitives. Specifically, we propose a hierarchical world model in which a high-level agent generates commands based on visual observations for a low-level agent to execute, both of which are trained with rewards. Our approach produces highly performant control policies in 8 tasks with a simulated 56-DoF humanoid, while synthesizing motions that are broadly preferred by humans. Code and videos: https://nicklashansen.com/rlpuppeteer
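The hierarchical decomposition described in the abstract (a high-level agent that maps visual observations to commands, and a low-level agent that executes them on the humanoid) can be sketched as a toy control loop. Everything below is an illustrative assumption — the policy functions, command dimension, and dynamics are placeholders, not the paper's actual implementation:

```python
import numpy as np

def high_level_policy(visual_obs, rng):
    # Hypothetical high-level agent: maps a camera frame to an
    # abstract command vector (dimension 8 chosen arbitrarily here).
    return rng.standard_normal(8)

def low_level_policy(proprio_obs, command):
    # Hypothetical low-level agent: tracks the command using
    # proprioceptive state, emitting one action per joint (56 DoF).
    return np.tanh(command.mean() + 0.1 * proprio_obs)

def rollout(steps=5, seed=0):
    rng = np.random.default_rng(seed)
    proprio = np.zeros(56)          # stand-in proprioceptive state
    actions = []
    for _ in range(steps):
        visual = rng.standard_normal((3, 64, 64))  # stand-in camera frame
        command = high_level_policy(visual, rng)
        action = low_level_policy(proprio, command)
        proprio = proprio + 0.01 * action          # toy dynamics update
        actions.append(action)
    return np.stack(actions)

acts = rollout()
print(acts.shape)  # (5, 56): steps x action dimension
```

The key design point this mirrors is that the two agents communicate only through the command interface, so each can be trained with its own reward signal.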
Authors: Nicklas Hansen, Jyothir S V, Vlad Sobal, Yann LeCun, Xiaolong Wang, Hao Su