Hierarchical World Models as Visual Whole-Body Humanoid Controllers (2405.18418v2)

Published 28 May 2024 in cs.LG, cs.CV, and cs.RO

Abstract: Whole-body control for humanoids is challenging due to the high-dimensional nature of the problem, coupled with the inherent instability of a bipedal morphology. Learning from visual observations further exacerbates this difficulty. In this work, we explore highly data-driven approaches to visual whole-body humanoid control based on reinforcement learning, without any simplifying assumptions, reward design, or skill primitives. Specifically, we propose a hierarchical world model in which a high-level agent generates commands based on visual observations for a low-level agent to execute, both of which are trained with rewards. Our approach produces highly performant control policies in 8 tasks with a simulated 56-DoF humanoid, while synthesizing motions that are broadly preferred by humans. Code and videos: https://nicklashansen.com/rlpuppeteer

Authors (6)
  1. Nicklas Hansen (22 papers)
  2. Jyothir S V (6 papers)
  3. Vlad Sobal (8 papers)
  4. Yann LeCun (173 papers)
  5. Xiaolong Wang (243 papers)
  6. Hao Su (218 papers)

Summary

  • The paper presents Puppeteer, a hierarchical world model employing dual-level model-based RL to generate natural, human-like motions in high-dimensional humanoid control.
  • It leverages a low-level tracking agent pretrained on MoCap data and a high-level visual puppeteering agent to efficiently coordinate joint-level and task-specific actions.
  • Experimental results show over 95% user preference for its natural motions and robust performance across 8 diverse whole-body control tasks.

Puppeteer: A Hierarchical World Model for Visual Whole-Body Humanoid Control

Overview

The paper presents Puppeteer, a hierarchical world model for visual whole-body humanoid control, a problem made difficult by its high dimensionality and the inherent instability of a bipedal morphology. The framework uses data-driven reinforcement learning (RL) to generate natural, human-like motion without relying on manual reward engineering or pre-defined skill primitives. Puppeteer consists of two hierarchically organized agents: a low-level proprioceptive agent and a high-level visual puppeteering agent. Both agents are trained with model-based RL, enabling the system to accomplish a diverse set of tasks with a simulated 56-DoF humanoid.

Methodology

Hierarchical World Model

Puppeteer's core architecture is a hierarchical world model wherein:

  • Low-Level Tracking Agent: Trained on human MoCap data to track reference motions. This agent receives the proprioceptive state $\mathbf{q}_t$ and a command $\mathbf{c}_t$ as input and synthesizes a sequence of actions that follow these commands.
  • High-Level Puppeteering Agent: Uses visual observations to generate commands for the low-level agent based on the downstream task's requirements. This agent processes both the proprioceptive state $\mathbf{q}_t$ and the visual input $\mathbf{v}_t$ to produce reference commands.

The agents operate on different levels of abstraction, with the low-level agent focusing on joint-level physics and the high-level agent on end-effector positions, making the entire system computationally efficient and generalizable across tasks.
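To make the division of labor concrete, the following minimal Python sketch shows one way the two agents could be composed at control time. The class name, the `plan` interface, and the fixed command interval are illustrative assumptions, not the authors' API; the released code at https://nicklashansen.com/rlpuppeteer is the authoritative reference.

```python
class HierarchicalController:
    """Sketch of a two-level control loop in the spirit of Puppeteer.

    `high_agent` and `low_agent` stand in for the two TD-MPC2 agents;
    the `plan` method names and the fixed command interval are
    illustrative assumptions.
    """

    def __init__(self, high_agent, low_agent, command_interval=5):
        self.high = high_agent        # visual puppeteering agent
        self.low = low_agent          # proprioceptive tracking agent
        self.k = command_interval     # steps between high-level commands
        self.command = None

    def act(self, step, proprio, visual):
        # High level: map (q_t, v_t) to a reference command c_t,
        # re-planned every k environment steps in this sketch.
        if step % self.k == 0 or self.command is None:
            self.command = self.high.plan(proprio, visual)
        # Low level: map (q_t, c_t) to joint-level actions a_t.
        return self.low.plan(proprio, self.command)
```

In this sketch the visual agent re-plans at a coarser timescale than the tracker; whether and how the paper decouples the two timescales is a detail best checked against the released code.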

Key Features

  • Model-Based RL: The method utilizes TD-MPC2 for both agents, enabling efficient planning and policy optimization through a learned world model without decoding raw observations.
  • Two-Stage Training: The low-level agent is pretrained on MoCap data and can track various human motions when re-targeted to the humanoid embodiment. The high-level agent is subsequently trained on specific downstream tasks, using the pretrained low-level agent.
  • Termination Handling: Incorporates a termination prediction head to handle episode termination conditions, which is particularly important for stability in high-dimensional control tasks (see the sketch after this list).
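Since a fallen humanoid should not bootstrap value from beyond the end of an episode, a termination prediction naturally enters the TD target as a gate on the bootstrap term. Below is a minimal PyTorch sketch of this idea; the head architecture and the exact gating are standard-practice assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class TerminationHead(nn.Module):
    """Predicts the probability that a latent state is terminal.

    A small MLP on top of the world model's latent state z_t;
    the width and depth are illustrative choices.
    """

    def __init__(self, latent_dim, hidden_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, z):
        return torch.sigmoid(self.net(z)).squeeze(-1)


def td_target(reward, next_q, term_prob, gamma=0.99):
    # TD(0) target with bootstrapping masked by the predicted
    # termination probability: no future value flows past a terminal state.
    return reward + gamma * (1.0 - term_prob) * next_q
```

Such a head is typically trained with a binary cross-entropy loss against observed episode terminations.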

Experimental Evaluation

Task Suite

An 8-task suite was curated to evaluate Puppeteer, comprising a mix of visual and non-visual whole-body humanoid control tasks. The tasks ranged from straightforward locomotion like walking and running to more complex activities such as jumping over hurdles and navigating stairs.

Performance and Naturalness

Puppeteer demonstrated highly performant control policies, competitive with state-of-the-art methods like TD-MPC2, while significantly outperforming them at producing natural, human-like motions. This was quantitatively supported by user studies in which over 95% of participants preferred motions generated by Puppeteer over those from TD-MPC2. The method also scored favorably on proxy metrics such as average episode length and mean torso height, reflecting more realistic humanoid behavior.

Ablation Studies

The paper provides thorough ablation studies highlighting the importance of:

  • Mixed offline and online data during the low-level agent's pretraining for enhanced robustness.
  • Planning over model-free policies, showing that planning at both hierarchical levels is critical for high-dimensional control (a simplified planner sketch follows this list).
  • Zero-shot generalization, where Puppeteer successfully handled larger gap lengths in the gaps task than it encountered during training.
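The planning ablation is easier to interpret with the procedure in view: TD-MPC2 selects actions with an MPPI-style sampling update rolled out entirely in the learned latent space. The sketch below shows a simplified single iteration; the model interface (`next`, `reward`, `value`) and all hyperparameters are assumptions, and the actual planner runs several iterations and mixes in samples from a learned policy prior.

```python
import torch

@torch.no_grad()
def mppi_plan(model, z0, act_dim, horizon=3, num_samples=512, temperature=0.5):
    """Single simplified MPPI iteration over a learned latent dynamics model.

    `model` is assumed to expose next(z, a), reward(z, a), and value(z);
    these names and all hyperparameters are illustrative.
    """
    # Sample candidate action sequences from a broad Gaussian prior.
    actions = torch.randn(num_samples, horizon, act_dim).clamp(-1, 1)

    # Score every sequence by unrolling it entirely in latent space.
    z = z0.unsqueeze(0).expand(num_samples, -1)
    returns = torch.zeros(num_samples)
    discount = 1.0
    for t in range(horizon):
        returns = returns + discount * model.reward(z, actions[:, t])
        z = model.next(z, actions[:, t])
        discount *= 0.99
    returns = returns + discount * model.value(z)  # terminal value bootstrap

    # MPPI update: exponentially weight sequences by return and average.
    weights = torch.softmax(returns / temperature, dim=0)
    plan = (weights[:, None, None] * actions).sum(dim=0)
    return plan[0]  # receding horizon: execute only the first action
```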

Implications and Future Work

Practical Implications

This research offers significant practical potential for humanoid robotics, particularly where human-like motion and real-time decision-making are critical, such as service robotics, search-and-rescue operations, and the entertainment industry.

Theoretical Implications

From a theoretical perspective, Puppeteer advances the understanding of hierarchical RL and model-based planning in high-dimensional spaces. It demonstrates the efficacy of combining data-driven approaches with hierarchical planning, thus paving the way for more generalized and adaptable robotic systems.

Future Developments

Future research could explore extending this hierarchical framework to more complex, real-world scenarios, incorporating richer sensory inputs and more dynamic tasks. Further investigation of its generalization capabilities will also be crucial to improving robustness across varying environmental conditions and task requirements.

Conclusion

Puppeteer represents a significant step forward in humanoid control, offering a robust, data-driven approach to achieving natural, human-like motion through a hierarchical world model. Its ability to handle a wide range of tasks with minimal assumptions marks it as a versatile framework that holds promise for both academic research and practical applications in robotics.