From Experts to a Generalist: Toward General Whole-Body Control for Humanoid Robots (2506.12779v2)

Published 15 Jun 2025 in cs.RO and cs.LG

Abstract: Achieving general agile whole-body control on humanoid robots remains a major challenge due to diverse motion demands and data conflicts. While existing frameworks excel in training single motion-specific policies, they struggle to generalize across highly varied behaviors due to conflicting control requirements and mismatched data distributions. In this work, we propose BumbleBee (BB), an expert-generalist learning framework that combines motion clustering and sim-to-real adaptation to overcome these challenges. BB first leverages an autoencoder-based clustering method to group behaviorally similar motions using motion features and motion descriptions. Expert policies are then trained within each cluster and refined with real-world data through iterative delta action modeling to bridge the sim-to-real gap. Finally, these experts are distilled into a unified generalist controller that preserves agility and robustness across all motion types. Experiments on two simulations and a real humanoid robot demonstrate that BB achieves state-of-the-art general whole-body control, setting a new benchmark for agile, robust, and generalizable humanoid performance in the real world.

Summary

The paper presents the BumbleBee framework, an expert-to-generalist strategy that clusters motion data and distills expert policies for versatile humanoid control.
It employs autoencoder-based motion clustering and iterative delta action modeling to bridge the gap between simulation and real-world performance.
In the MuJoCo simulator, BB reaches a 66.84% success rate, compared with 50.19% for Exbody2, with improved motion fidelity and generalization.

Toward Generalizable Whole-Body Control for Humanoid Robots

The paper "From Experts to a Generalist: Toward General Whole-Body Control for Humanoid Robots" presents BumbleBee (BB), a framework for general whole-body control of humanoid robots. It targets two obstacles, diverse motion demands and conflicting data distributions, that limit existing frameworks, which typically train a single motion-specific policy. BB combines motion clustering with sim-to-real adaptation and distills the result into a unified generalist controller.

Technical Overview

BB uses an expert-generalist learning strategy in three stages. First, an autoencoder-based clustering method groups behaviorally similar motions using motion features and motion descriptions. Second, an expert policy is trained within each cluster and refined with real-world data through iterative delta action modeling to bridge the sim-to-real gap. Finally, the experts are distilled into a unified generalist controller that preserves agility and robustness across motion types.
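To make the delta action modeling step more concrete, here is a minimal Python sketch under strong simplifying assumptions; it is not the paper's implementation. It assumes a one-dimensional toy system in which the "real" robot delivers only 80% of each commanded action, a delta model that is a single scalar gain fitted by least squares, and a closed-form policy update standing in for fine-tuning in the corrected simulator. All function names, the dynamics, and the constants are hypothetical.

```python
import random

TARGET = 2.0  # hypothetical set point the toy policy tracks


def sim_step(state, action):
    """Idealized simulator: the commanded action is applied in full."""
    return state + action


def real_step(state, action):
    """Hypothetical stand-in for hardware: only 80% of the command is delivered."""
    return state + 0.8 * action


def collect_real_transitions(policy_gain, n_steps, rng):
    """Roll out a proportional policy on the 'real' system and log (s, a, s')."""
    state = rng.uniform(-1.0, 1.0)
    transitions = []
    for _ in range(n_steps):
        action = policy_gain * (TARGET - state)
        next_state = real_step(state, action)
        transitions.append((state, action, next_state))
        state = next_state
    return transitions


def fit_delta_gain(transitions):
    """Least-squares k such that sim_step(s, a + k * a) matches the logged s'."""
    num = sum(a * (s_next - s - a) for s, a, s_next in transitions)
    den = sum(a * a for _, a, _ in transitions)
    return num / den


def refine_policy_gain(delta_gain):
    """Closed-form stand-in for fine-tuning in the delta-augmented simulator:
    pick the gain that reaches TARGET in one corrected simulator step."""
    return 1.0 / (1.0 + delta_gain)


def iterate(n_rounds=3, seed=0):
    """Alternate real-data collection, delta fitting, and policy refinement."""
    rng = random.Random(seed)
    policy_gain, data, delta_gain = 0.5, [], 0.0
    for _ in range(n_rounds):
        data += collect_real_transitions(policy_gain, 20, rng)
        delta_gain = fit_delta_gain(data)
        policy_gain = refine_policy_gain(delta_gain)
    return delta_gain, policy_gain


delta_gain, policy_gain = iterate()
print(f"delta gain ~ {delta_gain:.3f}, refined policy gain ~ {policy_gain:.3f}")
```

In this linear toy the fitted delta gain settles near -0.2 after the first round, so the corrected simulator behaves like the "real" system and the refined policy compensates for the weaker actuation. Repeated rounds would matter more when the mismatch depends on the states the current policy visits, which is presumably why the method is described as iterative.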

Numerical Results

The paper reports experiments in two simulation environments and on a real humanoid robot. BB outperforms prior state-of-the-art methods on three metrics: Success Rate (SR), Mean Per Joint Position Error (MPJPE), and Mean Per Keypoint Position Error (MPKPE). In the MuJoCo simulator, BB reaches a success rate of 66.84%, compared with 50.19% for Exbody2, a gain of 16.65 percentage points. These results suggest that clustering combined with iterative refinement improves motion fidelity and generalizes across diverse motions.
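As a rough illustration of how such metrics are commonly computed, the Python sketch below defines the three quantities and works out the reported success-rate gap. The definitions are assumptions, since this summary does not give the paper's exact formulas or units: SR is taken as the percentage of successful episodes, MPJPE as the mean absolute joint-angle error, and MPKPE as the mean Euclidean distance between tracked and reference keypoints. The function names are hypothetical; only the two success rates come from the text.

```python
import math


def success_rate(succeeded):
    """Percentage of episodes flagged as successful."""
    return 100.0 * sum(succeeded) / len(succeeded)


def mean_joint_error(pred, ref):
    """Assumed MPJPE: mean absolute joint-angle error over frames and joints."""
    errors = [abs(p - r) for pf, rf in zip(pred, ref) for p, r in zip(pf, rf)]
    return sum(errors) / len(errors)


def mean_keypoint_error(pred, ref):
    """Assumed MPKPE: mean Euclidean distance over frames and keypoints."""
    errors = [math.dist(p, r) for pf, rf in zip(pred, ref) for p, r in zip(pf, rf)]
    return sum(errors) / len(errors)


# Success rates reported for the MuJoCo simulator.
bb_sr, exbody2_sr = 66.84, 50.19
absolute_gain = bb_sr - exbody2_sr                  # percentage points
relative_gain = 100.0 * absolute_gain / exbody2_sr  # percent of the baseline

print(f"absolute gain: {absolute_gain:.2f} points")
print(f"relative gain: {relative_gain:.1f}%")
```

The absolute gap is 16.65 percentage points, which is roughly a 33% relative improvement over the Exbody2 success rate.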

Bold Claims and Implications

The paper claims that its expert-to-generalist framework mitigates cross-task training conflicts by segmenting motion data and provides a structured approach to sim-to-real adaptation. It supports these claims with results in both simulation and real-world settings. The implications are twofold:

Theoretical Implications: The motion clustering methodology enhances understanding of how motion types influence control dynamics, offering insights that could guide future research on adaptable and robust robotic control systems.
Practical Implications: This framework provides a blueprint for developing more versatile humanoid robots capable of performing complex tasks across varied domains with high agility and reliability.
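
The idea of segmenting motion data so that each expert sees only behaviorally similar clips can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the 2-D latent codes are made-up stand-ins for the output of an autoencoder's encoder, the motion names are hypothetical, and k-means with farthest-point initialization is used only because it is simple; the summary does not say which clustering algorithm the paper applies in latent space.

```python
import math

# Hypothetical latent codes, standing in for an autoencoder's encoder output.
LATENTS = {
    "walk_a": (0.1, 0.0), "walk_b": (0.2, 0.1),
    "jump_a": (3.0, 3.0), "jump_b": (3.1, 2.9),
    "dance_a": (-3.0, 3.0), "dance_b": (-2.9, 3.2),
}


def kmeans(points, k, n_iters=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        # Add the point farthest from its nearest existing center.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(n_iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels


def group_motions(latents, k):
    """Return {cluster_id: [motion names]}, i.e. the training set for each expert."""
    names = list(latents)
    labels = kmeans([latents[n] for n in names], k)
    groups = {}
    for name, lab in zip(names, labels):
        groups.setdefault(lab, []).append(name)
    return groups


expert_datasets = group_motions(LATENTS, k=3)
for cluster_id, motions in sorted(expert_datasets.items()):
    print(f"expert {cluster_id} trains on {motions}")
```

Each resulting group would then serve as the training set for one expert policy, so that, for example, walking and jumping clips never compete within a single expert's objective.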

Future Outlook

Looking ahead, the approach invites further work on integrating additional sensory inputs, such as high-precision localization, to improve real-world applicability. Extending the cluster-specific training strategy beyond humanoids to other robot morphologies or applications may also prove useful.

In summary, the BumbleBee framework advances humanoid whole-body control by training cluster-specific experts and distilling them into a single controller that adapts across diverse motion types. The approach not only improves on current benchmarks but also paves the way for more adaptable, agile, and robust humanoid robots.
