Dynamic Whole-Body Dancing with Humanoid Robots -- A Model-Based Control Approach

Published 5 Apr 2026 in cs.RO | (2604.03999v1)

Abstract: This paper presents an integrated model-based framework for generating and executing dynamic whole-body dance motions on humanoid robots. The framework operates in two stages: offline motion generation and online motion execution, both leveraging future state prediction to enable robust and dynamic dance motions in real-world environments. In the offline motion generation stage, human dance demonstrations are captured via a motion capture (MoCap) system, retargeted to the robot by solving a Quadratic Programming (QP) problem, and further refined using Trajectory Optimization (TO) to ensure dynamic feasibility. In the online motion execution stage, a centroidal dynamics-based Model Predictive Control (MPC) framework tracks the planned motions in real time and proactively adjusts swing foot placement to adapt to real world disturbances. We validate our framework on the full-size humanoid robot Kuavo 4Pro, demonstrating the dynamic dance motions both in simulation and in a four-minute live public performance with a team of four robots. Experimental results show that longer prediction horizons improve both motion expressiveness in planning and stability in execution.


Authors (15)

Summary

The paper develops a hierarchical, model-based control framework for generating and executing dynamic whole-body dance motions on humanoid robots.
It integrates human motion capture, geometric retargeting, and receding-horizon trajectory optimization to enforce joint limits and maintain balance.
Experimental validations demonstrate enhanced motion expressiveness, dynamic balance, and robustness in both simulated and live multi-robot performances.


Introduction

The paper "Dynamic Whole-Body Dancing with Humanoid Robots -- A Model-Based Control Approach" (2604.03999) introduces a hierarchical framework for generating and executing expressive, dynamic dance motions on full-size humanoid robots. By combining model-based optimization with receding-horizon control, the authors address the challenge of producing physically feasible, coordinated, and robust whole-body movements that involve simultaneous upper- and lower-limb activity. The method rests on three elements: human motion data, QP-based motion retargeting, and future state prediction used both offline (trajectory optimization) and online (model predictive control, MPC). Together these enabled deployment in an extended, multi-robot public performance.

Figure 1: Dynamic whole-body dance motions performed by a team of four humanoid robots.

Motion Generation Framework

The motion generation pipeline is organized into three core stages: human demonstration acquisition, geometric motion retargeting, and dynamic retargeting via trajectory optimization.

Initially, high-resolution optical motion capture is deployed to acquire dense, 3D human dance trajectories, ensuring temporal consistency and mitigating occlusion-induced artifacts. Geometric motion retargeting then employs a weighted quadratic program to map these human poses onto the robot's kinematics, with anthropomorphic scaling to enhance workspace feasibility and preserve essential stylistic elements. Manual annotation further augments the dataset with accurate contact scheduling.
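As a rough illustration of the weighted-QP idea, the sketch below solves a deliberately simplified, per-joint version of the problem: track a (scaled) human joint angle, stay close to the previous robot pose, and respect joint limits. The function name, the weights, and the separable cost are assumptions made for this example; the retargeting QP in the paper couples joints through the robot's kinematics and is not reproduced here.

```python
def retarget_joint(q_human, q_prev, w_track, w_smooth, q_min, q_max):
    """Solve a one-dimensional box-constrained QP (hypothetical simplification):

        min_q  w_track * (q - q_human)**2 + w_smooth * (q - q_prev)**2
        s.t.   q_min <= q <= q_max

    For a convex 1-D quadratic, clamping the unconstrained minimizer to the
    interval gives the constrained optimum.
    """
    q_free = (w_track * q_human + w_smooth * q_prev) / (w_track + w_smooth)
    return min(max(q_free, q_min), q_max)


def retarget_pose(human_pose, prev_pose, limits, w_track=3.0, w_smooth=1.0):
    """Apply the per-joint QP to every joint; limits is a list of (min, max)."""
    return [
        retarget_joint(qh, qp, w_track, w_smooth, lo, hi)
        for qh, qp, (lo, hi) in zip(human_pose, prev_pose, limits)
    ]
```

A larger `w_track` favors fidelity to the demonstration, while a larger `w_smooth` suppresses frame-to-frame jumps; the limits clamp any pose the robot cannot reach.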

While geometric retargeting ensures kinematic feasibility, it neglects actuation and contact constraints. To address this, the framework solves a trajectory optimization problem over a receding horizon, enforcing centroidal dynamics, joint torque limits, contact friction cones, and support phase consistency. This approach leverages momentum modulation: rather than over-constraining instantaneous states, it anticipates and coordinates large limb movements by exploiting future state predictions.
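For orientation, the constraints named above are commonly written in the following textbook form. The notation is an assumption for this summary and may differ from the paper's: $m$ is the total mass, $\mathbf{c}$ the center of mass, $\mathbf{L}$ the angular momentum about the CoM, $\mathbf{f}_i$ the force at contact point $\mathbf{p}_i$, $\mathbf{g}$ the gravity vector, $\mu$ the friction coefficient, and $\tau_j^{\max}$ the torque limit of joint $j$. Point contacts without contact torques are assumed.

```latex
\begin{aligned}
m\,\ddot{\mathbf{c}} &= \sum_{i} \mathbf{f}_i + m\,\mathbf{g}, \\
\dot{\mathbf{L}} &= \sum_{i} (\mathbf{p}_i - \mathbf{c}) \times \mathbf{f}_i, \\
\sqrt{f_{i,x}^2 + f_{i,y}^2} &\le \mu\, f_{i,z}, \qquad f_{i,z} \ge 0, \\
|\tau_j| &\le \tau_j^{\max}.
\end{aligned}
```

The first two lines are the centroidal dynamics; the third is the friction cone for each active contact; the last is the joint torque limit. Support phase consistency then amounts to setting $\mathbf{f}_i = \mathbf{0}$ for any contact that the schedule marks as inactive.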

Figure 2: Overview of the dance motion generation framework: (a) motion capture, (b) geometric retargeting, (d) dynamic retargeting, and (e) generation of feasible motions.

Online Motion Execution

Execution in unstructured environments is challenged by model inaccuracies, sensor noise, actuator errors, and unpredictable surface properties. The control stack, deployed on a 1.66 m, 55 kg Kuavo 4Pro humanoid, comprises centroidal MPC, hierarchical whole-body control (WBC), and tightly integrated state estimation.

Centroidal MPC tracks and refines the motion plan in real time at 50 Hz, using online feedback to adjust swing foot placement proactively. This keeps the Zero Moment Point (ZMP) from leaving the support polygon, a crucial requirement for dynamic balance in high-momentum maneuvers. The MPC problem structure mirrors the offline optimal control formulation, which limits mismatch between planning and execution. WBC operates at 500 Hz, combining PD feedback with feedforward terms while respecting robot dynamics, friction, and joint limits.
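To make the ZMP criterion concrete, here is a minimal sketch that computes the ZMP from the CoM state and tests it against a convex support polygon. It uses the standard cart-table approximation (flat ground, constant CoM height, negligible angular momentum rate), which is a simplification chosen for this example and not the paper's centroidal model. The function names and the numeric values in the usage note are illustrative assumptions.

```python
G = 9.81  # gravitational acceleration, m/s^2


def zmp_from_com(com, com_acc_xy, g=G):
    """Cart-table ZMP: p_xy = c_xy - (c_z / g) * c_ddot_xy."""
    cx, cy, cz = com
    ax, ay = com_acc_xy
    return (cx - cz / g * ax, cy - cz / g * ay)


def inside_convex_polygon(point, vertices):
    """True if point lies inside or on a convex polygon.

    vertices must be listed counter-clockwise; the point is inside when it is
    on the left of (or on) every directed edge.
    """
    px, py = point
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        cross = (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1)
        if cross < 0:
            return False
    return True
```

With a hypothetical single-foot polygon `[(-0.1, -0.05), (0.15, -0.05), (0.15, 0.05), (-0.1, 0.05)]` and an assumed CoM height of 0.8 m, a forward CoM acceleration of 1 m/s² keeps the ZMP inside the foot, whereas 2 m/s² pushes it behind the heel. This is the kind of excursion the swing-foot adjustment is meant to prevent.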

Figure 3: Overview of the online motion execution framework, including centroidal MPC and whole-body control for real-time robust dance tracking.

Experimental Validation and Analysis

Validation covers both simulated and real-world experiments with the Kuavo 4Pro. The hardware system—equipped with high-torque actuation, high-frequency joint sensing, and a real-time operating stack—enabled precise evaluation of framework capabilities.

Offline Motion Generation

Dynamic motion retargeting outperforms geometric retargeting by enforcing torque and contact constraints, evident in the trajectory adaptations for knee and hip joints during challenging postures or fast strides. This enforcement prevents joint limit violations and excessive torque demands that would induce instability or damage.
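A comparison like this can be reproduced on any candidate trajectory with a simple bounds scan. The helper below is a hypothetical utility, not code from the paper; it reports every sample at which a joint angle or a joint torque leaves its allowed interval.

```python
def find_violations(samples, lower, upper):
    """Return (time_index, joint_index) pairs where a value leaves its bounds.

    samples: list of rows, one row per time step, one value per joint.
    lower, upper: per-joint bounds (position limits or torque limits).
    """
    violations = []
    for t, row in enumerate(samples):
        for j, value in enumerate(row):
            if value < lower[j] or value > upper[j]:
                violations.append((t, j))
    return violations
```

Running the scan on both the geometric and the dynamic retargeting output, once with joint angles and once with torques, gives a direct count of the violations that the dynamic stage is meant to remove.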

Figure 4: Experimental results comparing dynamic and geometric retargeting, highlighting compliance with joint limits and actuator torques.

Furthermore, the framework leverages optimization over the future horizon to preemptively regulate momentum and contact forces. This anticipatory modulation facilitates coordinated upper–lower limb maneuvers, maintaining ZMP trajectories well inside the support polygon—a critical criterion for dynamic balance during complex dance steps.

Figure 5: Momentum modulation results—dynamic retargeting reduces excessive angular velocities and ensures stable ZMP.

Numerical results show that increasing the prediction horizon in trajectory optimization from 0.6 s to 1.2 s raises the achievable expressiveness, quantified by swing-leg and CoM velocity. Short horizons induce over-conservative behavior and limit dynamic range; long horizons enable more aggressive, artistic motions while preserving balance.

Figure 6: Higher swing leg and CoM velocities with longer optimization horizons, indicating improved dynamic expressiveness.

Online Execution Robustness

Online MPC with longer prediction horizons similarly improves robustness to disturbances and environmental uncertainty. Experiments with two horizon lengths (0.6 s vs. 1.2 s) show that proactive swing-foot adjustments are critical when tracking errors or unmodeled disturbances accumulate. The longer horizon lets the controller avoid knee joint limit violations and anticipate foot landing regions that maintain ZMP margins, whereas the shorter horizon degrades stability and results in falls.
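The intuition behind adjusting foot placement can be shown with the instantaneous capture point of the linear inverted pendulum. This is a well-known heuristic substituted here for illustration; the paper obtains its adjustments from the MPC optimization, not from this formula. The sketch is one-dimensional, and the `max_shift` reachability bound is an assumed parameter.

```python
import math


def capture_point(com_pos, com_vel, com_height, g=9.81):
    """Instantaneous capture point of the linear inverted pendulum (1-D)."""
    omega = math.sqrt(g / com_height)  # natural frequency of the pendulum
    return com_pos + com_vel / omega


def adjust_foothold(nominal, com_pos, com_vel, com_height, max_shift):
    """Move the planned foothold toward the capture point, bounded by max_shift."""
    target = capture_point(com_pos, com_vel, com_height)
    shift = max(-max_shift, min(max_shift, target - nominal))
    return nominal + shift
```

When a push adds CoM velocity, the capture point moves ahead of the nominal foothold and the landing position follows it up to the bound. A predictive controller can plan that shift earlier in the swing phase, while the foot still has time to reach the new target.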

Figure 7: Foot trajectory corrections with longer MPC horizons, demonstrating superior stability and proactive adaptation.

Average solver performance supports achieving control frequencies close to 50 Hz at horizon lengths up to 1.2 s, a practical balance between foresight and computational tractability.
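The rates quoted in this summary translate into a simple timing budget, worked out below. The MPC discretization step `dt_s` is not stated here, so the 0.02 s used in the usage note is purely an assumption for illustration.

```python
def solve_budget_ms(rate_hz):
    """Time available per control update, in milliseconds."""
    return 1000.0 / rate_hz


def horizon_knots(horizon_s, dt_s):
    """Number of discretization intervals in a prediction horizon."""
    return round(horizon_s / dt_s)


def inner_ticks_per_outer(inner_hz, outer_hz):
    """How many inner-loop (WBC) updates run per outer-loop (MPC) update."""
    return inner_hz // outer_hz
```

At 50 Hz the solver has 20 ms per update, and the 500 Hz WBC runs ten times per MPC update. With an assumed step of 0.02 s, doubling the horizon from 0.6 s to 1.2 s doubles the number of intervals from 30 to 60, which is why horizon length trades directly against solve time.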

Multi-Robot Live Performance

The framework was deployed in a four-minute live performance by four humanoid robots on real-world flooring, including soft carpets that introduced significant support uncertainty. Despite continuous dynamic motion and the absence of safety barriers, the robots maintained balance with no failures, underscoring system robustness, coordination fidelity, and error adaptation in realistic deployments.

Figure 8: The Kuavo 4Pro humanoid robot fleet performing dynamic dances at the 2025 Zhongguancun Forum.

Implications, Limitations, and Future Directions

This work demonstrates that model-based planning and control pipelines, which combine receding-horizon dynamics with physical constraint enforcement, can produce expressive, coordinated, and robust dynamic motions on high-DoF humanoids in real-world, non-stationary environments.

Implications include the viability of deploying anthropomorphic robots for artistic, entertainment, and collaborative tasks outside laboratory settings, provided sufficient computation and modeling are available. The decisive role of prediction horizon length in both planning and execution phases emerges as a principal design lever for trading off expressiveness, constraint satisfaction, and computational burden.

Limitations concern residual requirements for manual annotation in reference motion generation and contact scheduling, as well as the necessity to tune and select horizon lengths to balance real-time constraints and performance. Current computation times, while tractable, may pose difficulties for even more dynamic or complex sequences, especially with higher numbers of contacts, longer dance routines, or less powerful hardware.

Future advancements may pursue automated contact scheduling, learning-based or hybrid retargeting methods that incorporate system dynamics natively, and further acceleration of MPC solvers (e.g., parallelization, learning-based warm-starts). These would broaden applicability to a wider repertoire of human-like skills and increase autonomy in humanoid performance.

Conclusion

The paper presents a hierarchical, model-based framework for dynamic whole-body dance on humanoid robots. The integration of retargeting, trajectory optimization, and MPC, supported by simulation and hardware experiments, delivers dynamic feasibility, balance, and expressiveness. The live multi-robot performance on uneven, unprotected flooring shows that physically grounded humanoid motion control can operate outside the laboratory, and it offers a reference point for subsequent research in both artistic robotics and robust high-DoF control.
