Towards Embodied AI with MuscleMimic: Unlocking full-body musculoskeletal motor learning at scale

Published 26 Mar 2026 in cs.RO | (2603.25544v1)

Abstract: Learning motor control for muscle-driven musculoskeletal models is hindered by the computational cost of biomechanically accurate simulation and the scarcity of validated, open full-body models. Here we present MuscleMimic, an open-source framework for scalable motion imitation learning with physiologically realistic, muscle-actuated humanoids. MuscleMimic provides two validated musculoskeletal embodiments - a fixed-root upper-body model (126 muscles) for bimanual manipulation and a full-body model (416 muscles) for locomotion - together with a retargeting pipeline that maps SMPL-format motion capture data onto musculoskeletal structures while preserving kinematic and dynamic consistency. Leveraging massively parallel GPU simulation, the framework achieves order-of-magnitude training speedups over prior CPU-based approaches while maintaining comprehensive collision handling, enabling a single generalist policy to be trained on hundreds of diverse motions within days. The resulting policy faithfully reproduces a broad repertoire of human movements under full muscular control and can be fine-tuned to novel motions within hours. Biomechanical validation against experimental walking and running data demonstrates strong agreement in joint kinematics (mean correlation r = 0.90), while muscle activation analysis reveals both the promise and fundamental challenges of achieving physiological fidelity through kinematic imitation alone. By lowering the computational and data barriers to musculoskeletal simulation, MuscleMimic enables systematic model validation across diverse dynamic movements and broader participation in neuromuscular control research. Code, models, checkpoints, and retargeted datasets are available at: https://github.com/amathislab/musclemimic

Summary

  • The paper presents MuscleMimic, an open-source framework that scales imitation learning for physiologically accurate, muscle-actuated humanoid models.
  • It leverages GPU-parallelized simulation and single-epoch PPO updates to achieve up to ~13,000 training steps/sec while ensuring high kinematic fidelity.
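  • The framework validates muscle and joint parameters against experimental data and reports strong kinematic agreement (mean joint-angle correlation r = 0.90 for walking), with synthetic muscle activations that track key EMG features but do not fully match them.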

Large-Scale Musculoskeletal Motor Control with MuscleMimic

Framework Overview and Musculoskeletal Model Design

MuscleMimic introduces an open-source framework for scalable imitation learning in physiologically realistic, muscle-actuated humanoids. The framework provides two distinct, validated musculoskeletal (MSK) models: MyoBimanualArm (fixed root, upper body, 126 muscles) and MyoFullBody (free root, full body, 416 muscles). Both models leverage detailed anatomical structure and Hill-type muscle actuators, capturing delayed nonlinear muscle activations and moment arms across tasks. Figure 1

Figure 1: Visualization of the MyoBimanualArm model and MyoFullBody model, shown from front, back, and side perspectives.

Comprehensive collision modeling is implemented, with self-collision and environment contact support for both embodiments. Muscle and joint parameters are validated against a broad range of experimental and MRI/cadaver-derived datasets, and moment arms, force-length, and muscle geometry are iteratively refined to match empirical data. The model design enforces symmetry and coupled degrees of freedom to ensure anatomical fidelity.
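To make the idea of a Hill-type actuator with delayed, nonlinear activation more concrete, the following Python sketch integrates first-order activation dynamics and evaluates an active force-length-velocity product plus a passive term. The curve shapes, time constants, and the function names `activation_step` and `hill_force` are illustrative assumptions; they are not taken from MuscleMimic or MuJoCo.

```python
import math


def activation_step(act, ctrl, dt, tau_act=0.01, tau_deact=0.04):
    """One explicit-Euler step of first-order activation dynamics.

    The time constants are assumed placeholder values, not model parameters
    from the paper. Activation rises faster than it decays.
    """
    ctrl = min(max(ctrl, 0.0), 1.0)
    tau = tau_act if ctrl > act else tau_deact
    return act + dt * (ctrl - act) / tau


def force_length(l_norm, width=0.45):
    """Assumed Gaussian active force-length curve (peak 1 at optimal length)."""
    return math.exp(-((l_norm - 1.0) / width) ** 2)


def force_velocity(v_norm, k=0.25):
    """Assumed force-velocity curve; v_norm < 0 means shortening.

    v_norm = -1 is the maximum shortening velocity, where force drops to 0.
    """
    if v_norm <= -1.0:
        return 0.0
    if v_norm <= 0.0:
        return (1.0 + v_norm) / (1.0 - v_norm / k)  # Hill hyperbola
    return 1.0 + 0.4 * v_norm / (v_norm + 0.2)  # saturating lengthening gain


def passive_force(l_norm):
    """Assumed quadratic passive stiffness beyond optimal length."""
    return 4.0 * max(0.0, l_norm - 1.0) ** 2


def hill_force(act, l_norm, v_norm, f_max):
    """Muscle force from activation, normalized length, and normalized velocity."""
    active = act * force_length(l_norm) * force_velocity(v_norm)
    return f_max * (active + passive_force(l_norm))


# Usage: drive activation toward full excitation, then evaluate force.
act = 0.0
for _ in range(200):
    act = activation_step(act, ctrl=1.0, dt=0.001)
force = hill_force(act, l_norm=1.0, v_norm=0.0, f_max=1000.0)
```

The lag between `ctrl` and `act` is what makes the control problem delayed: a policy's output only gradually changes the force a muscle produces.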

Scalable Imitation Learning via GPU-Parallelized Simulation

The principal challenge in MSK motor control is the immense simulation cost due to complex muscle models and high-dimensional action spaces. MuscleMimic leverages GPU-accelerated MuJoCo Warp, supporting thousands of environments in parallel on modern GPUs (NVIDIA H100/H200), yielding up to ~13,000 training steps/second at 8192 parallel environments. Figure 2

Figure 2: Raw training steps per second as the number of parallel environments scales, showing a 7800% throughput increase at n = 8192.
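A percentage increase is easy to misread as a multiplier, so the short Python calculation below converts the reported 7800% increase into a speedup factor and estimates wall-clock time for a step budget. The 1e9-step budget is a hypothetical value chosen for illustration, and the estimate assumes the budget is counted in the same units as the quoted throughput; the helper names are not from the paper.

```python
SECONDS_PER_DAY = 86_400.0


def speedup_factor(percent_increase):
    """A p% increase means the new value is (1 + p / 100) times the baseline."""
    return 1.0 + percent_increase / 100.0


def wall_clock_days(total_steps, steps_per_second):
    """Days needed to collect total_steps at a constant throughput."""
    return total_steps / steps_per_second / SECONDS_PER_DAY


factor = speedup_factor(7800.0)  # 79.0, i.e. 79x the baseline throughput
days = wall_clock_days(1e9, 13_000.0)  # hypothetical 1e9-step budget
```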

On-policy RL is employed with PPO variants; the authors demonstrate that standard multi-epoch batch updates, commonly used to improve sample efficiency, induce catastrophic distribution shifts in highly parallel settings with delayed muscle dynamics, resulting in policy collapse. Single-epoch updates (E = 1) are empirically superior for MSK systems, with higher asymptotic performance, lower KL divergence, and improved stability. Figure 3

Figure 3: Effect of gradient epochs (E) on training stability, with catastrophic KL divergence for E > 1 versus stability for E = 1.

Large batch sizes further improve reward and exploration stability and reduce off-policy drift, supporting the advantage of massive parallelism and strict on-policy training for complex MSK imitation. Figure 4

Figure 4: Larger minibatch sizes result in higher asymptotic rewards and more stable policy updates.
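The quantities discussed in this section can be sketched with the standard PPO clipped surrogate and a sample-based KL estimate between the data-collecting policy and the updated policy. The Python below is textbook PPO written under that assumption, not the authors' specific variant; the function names and the clip range of 0.2 are illustrative choices.

```python
import math


def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch of samples.

    logp_old are log-probabilities under the policy that collected the data;
    logp_new are log-probabilities under the policy being optimized.
    """
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


def approx_kl(logp_new, logp_old):
    """Sample estimate of KL(old || new): mean of (r - 1) - log r.

    It is zero when the two policies agree on every sample and grows as the
    updated policy drifts from the one that generated the batch.
    """
    total = 0.0
    for ln, lo in zip(logp_new, logp_old):
        log_ratio = ln - lo
        total += math.exp(log_ratio) - 1.0 - log_ratio
    return total / len(logp_new)


# Usage: before any gradient step the ratio is 1 and the KL estimate is 0.
old = [-1.0, -1.0]
kl_start = approx_kl(old, old)
kl_drifted = approx_kl([-1.0 + math.log(2.0), -1.0], old)
```

Each extra pass over the same batch moves `logp_new` further from `logp_old`, so the ratio leaves the clip range on more samples and the KL estimate grows. With a single epoch, every sample is used for exactly one gradient step before fresh data is collected.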

Motion Retargeting and Data Pipeline

A novel motion retargeting pipeline maps SMPL-format MoCap data onto the MSK morphologies, enforcing kinematic and dynamic consistency. Two retargeting approaches are compared: Mocap-Body (physics-based, with fewer constraints) and GMR-Fit (kinematic, with equality constraints and joint limit enforcement). GMR-Fit yields lower joint violation and tendon instability rates, resulting in more feasible reference trajectories for control. Figure 5

Figure 5: Schematic of the motion retargeting pipeline integrating SMPL shape fitting, inverse kinematics, and post-processing for MSK alignment.
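One way to quantify how feasible a retargeted reference is would be to count joint-limit violations per frame and joint. The Python sketch below does this and also shows a naive clamp as a post-processing step. Both helpers and the tolerance are assumptions made for illustration; the source does not describe how violation rates are computed, and a constrained kinematic fit such as GMR-Fit enforces limits inside the solve rather than by clamping afterwards.

```python
def violation_rate(trajectory, limits, tol=1e-6):
    """Fraction of (frame, joint) entries that fall outside [lo, hi].

    trajectory: list of frames, each a list of joint angles.
    limits: list of (lo, hi) pairs, one per joint.
    """
    total = 0
    bad = 0
    for frame in trajectory:
        for q, (lo, hi) in zip(frame, limits):
            total += 1
            if q < lo - tol or q > hi + tol:
                bad += 1
    return bad / total if total else 0.0


def clamp_trajectory(trajectory, limits):
    """Naively project every joint angle back into its allowed range."""
    return [
        [min(max(q, lo), hi) for q, (lo, hi) in zip(frame, limits)]
        for frame in trajectory
    ]


# Usage with made-up limits and a two-frame, two-joint trajectory.
limits = [(-1.0, 1.0), (0.0, 2.0)]
reference = [[0.5, 2.5], [-1.5, 1.0]]
rate_before = violation_rate(reference, limits)
rate_after = violation_rate(clamp_trajectory(reference, limits), limits)
```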

The framework supports full-body and upper-limb motion datasets, with mimic sites distributed for tracking key kinematic features. Figure 6

Figure 6: Distribution of mimic sites used for full-body and upper-limb motion imitation.

Motion Imitation Performance and Generalization

MuscleMimic can train policies that imitate hundreds of MoCap trajectories with tens of billions of steps, yielding generalist policies for diverse, anatomically realistic movements. Quantitative metrics indicate high success rates (92-99%), low joint errors (~6-8°), and minimal trajectory deviations, outperforming previous MSK imitation pipelines in both scale and sample efficiency. Figure 7

Figure 7: Motion samples from MyoBimanualArm, depicting complex upper-limb skills (e.g., object lifting/throwing, waving, pouring tasks).

Figure 8

Figure 8: MyoFullBody reproducing walking, running, turning, dancing, jumping, and kick twist motions.

Fine-tuning allows rapid adaptation to novel and highly dynamic behaviors within hours, enabled by the pretrained generalist.

Biomechanical Validation and Muscle Activation Analysis

Validation against independent human experimental datasets is conducted for gross kinematics, joint moments, GRF, and EMG during both walking and running. Simulated joint angles achieve mean correlation r = 0.90 with treadmill/level-walking data, and r = 0.81 for running, demonstrating kinematic fidelity. Figure 9

Figure 9: Comparison of walking kinematics at [...] versus experimental data (hip, knee, ankle profiles).
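A mean correlation of this kind can be computed as a Pearson coefficient per joint between simulated and experimental angle profiles, averaged across joints. The Python sketch below shows that calculation; whether the paper averages per joint, per subject, or per trial is not stated here, so the aggregation in `mean_joint_correlation` is an assumption, and the sample numbers are made up.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def mean_joint_correlation(simulated, experimental):
    """Average per-joint correlation; inputs map joint name -> angle profile."""
    values = [pearson_r(simulated[j], experimental[j]) for j in simulated]
    return sum(values) / len(values)


# Usage with made-up profiles (degrees over a few gait-cycle samples).
sim = {"hip": [30.0, 10.0, -10.0, 20.0], "knee": [5.0, 20.0, 60.0, 10.0]}
exp = {"hip": [28.0, 12.0, -8.0, 22.0], "knee": [8.0, 15.0, 65.0, 12.0]}
r_mean = mean_joint_correlation(sim, exp)
```

Because Pearson correlation ignores offset and scale, a high r indicates matching profile shape rather than matching absolute angles, which is why it is usually reported alongside an error metric in degrees.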

Muscle activation analysis reveals that synthetic muscle activity tracks key EMG features, with per-muscle correlations spanning [...] to [...] and averages approaching human inter-subject variability. Figure 10

Figure 10: Gait cycle-averaged synthetic muscle activations versus experimental EMG across eight lower-limb muscles.
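Gait cycle averaging usually means time-normalizing each stride to a common number of samples and then averaging point by point. The Python sketch below illustrates that procedure with linear interpolation; the 101-point grid (0 to 100% of the cycle) and the function names are conventional choices assumed here, not details taken from the paper.

```python
def resample(stride, n_points=101):
    """Linearly interpolate one stride onto n_points evenly spaced samples."""
    if len(stride) < 2 or n_points < 2:
        raise ValueError("need at least two samples and two output points")
    last = len(stride) - 1
    out = []
    for i in range(n_points):
        pos = i * last / (n_points - 1)  # fractional index into the stride
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(stride[lo] * (1.0 - frac) + stride[hi] * frac)
    return out


def cycle_average(strides, n_points=101):
    """Average several strides of different durations on a common grid."""
    resampled = [resample(s, n_points) for s in strides]
    return [sum(column) / len(column) for column in zip(*resampled)]


# Usage: two strides of different length, averaged on a 3-point grid.
mean_profile = cycle_average([[0.0, 2.0], [2.0, 4.0, 6.0]], n_points=3)
```

The same averaging would be applied to both the synthetic activations and the experimental EMG envelopes before comparing them, so that strides of different duration line up by gait phase.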

The results underscore that strict kinematic imitation does not guarantee physiological muscle patterns, a manifestation of muscle redundancy, though imitation-based controllers outperform non-imitation baselines on EMG plausibility.

Ablations, Limitations, and Research Implications

Ablation studies demonstrate model performance scales with network capacity, looser episode termination, and diversity of motion data. Policies exhibit robust transfer from GPU-parallel training to standard CPU environments.

However, MuscleMimic inherits limitations from MSK modeling: inelastic tendons, absent pennation, simplified activation dynamics, and reliance on SMPL-based retargeting that may obscure pathological or idiosyncratic anthropometrics. High kinematic fidelity does not guarantee correct neural control strategies, emphasizing the necessity for direct experimental muscle data as validation targets.

Theoretical and Practical Implications

MuscleMimic represents a substantial step forward for embodied AI: it brings biomechanically plausible, data-efficient motor learning, previously impractical at scale, into reach for broad neuromechanical, computational neuroscience, and rehabilitation research. The open-source release includes code, validated models, datasets, and training infrastructure, providing a robust testbed for future advances in neuromuscular control, motor disorder simulation, exoskeleton design, and integrative neural-physical agent modeling.

Ongoing and future work should explore improved muscle modeling (tendon elasticity, path-dependent activation, individualized morphology), curriculum-based learning for dynamic tasks, transfer to physical hardware, and integration with neural network world models or differentiable brain-body pipelines for end-to-end embodied cognition.

Conclusion

MuscleMimic enables scalable, validated motion imitation for anatomically grounded MSK embodiments, unlocking systematic stress-testing and generalizable motor learning previously infeasible with CPU-based tools. The work substantiates the utility of large-scale GPU simulation and strict on-policy optimization for overactuated, nonlinear biomechanical systems and highlights open challenges at the intersection of motion imitation, neuromechanics, and physiologically faithful AI control.
