- The paper presents MuscleMimic, an open-source framework that scales imitation learning for physiologically accurate, muscle-actuated humanoid models.
- It leverages GPU-parallelized simulation and single-epoch PPO updates to achieve up to ~13,000 training steps/sec while ensuring high kinematic fidelity.
- The framework validates muscle and joint parameters against experimental data and reports high imitation accuracy (92–99% success rates), with EMG correlations approaching human inter-subject variability.
Large-Scale Musculoskeletal Motor Control with MuscleMimic
Framework Overview and Musculoskeletal Model Design
MuscleMimic introduces an open-source framework for scalable imitation learning in physiologically realistic, muscle-actuated humanoids. The framework provides two distinct, validated musculoskeletal (MSK) models: MyoBimanualArm (fixed root, upper body, 126 muscles) and MyoFullBody (free root, full body, 416 muscles). Both models leverage detailed anatomical structure and Hill-type muscle actuators, capturing delayed nonlinear muscle activations and moment arms across tasks.
Figure 1: Visualization of the MyoBimanualArm model and MyoFullBody model, shown from front, back, and side perspectives.
Comprehensive collision modeling is implemented, with self-collision and environment contact support for both embodiments. Muscle and joint parameters are validated against a broad range of experimental and MRI/cadaver-derived datasets, and moment arms, force-length, and muscle geometry are iteratively refined to match empirical data. The model design enforces symmetry and coupled degrees of freedom to ensure anatomical fidelity.
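To make the role of Hill-type actuators concrete, the following is a minimal textbook-style sketch of one muscle: first-order activation dynamics (the source of the delay mentioned above) feeding a force that depends nonlinearly on normalized fiber length and velocity. The time constants, curve shapes, and function names are illustrative assumptions, not parameters or APIs taken from MuscleMimic.

```python
import math

def activation_step(act, excitation, dt, tau_act=0.01, tau_deact=0.04):
    """One explicit-Euler step of first-order activation dynamics.

    Time constants are assumed, illustrative values (seconds).
    """
    tau = tau_act if excitation > act else tau_deact
    return act + dt * (excitation - act) / tau

def force_length(l_norm, width=0.45):
    """Gaussian-shaped active force-length curve, peaking at l_norm = 1."""
    return math.exp(-((l_norm - 1.0) / width) ** 2)

def force_velocity(v_norm):
    """Force-velocity curve; v_norm < 0 is shortening, -1 is max shortening speed."""
    if v_norm <= -1.0:
        return 0.0
    if v_norm <= 0.0:
        # Hill hyperbola: 1 at isometric (v = 0), 0 at max shortening (v = -1).
        return (1.0 + v_norm) / (1.0 - 4.0 * v_norm)
    # Lengthening: force rises smoothly toward an assumed plateau of 1.4.
    return 1.4 - 0.4 * math.exp(-5.0 * v_norm)

def muscle_force(act, l_norm, v_norm, f_max, k_passive=4.0):
    """Active plus passive force of a Hill-type muscle (rigid tendon assumed)."""
    active = act * force_length(l_norm) * force_velocity(v_norm)
    passive = k_passive * max(0.0, l_norm - 1.0) ** 2
    return f_max * (active + passive)

def joint_torque(force, moment_arm):
    """Torque contributed by one muscle about one joint."""
    return moment_arm * force

if __name__ == "__main__":
    act = 0.0
    for _ in range(10):  # 20 ms of full excitation at dt = 2 ms
        act = activation_step(act, 1.0, dt=0.002)
    print("activation after 20 ms:", act)
    print("isometric force at optimal length:", muscle_force(act, 1.0, 0.0, 1000.0))
```

Because activation lags the control signal and force depends on the current muscle state, the same action can produce different torques at different points in a motion, which is part of why control is harder here than with torque-actuated humanoids.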
Scalable Imitation Learning via GPU-Parallelized Simulation
The principal challenge in MSK motor control is the immense simulation cost due to complex muscle models and high-dimensional action spaces. MuscleMimic leverages GPU-accelerated MuJoCo Warp, supporting thousands of environments in parallel on modern GPUs (NVIDIA H100/H200), yielding up to ~13,000 training steps/second at 8192 parallel environments.
Figure 2: Raw training steps per second as the number of parallel environments scales, showing a 7800% throughput increase at n=8192.
On-policy RL is employed with PPO variants; the authors demonstrate that standard multi-epoch batch updates, commonly used to improve sample efficiency, induce catastrophic distribution shifts in highly parallel settings with delayed muscle dynamics, resulting in policy collapse. Single-epoch updates (E=1) are empirically superior for MSK systems, with higher asymptotic performance, lower KL divergence, and improved stability.
Figure 3: Effect of gradient epochs (E) on training stability, with catastrophic KL divergence for E>1 versus stability for E=1.
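The quantities behind this comparison can be sketched with the standard PPO clipped surrogate and a sample-based KL estimate. On the first gradient step after data collection, the new and old policies coincide, so the probability ratio is exactly 1 and the KL is 0. Every additional pass over the same batch evaluates a policy that has already moved away from the one that collected the data. The code below is a generic illustration, not the authors' implementation, and the clip range of 0.2 is a common default assumed here.

```python
import math

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Negative PPO clipped surrogate, averaged over samples."""
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)

def approx_kl(logp_new, logp_old):
    """Sample estimate of KL(pi_old || pi_new) from actions drawn under pi_old.

    Uses the non-negative estimator (r - 1) - log r with r = pi_new / pi_old.
    """
    total = 0.0
    for ln, lo in zip(logp_new, logp_old):
        log_ratio = ln - lo
        total += (math.exp(log_ratio) - 1.0) - log_ratio
    return total / len(logp_new)

if __name__ == "__main__":
    old = [-1.0, -2.0, -0.5]
    adv = [0.3, -0.1, 0.8]
    # First pass: the policy has not been updated yet, so ratio = 1 and KL = 0.
    print(ppo_clip_loss(old, old, adv), approx_kl(old, old))
    # Later pass (hypothetical log-probs after some updates): KL is now positive.
    moved = [-0.6, -2.4, -0.1]
    print(ppo_clip_loss(moved, old, adv), approx_kl(moved, old))
```

Monitoring an estimate like `approx_kl` per update is a common way to detect the kind of drift that the figure reports for E>1.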
Large batch sizes further improve reward and exploration stability while reducing off-policy drift, supporting the case for massive parallelism and strict on-policy training in complex MSK imitation.
Figure 4: Larger minibatch sizes result in higher asymptotic rewards and more stable policy updates.
Motion Retargeting and Data Pipeline
A novel motion retargeting pipeline maps SMPL-format MoCap data onto the MSK morphologies, enforcing kinematic and dynamic consistency. Two retargeting approaches are compared: Mocap-Body (physics-based, with fewer constraints) and GMR-Fit (kinematic, with equality constraints and joint-limit enforcement). GMR-Fit yields lower joint-violation and tendon-instability rates, resulting in more feasible reference trajectories for control.
Figure 5: Schematic of the motion retargeting pipeline integrating SMPL shape fitting, inverse kinematics, and post-processing for MSK alignment.
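As an illustration of how a joint-violation rate could be measured on a retargeted trajectory, the sketch below counts the fraction of per-frame joint values that fall outside their limits. The summary does not state how the authors define this metric, so the definition, the function name, and the tolerance parameter are all assumptions.

```python
def joint_violation_rate(trajectory, lower, upper, tol=0.0):
    """Fraction of (frame, joint) entries outside [lower - tol, upper + tol].

    trajectory: list of frames, each a list of joint angles (radians).
    lower, upper: per-joint limits with the same length as a frame.
    This is a hypothetical definition, not the paper's.
    """
    violations = 0
    total = 0
    for frame in trajectory:
        for q, lo, hi in zip(frame, lower, upper):
            total += 1
            if q < lo - tol or q > hi + tol:
                violations += 1
    return violations / total if total else 0.0

if __name__ == "__main__":
    # Two frames, two joints; the second frame exceeds both limits.
    traj = [[0.0, 1.0], [2.0, -1.0]]
    print(joint_violation_rate(traj, lower=[-0.5, -0.5], upper=[1.5, 1.5]))
```

A retargeting method that enforces joint limits during the fit should drive this rate toward zero, whereas a less constrained method can leave reference poses that the MSK model cannot physically reach.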
The framework supports full-body and upper-limb motion datasets, with mimic sites distributed for tracking key kinematic features.
Figure 6: Distribution of mimic sites used for full-body and upper-limb motion imitation.
MuscleMimic can train policies that imitate hundreds of MoCap trajectories with tens of billions of steps, yielding generalist policies for diverse, anatomically realistic movements. Quantitative metrics indicate high success rates (92–99%), low joint errors (~6–8°), and minimal trajectory deviations, outperforming previous MSK imitation pipelines in both scale and sample efficiency.
Figure 7: Motion samples from MyoBimanualArm, depicting complex upper-limb skills (e.g., object lifting/throwing, waving, pouring tasks).
Figure 8: MyoFullBody reproducing walking, running, turning, dancing, jumping, and kick twist motions.
Fine-tuning allows rapid adaptation to novel and highly dynamic behaviors within hours, enabled by the pretrained generalist.
Biomechanical Validation and Muscle Activation Analysis
Validation against independent human experimental datasets covers gross kinematics, joint moments, ground reaction forces (GRF), and EMG during both walking and running. Simulated joint angles achieve a mean correlation of r=0.9 with treadmill/level-walking data and r=0.81 for running, demonstrating kinematic fidelity.

Figure 9: Comparison of simulated walking kinematics versus experimental data (hip, knee, ankle profiles).
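Correlation values of this kind are typically obtained by resampling each gait cycle to a common time base and computing a Pearson coefficient between the simulated and experimental curves. The sketch below shows that procedure in a minimal form. The 101-point resampling and the function names are assumptions, and the summary does not specify the authors' exact preprocessing.

```python
import math

def resample_cycle(values, n_points=101):
    """Linearly resample one gait cycle to n_points samples (0-100% of the cycle)."""
    if len(values) < 2 or n_points < 2:
        raise ValueError("need at least two samples and two output points")
    last = len(values) - 1
    out = []
    for i in range(n_points):
        pos = i * last / (n_points - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(values[lo] * (1.0 - frac) + values[hi] * frac)
    return out

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("sequences must have equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def mean_joint_correlation(sim_by_joint, exp_by_joint):
    """Average Pearson r over joints after resampling each curve to a common grid."""
    rs = [
        pearson_r(resample_cycle(sim_by_joint[j]), resample_cycle(exp_by_joint[j]))
        for j in sim_by_joint
    ]
    return sum(rs) / len(rs)

if __name__ == "__main__":
    # Synthetic stand-in curves, not data from the paper.
    sim = {"knee": [math.sin(2 * math.pi * t / 50) for t in range(51)]}
    exp = {"knee": [0.9 * math.sin(2 * math.pi * t / 80 - 0.1) for t in range(81)]}
    print(mean_joint_correlation(sim, exp))
```

Pearson r is insensitive to amplitude scaling and constant offsets, so a high correlation indicates matching waveform shape rather than matching magnitude; error metrics in degrees complement it.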
Muscle activation analysis reveals that synthetic muscle activity tracks key EMG features, with per-muscle correlations varying across muscles and averages approaching human inter-subject variability.
Figure 10: Gait cycle-averaged synthetic muscle activations versus experimental EMG across eight lower-limb muscles.
The results underscore that strict kinematic imitation does not guarantee physiological muscle patterns (a manifestation of muscle redundancy), though imitation-based controllers outperform non-imitation baselines on EMG plausibility.
Ablations, Limitations, and Research Implications
Ablation studies demonstrate that performance improves with greater network capacity, looser episode-termination criteria, and more diverse motion data. Policies exhibit robust transfer from GPU-parallel training to standard CPU environments.
However, MuscleMimic inherits limitations from MSK modeling: inelastic tendons, absent pennation, simplified activation dynamics, and reliance on SMPL-based retargeting that may obscure pathological or idiosyncratic anthropometrics. High kinematic fidelity does not guarantee correct neural control strategies, emphasizing the necessity for direct experimental muscle data as validation targets.
Theoretical and Practical Implications
MuscleMimic represents a substantial step forward for embodied AI: it brings biomechanically plausible, data-efficient motor learning, previously impractical at scale, into reach for broad neuromechanical, computational neuroscience, and rehabilitation research. The open-source release includes code, validated models, datasets, and training infrastructure, providing a robust testbed for future advances in neuromuscular control, motor disorder simulation, exoskeleton design, and integrative neural-physical agent modeling.
Ongoing and future work should explore improved muscle modeling (tendon elasticity, path-dependent activation, individualized morphology), curriculum-based learning for dynamic tasks, transfer to physical hardware, and integration with neural network world models or differentiable brain-body pipelines for end-to-end embodied cognition.
Conclusion
MuscleMimic enables scalable, validated motion imitation for anatomically grounded MSK embodiments, unlocking systematic stress-testing and generalizable motor learning previously infeasible with CPU-based tools. The work substantiates the utility of large-scale GPU simulation and strict on-policy optimization for overactuated, nonlinear biomechanical systems and highlights open challenges at the intersection of motion imitation, neuromechanics, and physiologically faithful AI control.
(2603.25544)