MuscleMimic: Scalable Muscle-Driven Imitation

Updated 4 July 2026

MuscleMimic is a research program that reproduces internal muscle states from observable motion by integrating physiologically realistic musculoskeletal models with scalable GPU-based simulation.
It combines motion retargeting pipelines with reinforcement learning to imitate human joint kinematics and dynamics.
The framework is validated against external human data (kinematics, ground reaction forces, and EMG) and supports applications ranging from locomotion to manipulation.

MuscleMimic denotes a line of research that seeks to reproduce, infer, or control internal muscular states from motion, and, in its most explicit contemporary form, an open-source framework for scalable motion imitation learning with physiologically realistic, muscle-actuated humanoids (Li et al., 26 Mar 2026). It combines validated musculoskeletal embodiments, retargeting from SMPL-format motion capture, and massively parallel GPU simulation with comprehensive collision handling, while adjacent work uses the same broader idea to recover muscle deformations from RGB, estimate strand-level activations from kinematics, or infer clinically meaningful muscle activity from markerless motion (Alvarado et al., 8 Jun 2026, Schneider et al., 2024, Cotton, 19 May 2025).

1. Research problem and historical emergence

MuscleMimic addresses a persistent difficulty in neuromusculoskeletal modeling: muscle-driven bodies are high-dimensional, overactuated, nonlinear, and shaped by delayed activation dynamics, tendon paths, and muscle-skeletal geometry. In the 2026 framework, these constraints are framed as both a computational problem—on-policy imitation learning requires millions to billions of simulation steps—and a model-availability problem, since validated full-body musculoskeletal models have historically been scarce (Li et al., 26 Mar 2026).

Earlier work established several of the ingredients later consolidated under the MuscleMimic label. A 2018 reinforcement-learning study used the Normalized Advantage Function to learn bounded muscle excitations in simulated 2D and 3D environments with 6 to 24 axial muscles, reporting RMSE below $1\%$ of the domain of motion after roughly 100,000 simulated steps (Abdi et al., 2018). KINESIS then demonstrated model-free motion imitation on a lower-body musculoskeletal model with 20 DoF and 80 muscle actuators, achieving test frame coverage of $97.71 \pm 0.11\%$ and showing that its muscle activity patterns correlate with human EMG during locomotion (Simos et al., 18 Mar 2025). MIMIC-MJX generalized the same imitation paradigm across rat, fruit fly, mouse forelimb, worm, and stick insect models, including a mouse forelimb with 9 Hill-type muscle actuators, and emphasized embodied closed-loop control rather than offline inverse dynamics (Zhang et al., 25 Nov 2025).

A plausible implication is that “MuscleMimic” has evolved from a narrow control problem—finding excitations for a simulator—into a broader program for linking observable motion to internal muscular variables under realistic morphology, contact, and dynamics.

2. Embodiments and biomechanical substrate

The 2026 framework provides two validated musculoskeletal embodiments, both derived from OpenSim models converted via MyoConverter and refined to enforce bilateral symmetry in equality constraints, joint ranges, moment arms, and force–length curves (Li et al., 26 Mar 2026).

Model	Intended task	Key specifications
MyoBimanualArm	Upper-body manipulation, fixed thorax	76 joints; 126 Hill-type muscle actuators; 54 DoFs; thorax–arms and arm–arm collisions enabled
MyoFullBody	Locomotion and whole-body motion, free root	123 joints; 416 muscles; 72 DoFs; comprehensive self-collision and environment contact

Finger muscles can be disabled. This yields reduced joint, actuator, and DoF counts, which the source model description lists separately and marks with an asterisk.

The actuators are MuJoCo's Hill-type muscle actuators, with inelastic tendons and no pennation angle. They are driven by first-order activation dynamics, $\frac{\mathrm{d}}{\mathrm{d}t}\mathrm{act} = \frac{\mathrm{ctrl}-\mathrm{act}}{\tau(\mathrm{ctrl},\mathrm{act})},$ where ctrl is the normalized excitation, act is the activation, and

$\tau (\mathrm{ctrl},\mathrm{act}) = \begin{cases} \tau_{\mathrm{act}} \cdot (0.5 + 1.5\,\mathrm{act}), & \mathrm{ctrl} - \mathrm{act} > 0 \\ \tau_{\mathrm{deact}} / (0.5 + 1.5\,\mathrm{act}), & \mathrm{ctrl} - \mathrm{act} \le 0 . \end{cases}$

Typical time constants are $\tau_{\mathrm{act}} = 0.01$ s and $\tau_{\mathrm{deact}} = 0.04$ s. The full-body model has total mass $84.3$ kg, and the upper-limb mass of both arms is $9.3$ kg (Li et al., 26 Mar 2026).
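The activation dynamics above can be simulated in a few lines. The following minimal Python sketch integrates the ODE with explicit Euler, using the typical time constants. The function names, the Euler scheme, the clamping to [0, 1] and the step size are illustrative assumptions; they are not MuJoCo's internal integrator.

```python
def activation_tau(ctrl: float, act: float,
                   tau_act: float = 0.01, tau_deact: float = 0.04) -> float:
    """Piecewise time constant tau(ctrl, act) from the definition above."""
    if ctrl - act > 0:
        # Activating: the time constant grows (response slows) as act rises.
        return tau_act * (0.5 + 1.5 * act)
    # Deactivating: the time constant shrinks (response speeds up) as act rises.
    return tau_deact / (0.5 + 1.5 * act)


def step_activation(ctrl: float, act: float, dt: float,
                    tau_act: float = 0.01, tau_deact: float = 0.04) -> float:
    """One explicit-Euler step of d(act)/dt = (ctrl - act) / tau(ctrl, act)."""
    ctrl = min(max(ctrl, 0.0), 1.0)  # assumed: excitations normalized to [0, 1]
    tau = activation_tau(ctrl, act, tau_act, tau_deact)
    act = act + dt * (ctrl - act) / tau
    return min(max(act, 0.0), 1.0)


def time_to_reach(ctrl: float, act0: float, target: float,
                  dt: float = 1e-4, t_max: float = 1.0) -> float:
    """Hold a constant excitation and return when act first crosses target (s)."""
    rising = ctrl > act0
    act, t = act0, 0.0
    while t < t_max:
        if (rising and act >= target) or (not rising and act <= target):
            return t
        act = step_activation(ctrl, act, dt)
        t += dt
    raise RuntimeError("target not reached within t_max")


if __name__ == "__main__":
    rise = time_to_reach(ctrl=1.0, act0=0.0, target=0.9)
    fall = time_to_reach(ctrl=0.0, act0=1.0, target=0.1)
    print(f"0 -> 0.9 in {rise * 1e3:.1f} ms; 1 -> 0.1 in {fall * 1e3:.1f} ms")
```

With the default constants, the continuous-time rise from 0 to 0.9 takes about 33 ms. The decay from 1 to 0.1 takes about 94 ms. This asymmetry is what the separate activation and deactivation time constants encode.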

The simulation substrate is MuJoCo Warp, with optional MJX for reduced-contact settings. Collision handling is a defining feature. The full-body model includes comprehensive self-collision among internal geometries, plus environment contacts such as foot–ground interaction with 3D Coulomb friction. On a single NVIDIA H100 80GB GPU, the framework reports near-linear scaling up to $n=128$ environments at $1{,}263$ steps per second (SPS), with higher aggregate throughput at larger environment counts. One billion environment steps complete in about 20 hours (Li et al., 26 Mar 2026).
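To put these throughput figures in perspective, the following arithmetic converts a step budget into wall-clock time. It makes two assumptions: that the 1,263 SPS figure is aggregate throughput across all environments, and that the 20-hour figure reflects sustained throughput. The helper names are illustrative.

```python
SECONDS_PER_HOUR = 3600.0


def average_sps(total_steps: float, hours: float) -> float:
    """Average environment steps per second implied by a run of given length."""
    return total_steps / (hours * SECONDS_PER_HOUR)


def hours_for_budget(total_steps: float, sps: float) -> float:
    """Wall-clock hours needed to consume a step budget at constant throughput."""
    return total_steps / sps / SECONDS_PER_HOUR


if __name__ == "__main__":
    # One billion steps in about 20 hours, as quoted in the text.
    print(f"implied average: {average_sps(1e9, 20):,.0f} SPS")
    # The same budget at the n = 128 throughput of 1,263 SPS.
    print(f"at 1,263 SPS: {hours_for_budget(1e9, 1263):,.0f} h")
```

One billion steps in about 20 hours implies an average of roughly 13,900 SPS. At 1,263 SPS, the same budget would take about 220 hours, or just over nine days. This gap is why scaling beyond 128 environments matters for billion-step training.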

3. Retargeting and policy-learning pipeline

Motion retargeting maps SMPL-format data onto the musculoskeletal bodies using two complementary pipelines.

- Mocap-Body retargeting performs SMPL-H shape fitting in T-pose. It estimates the shape parameters, a global scale, and per-joint positional and rotational offsets. It then solves MuJoCo-based inverse kinematics at the mimic sites and applies post-processing to correct ground floating, penetration, and trajectory artifacts.
- GMR-Fit uses Mink-based retargeting that respects the model-defined joint limits and joint equality constraints. This substantially reduces frame-to-frame posture jumps, joint-limit violations, and tendon jumps (Li et al., 26 Mar 2026).
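This article does not give the paper's exact scale-estimation procedure. The following is therefore a hypothetical sketch of one common way to obtain a global scale and per-site residual offsets: a closed-form least-squares fit between corresponding T-pose site positions of the fitted SMPL-H body and the musculoskeletal model. Both site sets are first expressed relative to a shared root site. The function name, the root-relative centering, and the absence of a rotation term are all assumptions.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def fit_global_scale(smpl_sites: Sequence[Vec3], model_sites: Sequence[Vec3],
                     root_index: int = 0) -> Tuple[float, List[Vec3]]:
    """Hypothetical least-squares scale between corresponding T-pose sites.

    Minimizes sum_i ||s * p_i - q_i||^2, where p_i and q_i are SMPL-H and model
    site positions relative to the root site; the closed form is
    s = <p, q> / <p, p>. Returns (s, per-site residual offsets q_i - s * p_i).
    """
    root_s, root_m = smpl_sites[root_index], model_sites[root_index]
    p = [tuple(a - b for a, b in zip(site, root_s)) for site in smpl_sites]
    q = [tuple(a - b for a, b in zip(site, root_m)) for site in model_sites]
    num = sum(pc * qc for ps, qs in zip(p, q) for pc, qc in zip(ps, qs))
    den = sum(pc * pc for ps in p for pc in ps)
    scale = num / den
    # Whatever a single global scale cannot explain is left as a per-site offset.
    offsets = [tuple(qc - scale * pc for pc, qc in zip(ps, qs))
               for ps, qs in zip(p, q)]
    return scale, offsets


if __name__ == "__main__":
    smpl = [(0.0, 0.0, 0.9), (0.2, 0.0, 1.4), (-0.2, 0.0, 1.4), (0.1, 0.05, 0.5)]
    model = [tuple(1.1 * c for c in site) for site in smpl]
    s, off = fit_global_scale(smpl, model)
    print(f"scale = {s:.3f}")
```

The fit has no rotational term, so it assumes that both bodies are already in the same T-pose orientation. Any systematic misfit ends up in the residual offsets, which play the role of per-joint positional offsets.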

The full-body retargeting uses 17 mimic sites (head, shoulders, elbows, hands, lumbar spine, pelvis, hips, knees, ankles, and toes), while the upper-limb model uses 6 sites. Tendon-path stability is monitored with a relative jump metric, which is smoothed by an exponential moving average. A tendon jump is flagged by a thresholded test on these quantities, parameterized by three fixed constants (Li et al., 26 Mar 2026).
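The exact definitions and thresholds for the jump metric, its moving average, and the detection test are given only in the source paper. The following sketch therefore assumes one plausible, hypothetical instantiation:

- a relative frame-to-frame change in tendon length;
- an exponential moving average of that change;
- a flag whenever the change exceeds both a multiple of the previous average and an absolute floor.

The parameter names `alpha`, `ratio` and `min_jump`, and their values, are illustrative only.

```python
from typing import List, Optional, Sequence


def detect_tendon_jumps(lengths: Sequence[float], alpha: float = 0.1,
                        ratio: float = 5.0, min_jump: float = 0.02,
                        eps: float = 1e-8) -> List[int]:
    """Return frame indices flagged as tendon-length jumps (hypothetical rule).

    j_t = |l_t - l_{t-1}| / max(|l_{t-1}|, eps)      relative jump
    m_t = (1 - alpha) * m_{t-1} + alpha * j_t         moving average of j
    flag t if j_t > ratio * m_{t-1} and j_t > min_jump
    """
    flagged: List[int] = []
    ema: Optional[float] = None
    for t in range(1, len(lengths)):
        j = abs(lengths[t] - lengths[t - 1]) / max(abs(lengths[t - 1]), eps)
        if ema is not None and j > ratio * ema and j > min_jump:
            flagged.append(t)
        # Update after testing, so a spike is compared against the history before it.
        ema = j if ema is None else (1.0 - alpha) * ema + alpha * j
    return flagged


if __name__ == "__main__":
    smooth = [0.30 + 0.001 * t for t in range(50)]  # slowly lengthening tendon
    spiky = list(smooth)
    spiky[30] += 0.05                               # one-frame path discontinuity
    print(detect_tendon_jumps(smooth), detect_tendon_jumps(spiky))
```

Under this assumed rule, a one-frame spike is flagged twice: once on the way up and once on the way back down. Slow, smooth length changes stay below the absolute floor.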

Policy learning uses actor–critic MLPs with SiLU activations, LayerNorm, orthogonal initialization, and gated residual blocks. Observations combine proprioception, muscle state, goal features, and the previous action; the full-body model also receives foot and toe touch sensors. Actions are normalized muscle excitations for all actuators at 100 Hz. The imitation objective is a DeepMimic-style tracking reward, whose site-space terms are defined relative to the pelvis.

Training uses PPO with single-epoch updates and fixed settings for the clipping range, the entropy coefficient, and other hyperparameters. Minibatches are scaled to the massively parallel training regime. The full-body model uses rollout length 50 and the bimanual model rollout length 10, and both training budgets are reported in billions of timesteps (Li et al., 26 Mar 2026).
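A DeepMimic-style reward typically sums exponentiated tracking errors. The following is a hypothetical instance with three terms: joint position, joint velocity, and pelvis-relative site position. The weights, scales, choice of terms, and the translation-only pelvis frame are placeholders. They are not the paper's values or definitions.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def pelvis_relative(sites: Sequence[Vec3], pelvis: Vec3) -> List[Vec3]:
    """Express site positions relative to the pelvis (translation only)."""
    return [tuple(s - p for s, p in zip(site, pelvis)) for site in sites]


def squared_error(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def imitation_reward(qpos, qpos_ref, qvel, qvel_ref,
                     sites, sites_ref, pelvis, pelvis_ref,
                     weights=(0.4, 0.1, 0.5), scales=(2.0, 0.1, 40.0)) -> float:
    """Hypothetical DeepMimic-style reward: sum_k w_k * exp(-scale_k * err_k)."""
    rel = pelvis_relative(sites, pelvis)
    rel_ref = pelvis_relative(sites_ref, pelvis_ref)
    errors = (
        squared_error(qpos, qpos_ref),                           # joint positions
        squared_error(qvel, qvel_ref),                           # joint velocities
        sum(squared_error(a, b) for a, b in zip(rel, rel_ref)),  # pelvis-relative sites
    )
    return sum(w * math.exp(-k * e) for w, k, e in zip(weights, scales, errors))


if __name__ == "__main__":
    q, v = [0.1, -0.2], [0.5, 0.0]
    sites, pelvis = [(0.0, 0.2, 1.5), (0.3, -0.1, 1.0)], (0.0, 0.0, 1.0)
    print(imitation_reward(q, q, v, v, sites, sites, pelvis, pelvis))
```

Perfect tracking returns the sum of the weights. Expressing sites relative to the pelvis makes the site term invariant to a global translation of the body, so that term alone does not penalize root drift.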

4. Empirical performance and biomechanical validation

MuscleMimic is evaluated both as an imitation system and as a biomechanical model. For MyoFullBody, the paper reports the following metrics on both the train and test splits:

- success rate;
- frame coverage;
- joint angle error;
- joint velocity error;
- relative site position error (in cm).

For MyoBimanualArm, it reports train and test success rates and relative site position error (in cm) (Li et al., 26 Mar 2026).

Retargeting quality materially affects imitation quality. On the full-body test set at 2B steps, GMR-Fit data yield lower joint angle error, lower joint velocity error, and higher mean return than Mocap-Body data. Across 972 KINESIS motions, GMR-Fit also reduces joint-limit violations, the tendon jump rate, and retargeting RMSE (in m), although it is slower per frame than Mocap-Body (Li et al., 26 Mar 2026).

Biomechanical validation uses external human datasets rather than only internal tracking metrics.

- For walking, the framework reports mean kinematic and joint-dynamics correlations with reference human data and reproduces the double-peaked GRF profile.
- For running, it reports a mean kinematic correlation against reference data, and the simulated GRF shows the expected single prominent early-stance peak.
- Comparing muscle activations with EMG for eight right-leg muscles yields consistently positive correlations, whose magnitude varies with muscle and seed.

The paper explicitly attributes the remaining variability to muscle redundancy, since multiple coordination strategies can realize similar kinematics without exactly matching human EMG patterns (Li et al., 26 Mar 2026).
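The EMG comparison rests on correlating time-normalized curves. The following sketch shows one conventional way to do this: linearly resample each signal to 101 points over a gait cycle, then compute a Pearson correlation. This article does not specify the paper's exact normalization, filtering, or averaging over seeds, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def resample(signal: Sequence[float], n: int = 101) -> List[float]:
    """Linearly resample a signal to n points spanning 0-100% of the cycle."""
    m = len(signal)
    if m < 2 or n < 2:
        raise ValueError("need at least two input and two output samples")
    out = []
    for i in range(n):
        x = i * (m - 1) / (n - 1)  # fractional index into the original signal
        lo = int(math.floor(x))
        hi = min(lo + 1, m - 1)
        frac = x - lo
        out.append(signal[lo] * (1.0 - frac) + signal[hi] * frac)
    return out


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def activation_emg_correlation(sim_activation: Sequence[float],
                               emg_envelope: Sequence[float],
                               n: int = 101) -> float:
    """Correlate a simulated activation with an EMG envelope over one cycle."""
    return pearson(resample(sim_activation, n), resample(emg_envelope, n))
```

Pearson correlation ignores offset and gain, so a simulated activation can correlate well with an EMG envelope while differing in amplitude. For this reason, correlation alone does not establish that the simulated coordination strategy matches the human one.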

5. Related formulations beyond humanoid imitation

Recent work uses the MuscleMimic concept well beyond full-body humanoid imitation. The common structure is an inverse mapping from observable motion or surface data to latent muscular variables, but the latent variable itself differs across studies.

Formulation	Observable input	Output or representative result
SOMA (Alvarado et al., 8 Jun 2026)	Multi-view RGB and skeletal pose	Person-specific canonical muscle and skin displacement fields; first method that attempts to recover muscle deformations from multi-view RGB data
MinT (Schneider et al., 2024)	Pose sequences derived from AMASS/SMPL	402-strand muscle activation time series covering 9.8 hours and 227 subjects; a 16-layer Transformer evaluated by lower-body RMSE and PCC
KinTwin (Cotton, 19 May 2025)	Markerless motion capture from 467 participants	Kinematics replication plus inferred GRFs, joint torques, and muscle activations; evaluated by muscle-driven mean joint angle error
Neural Musculoskeletal Model (Kumar et al., 7 Mar 2025)	Elbow angle, angular velocity, and load	Deep EMG envelopes for brachialis and triceps medial head; torque-consistency PCC reported at 2 kg and 4 kg loads

SOMA focuses on internal geometry rather than control. It learns a layered, pose-conditioned deformation model regularized by biomechanical priors. On unseen sequences, it reports global tracking accuracy in millimetres (mean, median, and 90th-percentile marker error), and a lower high-deformation DRE than linear blend skinning (LBS) (Alvarado et al., 8 Jun 2026). MinT instead treats muscle activations as the latent variable and builds a large synthetic dataset from OpenSim, explicitly targeting sequence-to-sequence estimation from motion (Schneider et al., 2024). KinTwin moves toward clinical inverse dynamics, emphasizing able-bodied and impaired movement, assistive-device use, and clinically meaningful contact timing and stride-length errors (Cotton, 19 May 2025). The physics-integrated deep-learning approach to deep EMG reconstruction replaces direct measurement of inaccessible channels with a subject-specific forward musculoskeletal constraint, using OpenSim inverse dynamics as the supervisory anchor (Kumar et al., 7 Mar 2025).

This suggests that MuscleMimic is less a single algorithm than a family of inverse biomechanical programs: some recover shape, some recover activation, and some recover clinically actionable force or EMG surrogates.

6. Limits, open questions, and future directions

The central unresolved issue is physiological identifiability. The full-body framework states this explicitly: high kinematic accuracy does not ensure EMG-consistent activations because muscle redundancy admits many coordination strategies that reproduce the same motion. The paper therefore identifies EMG-informed objectives, reflex and feedback loops, metabolic cost models, and synergy priors as likely requirements for stronger physiological fidelity (Li et al., 26 Mar 2026).
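A toy static example makes the redundancy argument concrete. For a single joint, net torque is linear in the activations. Adding co-contraction to an antagonist and compensating with an agonist therefore leaves the torque, and hence the motion, unchanged. The muscle gains below are hypothetical, and squared activation is only one assumed effort proxy.

```python
from typing import List, Sequence


def joint_torque(act: Sequence[float], gains: Sequence[float]) -> float:
    """Net torque (N*m) of a static single-joint toy: sum_i a_i * g_i."""
    return sum(a * g for a, g in zip(act, gains))


def cocontract(act: Sequence[float], gains: Sequence[float],
               agonist: int, antagonist: int, extra: float) -> List[float]:
    """Raise an antagonist by `extra` and the agonist just enough to keep torque fixed."""
    out = list(act)
    out[antagonist] += extra
    # Solve g_ag * delta + g_ant * extra = 0 for the agonist increment delta.
    out[agonist] -= gains[antagonist] * extra / gains[agonist]
    if not all(0.0 <= a <= 1.0 for a in out):
        raise ValueError("co-contraction pushes an activation outside [0, 1]")
    return out


def effort(act: Sequence[float]) -> float:
    """Sum of squared activations, one common (assumed) effort proxy."""
    return sum(a * a for a in act)


# Hypothetical gains g_i = moment arm (m) * max isometric force (N):
# flexor 0.03 m * 1000 N, flexor 0.02 m * 800 N, extensor -0.025 m * 1200 N.
GAINS = [30.0, 16.0, -30.0]

pattern_a = [0.2, 0.375, 0.0]                        # flexors only
pattern_b = cocontract(pattern_a, GAINS, 0, 2, 0.3)  # add extensor co-contraction

if __name__ == "__main__":
    for name, act in (("A", pattern_a), ("B", pattern_b)):
        print(name, act, joint_torque(act, GAINS), effort(act))
```

Both patterns produce 12 N·m, but pattern B costs about 2.7 times the squared-activation effort of pattern A. Kinematics alone cannot separate the two patterns; only an added preference, such as an effort, metabolic, or EMG term, can.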

A second limitation is model simplification. In the main framework, tendons are inelastic and pennation is absent. Explosive movements such as vertical jumping may therefore require adjusted or rescaled model parameters, and the authors note that vertical jumping benefited from such an adjustment (Li et al., 26 Mar 2026). SOMA trades constitutive simulation for learned layered deformation fields. The result is scalable and interactive, but the method is subject-specific, relies on a marker suit and controlled multi-view capture for training, and does not estimate activation signals or per-muscle forces (Alvarado et al., 8 Jun 2026). MinT provides scale and coverage, but its labels are synthetic rather than measured EMG, object contact forces are usually absent, and subject-specific muscle strength and tendon slack lengths are not modeled (Schneider et al., 2024). KinTwin demonstrates clinically meaningful inverse-dynamics inference. However, it explicitly relies on residual force control to absorb external forces from walkers, canes, therapists, and chairs when exact physical replication is otherwise impossible (Cotton, 19 May 2025).

Across these systems, the broader controversy is whether kinematics alone can determine internal muscle state. The literature consistently treats the problem as underconstrained. Current work therefore trends toward hybridization: markerless supervision for surface tracking, OpenSim-derived synthetic labels for scale, EMG or torque consistency for personalization, and stronger biomechanical priors on contact, volume, and activation dynamics (Alvarado et al., 8 Jun 2026, Schneider et al., 2024, Kumar et al., 7 Mar 2025). In that sense, MuscleMimic remains an active research program rather than a closed solution: it has made muscle-driven imitation practical at full-body scale, but it has also clarified how far practical imitation remains from uniquely recovering human neuromuscular control (Li et al., 26 Mar 2026).