
Momentum HumanRig: Parametric Body Model

Updated 1 January 2026
  • Momentum HumanRig (MHR) is a parametric, anatomically motivated human body model that decouples skeletal articulation from external deformations to support robust animation and retargeting.
  • It leverages a hybrid corrective system using non-linear MLP-driven pose correctives and artist-defined skinning weights to enhance physical plausibility and realism.
  • Empirical evaluations show MHR outperforms models like SMPL-X in reconstruction accuracy and joint articulation, making it ideal for AR/VR graphics and robotics applications.

Momentum HumanRig (MHR) is a parametric, anatomically motivated human body model that enables expressive, physically consistent animation and motion retargeting for both graphics and robotics. Developed by Ferguson et al., MHR explicitly decouples skeletal articulation from external surface deformations, supporting semantic control, non-linear pose correctives, and real-time integration into a broad range of AR/VR, graphics, and robotics pipelines. MHR advances prior art by fusing the skeleton/shape decoupling of ATLAS with a modern corrective-driven rig and production-grade tooling, while enabling robust, temporally consistent motion representations suitable for robot retargeting and physically plausible animation (Ferguson et al., 19 Nov 2025, Tu et al., 25 Dec 2025).

1. Structural Model and Decoupling Paradigm

MHR’s foundational design is the separation of internal skeletal kinematics from external surface geometry. It adopts and extends ATLAS’s paradigm, significantly increasing anatomical coverage: 127 joints spanning the root, spine, limbs, hands, fingers, eyes, and jaw, compared to ATLAS’s 77. External surface deformation is factored into distinct semantic channels: identity (body/head/hand shape), skeletal scale, and fine-scale surface expression (e.g., FACS-style controls), enabling independent, artist-friendly manipulation of each.

MHR formalizes these latent channels as separate coefficient vectors: $\beta^s \in \mathbb{R}^{n^s}$ (identity; $n^s = 20$ body, 20 head, 5 hands), $\beta^f \in \mathbb{R}^{72}$ (expressions), $\beta^k \in \mathbb{R}^{68}$ (skeleton scale), and $\theta \in \mathbb{R}^{136}$ (joint pose). The surface mesh $\tilde X$ is parameterized as

$$\tilde X(\beta^s, \beta^f, \theta) = \bar X + B^s(\beta^s) + B^f(\beta^f) + B^p(\theta)$$

where $\bar X$ is the neutral template, $B^s$ and $B^f$ respectively represent shape and expression offsets, and $B^p$ introduces pose-dependent correctives.
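As a minimal sketch, the additive composition above can be expressed directly in code; the dimensions follow the paper's parameterization, but the toy mesh size and random bases are illustrative stand-ins, not the released model's assets:

```python
import numpy as np

# Illustrative sizes: 45 identity coefficients (20 body + 20 head + 5 hands)
# and 72 expression coefficients, on a toy mesh (real LoDs have thousands).
V, n_s, n_f = 100, 45, 72
rng = np.random.default_rng(0)
X_bar = rng.standard_normal((V, 3))           # neutral template \bar{X}
S = rng.standard_normal((n_s, V, 3)) * 0.01   # identity offset basis B^s
F = rng.standard_normal((n_f, V, 3)) * 0.01   # expression offset basis B^f

def surface(beta_s, beta_f, pose_corrective):
    """X~ = X_bar + B^s(beta^s) + B^f(beta^f) + B^p(theta).
    Identity and expression terms are linear in their coefficients; the
    pose corrective B^p comes from the non-linear corrective system."""
    return (X_bar
            + np.einsum("k,kvc->vc", beta_s, S)
            + np.einsum("k,kvc->vc", beta_f, F)
            + pose_corrective)

# With all coefficients zero, the surface reduces to the neutral template.
X = surface(np.zeros(n_s), np.zeros(n_f), np.zeros((V, 3)))
```

Because each channel is a separate additive term, identity, expression, and pose effects can be edited independently, which is the point of the decoupled design.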

MHR further supports multiple levels of detail (LoDs) with artist-driven, hand-edited skinning weights (4 influences per vertex at LoDs 1–4; 8 at LoD 0), facilitating deployment from lightweight interactive settings to high-fidelity offline rendering or simulation (Ferguson et al., 19 Nov 2025).

2. Parameterization, Equations, and Kinematic Chain

The skeleton incorporates 127 joints, each defined by a translation $T_t \in \mathbb{R}^3$, a rotation $T_\mathrm{rot} \in SO(3)$ (Euler-XYZ), and a uniform scale $T_s$, composed with joint-specific pre-rotation and offset matrices. The hierarchical world transform is

$$T_w = T_\mathrm{parent} \cdot T_\mathrm{off} \cdot T_t \cdot T_\mathrm{prerot} \cdot T_\mathrm{rot} \cdot T_s$$

and all degrees of freedom are packed into $\theta \in \mathbb{R}^{136}$, mapped to per-joint parameter vectors $\Theta_j \in \mathbb{R}^{127 \times 7}$. Linear blend skinning (LBS) applies these transforms to the surface:

$$X(\beta, \theta) = M(\tilde X, B^k(\beta^k), \theta, \omega)$$

where $M$ blends vertex positions using artist-defined weights $\omega$, computed from the full joint chain.
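The kinematic chain and LBS above can be sketched as follows. The transform composition mirrors the $T_w$ product; the toy two-joint chain, one-influence weights, and Euler convention details are illustrative, not the production rig:

```python
import numpy as np

def euler_xyz(rx, ry, rz):
    """Rotation matrix from Euler-XYZ angles (radians)."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rx @ Ry @ Rz

def make_T(R=None, t=None, s=1.0):
    """4x4 homogeneous transform from rotation, translation, uniform scale."""
    T = np.eye(4)
    T[:3, :3] = (np.eye(3) if R is None else R) * s
    if t is not None:
        T[:3, 3] = t
    return T

def forward_kinematics(parents, offsets, prerots, trans, rots, scales):
    """World transforms T_w = T_parent . T_off . T_t . T_prerot . T_rot . T_s."""
    world = []
    for j, p in enumerate(parents):
        local = (make_T(t=offsets[j]) @ make_T(t=trans[j])
                 @ make_T(R=prerots[j]) @ make_T(R=euler_xyz(*rots[j]))
                 @ make_T(s=scales[j]))
        world.append(local if p < 0 else world[p] @ local)
    return np.stack(world)

def lbs(verts, weights, world, inv_bind):
    """Linear blend skinning: blend per-joint transforms by weights omega."""
    vh = np.concatenate([verts, np.ones((len(verts), 1))], axis=1)
    M = np.einsum("jab,jbc->jac", world, inv_bind)  # transforms relative to bind
    return np.einsum("vj,jab,vb->va", weights, M, vh)[:, :3]

# Toy two-joint chain at rest: skinning against the bind pose is the identity.
parents = [-1, 0]
offsets = np.array([[0., 0., 0.], [0., 1., 0.]])
prerots = np.stack([np.eye(3)] * 2)
rest = forward_kinematics(parents, offsets, prerots,
                          np.zeros((2, 3)), np.zeros((2, 3)), np.ones(2))
inv_bind = np.linalg.inv(rest)
verts = np.array([[0., 0.5, 0.], [0., 1.5, 0.]])
weights = np.array([[1., 0.], [0., 1.]])  # one influence per vertex for clarity
skinned = lbs(verts, weights, rest, inv_bind)
```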

Expressions are represented as 72 semantic FACS-style blendshapes (artist-sculpted), rather than a purely data-driven or PCA face basis. Sparse skeleton transformation coefficients permit explicit control of limb and segment proportions for nuanced anthropometric scaling.

3. Corrective System: Sparse and Non-Linear Pose-Dependent Deformations

LBS is known to produce artifacts at highly articulated joints, most notably the "candy-wrapper" twist caused by linear blending. To address this, MHR incorporates a hybrid corrective system: each joint $j$ is assigned a local non-linear corrective $B^p_j(\theta) \in \mathbb{R}^{3V}$, where $V$ is the vertex count at a given LoD. These correctives are computed as follows:

  • Per-joint and one-ring-neighbor 6D rotation deviations $\delta R_a = R_{6d}(\theta_a) - R_{6d}(0)$ are computed.
  • Deviations are non-linearly embedded via a lightweight multi-layer perceptron: $h_j = \mathrm{MLP}_j(\{\delta R_a\})$.
  • A learned, sparsity-regularized per-vertex mask $A_j$ and decode matrix $P_j$ produce the final influence:

$$B^p_j = \mathrm{ReLU}(A_j) \odot (P_j h_j)$$

with global correction $B^p(\theta) = \sum_{j=1}^{n_j} B^p_j(\theta)$. Masks are initialized by geodesic proximity to the joint segment, regularized for sparsity ($L_1$ penalty), and further constrained during fitting by terms penalizing point-to-surface distance, keypoint reprojection error, mask non-sparsity, and joint-limit violations (Ferguson et al., 19 Nov 2025).
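The steps above can be sketched for a single joint as follows; the layer sizes, random initialization, and two-layer MLP shape are illustrative assumptions (the paper reports only a small hidden width, around 16), not the trained model:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def rot_6d(R):
    """6D rotation representation: first two columns of R, flattened."""
    return R[:, :2].reshape(-1, order="F")

class JointCorrective:
    """One per-joint corrective B^p_j = ReLU(A_j) * (P_j h_j).
    Layer sizes are illustrative; the paper reports a small width (~16)."""
    def __init__(self, n_neighbors, V, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        d_in = 6 * (1 + n_neighbors)  # joint plus one-ring 6D deviations
        self.W1 = rng.standard_normal((hidden, d_in)) * 0.1
        self.W2 = rng.standard_normal((hidden, hidden)) * 0.1
        self.P = rng.standard_normal((3 * V, hidden)) * 0.1  # decode matrix P_j
        self.A = rng.standard_normal(3 * V)                  # mask logits A_j

    def __call__(self, delta_rots):
        """delta_rots: 6D deviations R_6d(theta_a) - R_6d(0) per joint."""
        h = relu(self.W2 @ relu(self.W1 @ np.concatenate(delta_rots)))
        return relu(self.A) * (self.P @ h)  # masked per-vertex offsets

# At the rest pose all deviations vanish, so the corrective is exactly zero.
corrective = JointCorrective(n_neighbors=2, V=10)
zero_dev = rot_6d(np.eye(3)) - rot_6d(np.eye(3))
offsets = corrective([zero_dev, zero_dev, zero_dev])
```

Note how the formulation guarantees a zero corrective at the rest pose: zero rotation deviations pass through the ReLU MLP to a zero code, so the skinned rest shape is untouched.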

4. Implementation, Pipeline Integration, and Performance

MHR is implemented atop the Momentum library, with C++/Python bindings and PyTorch interoperability. The architecture supports robust export (FBX, glTF) and native rig-parameter serialization, and is GPU-accelerated: LBS executes as a compute shader, and the per-joint MLPs are computationally lightweight (width $c \approx 16$), adding negligible runtime overhead.

Weights and correctives are mapped between mesh resolutions using barycentric mapping or subdivision, ensuring consistent deformation across LoDs. On a standard desktop GPU, MHR achieves over 120 fps for full-body, 18,000-vertex animation inclusive of skinning and pose correctives. This runtime profile is suitable for both interactive AR/VR graphics and real-time robotics control (Ferguson et al., 19 Nov 2025).
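The cross-LoD barycentric mapping mentioned above can be sketched as follows, assuming each target vertex has already been located on a source triangle; the `transfer_weights` helper and its signature are hypothetical, not Momentum's API:

```python
import numpy as np

def transfer_weights(src_weights, faces, face_ids, bary):
    """Transfer per-vertex skinning weights across LoDs via barycentric
    mapping: each target vertex lies on source triangle faces[face_ids[v]]
    with barycentric coordinates bary[v]; its weights are the blend of the
    three corner weights, renormalized to sum to one."""
    corners = src_weights[faces[face_ids]]        # (V_target, 3, n_joints)
    w = np.einsum("vk,vkj->vj", bary, corners)
    return w / w.sum(axis=1, keepdims=True)

# Toy example: one source triangle, two joints, two target vertices.
src_weights = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
faces = np.array([[0, 1, 2]])
face_ids = np.array([0, 0])
bary = np.array([[1.0, 0.0, 0.0],      # sits exactly on corner 0
                 [1/3, 1/3, 1/3]])     # triangle centroid
w = transfer_weights(src_weights, faces, face_ids, bary)
```

The same interpolation applies to per-vertex corrective masks, which is what keeps deformation consistent across LoDs.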

5. Motion Retargeting and Physically Plausible Trajectory Recovery

MHR has been leveraged in monocular human motion retargeting pipelines, notably as an intermediate bridge from perception output to humanoid robot control (Tu et al., 25 Dec 2025). Motion captured via visual backbones (e.g., SAM 3D Body) is encoded in a low-dimensional MHR latent stack:

$$z_t = [T_t;\; R_t;\; z_t^{\mathrm{model}};\; z_t^{\mathrm{expr}}]$$

Identity ($\beta_\mathrm{shape}$) and skeleton scale ($\gamma_\mathrm{scale}$) coefficients are averaged over the sequence and locked, enforcing bone-length and anthropometric consistency. Per-frame pose and expression latents are refined using a sliding-window optimization that penalizes deviation from initial estimates and temporal jitter (via finite differences of joint position, velocity, rotation, and acceleration with joint-dependent smoothness weights) while maintaining boundary coherence between windows.
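The jitter penalty can be sketched with finite differences; the loss structure and weights below are illustrative assumptions, not the paper's exact objective:

```python
import numpy as np

def temporal_loss(joints, w_vel=1.0, w_acc=0.5, joint_weights=None):
    """Finite-difference jitter penalty on a (T, J, 3) joint trajectory,
    with optional per-joint smoothness weights. The structure mirrors the
    velocity/acceleration terms described above; constants are illustrative."""
    if joint_weights is None:
        joint_weights = np.ones(joints.shape[1])
    vel = np.diff(joints, n=1, axis=0)  # (T-1, J, 3) finite velocities
    acc = np.diff(joints, n=2, axis=0)  # (T-2, J, 3) finite accelerations
    l_vel = (joint_weights[None, :, None] * vel ** 2).mean()
    l_acc = (joint_weights[None, :, None] * acc ** 2).mean()
    return w_vel * l_vel + w_acc * l_acc

t = np.arange(10, dtype=float)
static = np.zeros((10, 2, 3))                    # no motion: zero penalty
linear = t[:, None, None] * np.ones((1, 2, 3))   # constant velocity: zero accel
```

A static trajectory incurs no penalty, and a constant-velocity trajectory is penalized only through the velocity term, so legitimate smooth motion is not over-damped.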

Soft, differentiable foot-ground contact probabilities are computed per foot as a function of foot height, incorporated into a global optimization (Adam) that solves for physically plausible root trajectories in a fixed Z-up world frame. The loss penalizes foot sliding, penetration, unexpected height under contact, and encourages temporal root smoothness with camera motion priors.
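One plausible form for the soft contact and foot-sliding terms is sketched below; the sigmoid shape, threshold, and sharpness are assumptions, as the paper's exact function and constants are not reproduced here:

```python
import numpy as np

def contact_probability(foot_height, threshold=0.03, sharpness=100.0):
    """Soft, differentiable contact probability from foot height (meters).
    A sigmoid around a height threshold is one plausible choice; the
    threshold and sharpness constants here are illustrative assumptions."""
    return 1.0 / (1.0 + np.exp(sharpness * (foot_height - threshold)))

def sliding_loss(foot_xy, contact_prob):
    """Penalize horizontal foot velocity while the foot is softly in contact."""
    vel = np.diff(foot_xy, axis=0)                       # (T-1, 2)
    p = np.minimum(contact_prob[:-1], contact_prob[1:])  # contact on both frames
    return float((p[:, None] * vel ** 2).sum())

heights = np.array([0.0, 0.005, 0.2])    # on ground, near ground, mid-swing
probs = contact_probability(heights)
loss = sliding_loss(np.ones((5, 2)), np.ones(5))  # stationary foot: no sliding
```

Because the contact weight is a smooth function of height, the sliding penalty stays differentiable end-to-end, which is what allows Adam to optimize the root trajectory directly.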

For retargeting, a two-stage inverse kinematics (IK) solver establishes correspondences between 14 anatomically paired MHR and robot joints, aligns rotation conventions (including gravity direction and axis orientation), and applies height-normalized scaling. Stage 1 solves end-effector IK via a damped-least-squares Jacobian; Stage 2 refines intermediate joints while respecting robot joint limits. All operations maintain end-to-end differentiability (Tu et al., 25 Dec 2025).
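A damped-least-squares IK step can be illustrated on a toy planar arm; the 2-link model, numerical Jacobian, and constants are stand-ins for the robot-specific solver, not the paper's implementation:

```python
import numpy as np

def fk_2link(q, l1=1.0, l2=1.0):
    """End-effector position of a planar 2-link arm (toy robot stand-in)."""
    return np.array([l1 * np.cos(q[0]) + l2 * np.cos(q[0] + q[1]),
                     l1 * np.sin(q[0]) + l2 * np.sin(q[0] + q[1])])

def jacobian(f, q, eps=1e-6):
    """Central-difference numerical Jacobian of f at q."""
    J = np.zeros((f(q).size, q.size))
    for i in range(q.size):
        dq = np.zeros_like(q)
        dq[i] = eps
        J[:, i] = (f(q + dq) - f(q - dq)) / (2 * eps)
    return J

def dls_ik(target, q, lo, hi, iters=200, damping=0.1):
    """Damped-least-squares IK: dq = J^T (J J^T + lambda^2 I)^{-1} e,
    with joint limits enforced by clamping after each update."""
    for _ in range(iters):
        e = target - fk_2link(q)
        J = jacobian(fk_2link, q)
        JJt = J @ J.T
        q = q + J.T @ np.linalg.solve(JJt + damping ** 2 * np.eye(len(e)), e)
        q = np.clip(q, lo, hi)
    return q

q = dls_ik(np.array([1.0, 1.0]), np.array([0.3, 0.3]),
           lo=-np.pi * np.ones(2), hi=np.pi * np.ones(2))
```

The damping term keeps updates bounded near kinematic singularities, which is why this formulation is the standard choice for end-effector tracking under joint limits.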

6. Empirical Evaluation and Comparative Results

On the 3DBodyTex dataset (200 high-resolution scans, two poses each), MHR surpasses existing parametric models. Average vertex-to-surface error (mm) with increasing number of shape components is detailed below:

| Model  | 2 comps | 4 comps | 8 comps | 16 comps |
|--------|---------|---------|---------|----------|
| SMPL   | 4.46    | 4.43    | 4.39    | 4.32     |
| SMPL-X | 4.76    | 4.71    | 4.65    | 4.55     |
| MHR    | 4.76    | 4.53    | 4.13    | 4.11     |

MHR matches SMPL-X at two shape components and achieves lower error at all higher counts, surpassing SMPL beyond four components, with the greatest improvements at joints with complex articulation (knees, elbows, shoulders). Qualitatively, MHR exhibits more anatomically plausible soft-tissue bulges and resolves twisting without the "candy-wrapper" artifact (Ferguson et al., 19 Nov 2025).

7. Limitations and Future Directions

MHR currently omits explicit modeling of eyeball geometry, teeth, and tongue, and pose correctives/expressions are not yet conditioned on body shape. Planned extensions include: shape-conditioned corrective priors, integration of soft-tissue and clothing simulation, real-time optimization for mobile AR/VR platforms, and support for stylized character deformation. Conditioning correctives on body shape is anticipated to enable more personalized surface deformation. Eyeball and detailed oral articulation are also identified as priorities for future model releases (Ferguson et al., 19 Nov 2025).
