Momentum Human Rig (MHR) Overview
- Momentum Human Rig (MHR) is a parametric human model defined by decoupled skeleton and shape parameters, enabling anatomically plausible animation and expression control.
- It employs a deep kinematic hierarchy with non-linear pose corrective systems and linear blend skinning to deliver real-time, efficient AR/VR and graphics performance.
- Integrated into vision pipelines such as SAM 3D Body, MHR supports applications in animation, robotics, and retargeting, trading some detailed anatomical expressivity for robustness and efficiency.
The Momentum Human Rig (MHR) is a parametric human body model designed to facilitate expressive, anatomically plausible animation and robust integration into AR/VR and computer graphics pipelines. MHR combines an explicitly decoupled skeleton and shape paradigm—originating from ATLAS—with a hierarchical rigging system and a sparse, non-linear pose corrective scheme inspired by the Momentum library. It is employed both as a standalone rigging framework and as the generative geometry backbone in major vision pipelines such as SAM 3D Body, providing a low-dimensional and differentiable interface for both shape and motion inference (Ferguson et al., 19 Nov 2025, Aiersilan et al., 2 Dec 2025, Tu et al., 25 Dec 2025).
1. Decoupled Skeleton and Shape Parameterization
MHR encodes human geometry via separate, low-dimensional latent spaces for shape, skeleton proportions, facial expression, and pose. The parameterization follows:
- Identity Shape Coefficients ($\beta \in \mathbb{R}^{45}$): 45 dimensions (20 body, 20 head, 5 hand) model subject-specific soft-tissue and identity deformations. These are derived from separate non-rigid registrations of body, head, and hand scans with concatenated PCA components.
- Facial Expression Coefficients ($\psi \in \mathbb{R}^{72}$): 72 artist-sculpted, semantic blendshape weights following FACS conventions control facial expressions independent of underlying identity.
- Skeleton Transform Coefficients ($\sigma$): 68–70 parameters encode bone lengths and inter-joint scaling, held constant per subject to ensure anthropometric consistency over time.
- Pose Parameters ($\theta \in \mathbb{R}^{204}$): 204 dimensions, comprising 136 joint-articulation DoFs and (sometimes redundant) 68 skeleton transform parameters.
The neutral, unposed mesh is constructed as

$$V(\beta, \psi, \theta) = \bar{T} + B_{\mathrm{id}}(\beta) + B_{\mathrm{exp}}(\psi) + B_{\mathrm{pose}}(\theta),$$

where $B_{\mathrm{id}}$ and $B_{\mathrm{exp}}$ are linear blendshape expansions of the template $\bar{T}$, and $B_{\mathrm{pose}}$ is the non-linear, pose-corrective displacement field (Ferguson et al., 19 Nov 2025).
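As a minimal sketch, the linear part of this construction is a template plus basis-weighted offsets. The dimensions follow the text above; the template and basis matrices here are random stand-ins, not actual MHR assets:

```python
import numpy as np

# Dimensions follow the text (45 identity, 72 expression coefficients).
# Template and bases are random stand-ins, not actual MHR data.
N_VERTS, N_SHAPE, N_EXPR = 595, 45, 72

rng = np.random.default_rng(0)
template = rng.standard_normal((N_VERTS, 3))       # neutral template mesh
B_id = rng.standard_normal((N_VERTS, 3, N_SHAPE))  # identity blendshape basis
B_exp = rng.standard_normal((N_VERTS, 3, N_EXPR))  # FACS expression basis

def neutral_mesh(beta, psi):
    """Rest-pose mesh: template plus linear identity and expression offsets."""
    return template + B_id @ beta + B_exp @ psi

# Zero coefficients recover the template exactly.
V = neutral_mesh(np.zeros(N_SHAPE), np.zeros(N_EXPR))
```

The non-linear pose-corrective term is added on top of this linear sum at runtime, as described in Section 3.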
2. Rigging Framework Architecture
The MHR skeleton is a deep kinematic hierarchy of $J$ joints. Each joint $j$ is parameterized by:
- Constant offset $\mathbf{o}_j \in \mathbb{R}^3$,
- Pre-rotation $R^{\mathrm{pre}}_j$ to align local axes (ensuring twist around the x-axis),
- Transform parameters: translation $\mathbf{t}_j \in \mathbb{R}^3$, rotation $R_j$ (Euler or continuous 6D representation), isotropic scale $s_j$.
Per-joint local-to-world transforms are recursively accumulated:

$$M_j = M_{\mathrm{parent}(j)} \, T(\mathbf{o}_j) \, R^{\mathrm{pre}}_j \, T(\mathbf{t}_j) \, R_j \, S(s_j).$$

The full per-joint parameter vector $\mathbf{q}$ is derived via a global linear mapping from the reduced pose/scale latent space, $\mathbf{q} = A\,\theta$, facilitating subspace sharing and selective activation of joint DoFs.
Skinning is performed using classical Linear Blend Skinning (LBS):

$$v_i' = \sum_{j=1}^{J} w_{ij} \, M_j \, \bar{v}_i,$$

where $\bar{v}_i$ is the rest-pose vertex (expressed relative to the bind pose) and $w_{ij}$ are artist-defined weights (typically 4–8 nonzero entries per vertex), allowing for efficient multi-resolution support (Ferguson et al., 19 Nov 2025).
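The FK accumulation and LBS steps above can be sketched compactly. The function names, the parent-array skeleton encoding, and the explicit inverse bind matrices are illustrative assumptions, not the Momentum API:

```python
import numpy as np

def trs(t, R, s):
    """Compose a 4x4 translation-rotation-scale transform."""
    M = np.eye(4)
    M[:3, :3] = R * s
    M[:3, 3] = t
    return M

def forward_kinematics(parents, offsets, pre_rots, t, R, s):
    """Accumulate local-to-world transforms down the hierarchy.
    parents[j] is the parent joint index (-1 for the root); parents
    are assumed to precede their children in the joint ordering."""
    J = len(parents)
    world = [None] * J
    for j in range(J):
        local = trs(offsets[j], pre_rots[j], 1.0) @ trs(t[j], R[j], s[j])
        world[j] = local if parents[j] < 0 else world[parents[j]] @ local
    return np.stack(world)

def lbs(verts, weights, world, inv_bind):
    """Linear blend skinning: blend per-joint transforms by skinning weights.
    inv_bind holds each joint's inverse bind-pose transform."""
    T = np.einsum('vj,jab->vab', weights, world @ inv_bind)   # (V, 4, 4)
    vh = np.concatenate([verts, np.ones((len(verts), 1))], axis=1)
    return np.einsum('vab,vb->va', T, vh)[:, :3]
```

With identity rotations, zero translations, and identity bind transforms, `lbs` leaves vertices unchanged, which is a convenient sanity check.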
3. Non-Linear Pose Corrective System
MHR implements a joint-wise, sparse pose-corrective system combining local MLPs (one per joint) with spatial masks:
- Input: For each joint $j$, collect the 6D rotation deviations (relative to the rest pose) of $j$ and its immediate neighbors $\mathcal{N}(j)$: $\mathbf{x}_j = \big[\delta r_k\big]_{k \in \{j\} \cup \mathcal{N}(j)}$.
- Local MLP: A shared-weight MLP $f$ maps $\mathbf{x}_j$ to a compact code $\mathbf{z}_j = f(\mathbf{x}_j)$.
- Linear Projection: A per-joint linear map projects the code to vertex space: $\mathbf{d}_j = P_j \mathbf{z}_j$.
- Sparsity Constraint: Enforce spatial localization via a ReLU-activated mask $\mathbf{m}_j = \mathrm{ReLU}(\mathbf{a}_j)$, initialized using the normalized geodesic distance to the joint's segment. The final per-joint correction is $\Delta_j = \mathbf{m}_j \odot \mathbf{d}_j$.
Summing these across all joints yields the full pose-corrective blendshape: $B_{\mathrm{pose}}(\theta) = \sum_j \Delta_j$.
L1 regularization of the masks $\mathbf{m}_j$ maintains spatial compactness and interpretability of the correctives (Ferguson et al., 19 Nov 2025).
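The per-joint corrective pipeline (small MLP, linear projection to vertex space, ReLU spatial mask) can be sketched as follows. The layer sizes and random weights are placeholders, not trained MHR parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
N_VERTS, CODE = 64, 8  # toy sizes for illustration

def relu(x):
    return np.maximum(x, 0.0)

class JointCorrective:
    """Sparse per-joint pose corrective: tiny MLP -> linear projection -> mask.
    All weights are random stand-ins for learned parameters."""
    def __init__(self, in_dim, hidden=16):
        self.W1 = rng.standard_normal((hidden, in_dim)) * 0.1
        self.W2 = rng.standard_normal((CODE, hidden)) * 0.1
        self.P = rng.standard_normal((N_VERTS, 3, CODE)) * 0.1
        # Mask logits a_j; in MHR these are initialized from normalized
        # geodesic distance to the joint's segment. Random here.
        self.a = rng.standard_normal(N_VERTS)

    def __call__(self, x):
        z = relu(self.W2 @ relu(self.W1 @ x))  # compact pose code
        d = self.P @ z                          # dense vertex displacements
        m = relu(self.a)                        # non-negative spatial mask
        return m[:, None] * d                   # masked, localized correction

# 6D rotation deviations of a joint and two neighbours: 3 x 6 = 18 inputs.
corr = JointCorrective(in_dim=18)
delta = corr(rng.standard_normal(18))
# The full corrective blendshape is the sum of such terms over all joints.
```

Because the MLP has no bias terms in this sketch, a zero pose deviation produces exactly zero correction, matching the requirement that the rest pose is undeformed.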
4. Deployment and Integration in Vision and Graphics Pipelines
MHR is implemented in the Momentum C++/Python library with PyTorch bindings, supporting:
- Parameter Input: Parameters supplied externally or predicted by learned regressors.
- Transformation: Hierarchical joint transforms via the global linear mapping and forward kinematics (FK).
- Blendshape Assembly and Skinning: Construction of the neutral mesh, addition of pose correctives, and application of LBS.
- Export: Output to FBX/GLTF (for graphics) or direct GPU rendering streams.
Six levels-of-detail (LoD) are maintained—from 74,000 down to 595 vertices. Blendshapes are always computed at LoD1 and mapped to coarser/finer meshes via barycentric interpolation or subdivision.
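Mapping per-vertex blendshape offsets from LoD1 to another resolution via precomputed barycentric correspondences might look like the following sketch; the correspondence arrays (`tri_idx`, `bary`) are assumed inputs, not part of a published API:

```python
import numpy as np

def transfer_offsets(offsets_lod1, tri_idx, bary):
    """Map per-vertex offsets from LoD1 vertices to another LoD.
    Each target vertex stores the indices of the three LoD1 vertices of
    the triangle it projects onto (tri_idx, shape (V, 3)) and its
    barycentric weights within that triangle (bary, shape (V, 3))."""
    corner_offsets = offsets_lod1[tri_idx]             # (V, 3, 3)
    return np.einsum('vc,vcd->vd', bary, corner_offsets)
```

A one-hot barycentric weight simply copies the offset of the corresponding LoD1 vertex, which makes the interpolation easy to verify.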
Real-time performance (over 60 Hz) is achieved via the restricted number of influencing joints per LoD and efficient per-joint MLPs (Ferguson et al., 19 Nov 2025). MHR serves as the human geometry backbone in SAM 3D Body (Aiersilan et al., 2 Dec 2025) and drives geometry inference from monocular images, with smoothness and semantic robustness enhanced via DINOv3 vision transformer conditioning and annotation-based alignment.
5. Applications in Animation, AR/VR, and Robotics
MHR's decoupled parameter space and precise skeleton definition enable several downstream applications:
- Robust Animation and Retargeting: Anatomical consistency and physically plausible articulation allow pose transfer and animation in AR/VR environments (Ferguson et al., 19 Nov 2025).
- Monocular Image-to-Mesh Pipelines: In systems like SAM 3D Body, images are mapped to MHR parameters via DINOv3 embeddings and regression networks, producing watertight, topologically consistent meshes for perception tasks (Aiersilan et al., 2 Dec 2025).
- Motion Retargeting: For robotics, MHR acts as an intermediate representation between perception outputs and robot motion. A low-dimensional latent vector encodes per-frame model, expression, global root, and pose. Latent-space trajectories are optimized for temporal consistency, with root motion refined using soft contact and foot-ground penalties, then retargeted via two-stage inverse kinematics to, e.g., the Unitree G1 humanoid (Tu et al., 25 Dec 2025).
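The trajectory objective described above can be sketched as a finite-difference smoothness term plus a soft foot-ground penalty. The weights, contact threshold, and functional form here are illustrative assumptions, not the optimization used in the cited work:

```python
import numpy as np

def retarget_objective(latents, foot_heights, w_smooth=1.0, w_contact=10.0):
    """Illustrative objective over a latent trajectory of shape (T, D):
    a finite-difference smoothness term plus a hinge-style soft penalty
    that pushes foot heights (shape (T,)) toward the ground plane
    whenever they dip below an assumed 2 cm contact threshold."""
    vel = np.diff(latents, axis=0)
    smooth = w_smooth * np.sum(vel ** 2)
    # Only penalize frames where the foot is below the contact height.
    contact = w_contact * np.sum(np.maximum(0.0, 0.02 - foot_heights) ** 2)
    return smooth + contact
```

In practice such an objective would be minimized with a gradient-based optimizer over the whole latent trajectory before the inverse-kinematics retargeting stage.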
6. Limitations and Anthropometric Coverage
Despite its robustness and efficiency, MHR's low-dimensional shape space (20 core-body dims) intrinsically limits expressivity for non-standard anatomical detail. SAM 3D Body evaluations highlight systematic smoothing (or "regression to the mean") for outlier cases such as pregnancy, geriatric atrophy, or skeletal deformities. The smoothness is further enforced by annotation-based regularization favoring topology-preserving, high-frequency-suppressed surfaces (Aiersilan et al., 2 Dec 2025).
The reliance on semantically invariant vision backbones (e.g., DINOv3) for parameter regression compounds this limitation: MHR is well suited to perceptually robust inference but insufficient for detailed medical modeling without architectural extension. Anatomically critical, localized deviations remain outside the span of the fitted blendshape bases and are regularly treated as outliers during model-in-the-loop annotation.
7. Implementation and Reproducibility Considerations
- Rotation Representation: Pose is encoded in continuous 6D rotation to avoid gimbal lock and ensure stable interpolation.
- Pre-Rotation Alignment: Each joint’s pre-rotation orients the local x-axis to the bone direction, enabling intuitive twist behavior.
- Global Parameter Sharing: A single linear mapping from model parameters to joint transformations supports flexible skeleton subspaces and overlapping or supplemental joints.
- Geodesic Initialization: Spatial masks for pose corrective blendshapes are initialized via geodesic distance to enforce natural joint-centric deformation.
- Scan Registration: Body parts are registered independently for shape basis calculation, concatenated post-registration.
- Handcrafted Expression Basis: Facial blendshapes are hand-sculpted following FACS, minimizing unwanted pose-expression entanglement.
- Training Resolution: All model components are trained at mid-resolution (LoD1), with mapping to other resolutions for runtime flexibility (Ferguson et al., 19 Nov 2025).
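The continuous 6D rotation representation noted above maps two 3-vectors to a rotation matrix via Gram-Schmidt orthonormalization, and can be implemented in a few lines:

```python
import numpy as np

def rot6d_to_matrix(x):
    """Convert a continuous 6D rotation parameterization (two 3-vectors)
    to a rotation matrix by Gram-Schmidt orthonormalization."""
    a, b = x[:3], x[3:]
    r1 = a / np.linalg.norm(a)
    b = b - np.dot(r1, b) * r1       # remove the component along r1
    r2 = b / np.linalg.norm(b)
    r3 = np.cross(r1, r2)            # completes a right-handed frame
    return np.stack([r1, r2, r3], axis=1)  # columns are the orthonormal frame

# The canonical input [1,0,0, 0,1,0] yields the identity rotation.
R = rot6d_to_matrix(np.array([1., 0., 0., 0., 1., 0.]))
```

Unlike Euler angles, this parameterization has no gimbal-lock singularities, and any 6D input (with non-degenerate vectors) yields a valid rotation, which stabilizes gradient-based fitting.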
These system designs ensure MHR remains a scalable, differentiable, and artist-controllable human rigging solution, bridging state-of-the-art perception, animation, and robotic retargeting.