Motion-Aware Gaussian Splatting (MA-GS)

Updated 10 November 2025

The paper introduces MA-GS, a framework that extends static 3D Gaussian Splatting by incorporating structured part-aware modeling, disentangled rigid transformations, and physics-guided constraints.
It employs independent SE(3) transformations and imposes canonical, contact, velocity, and vector-field losses to ensure realistic articulation and spatial coherence.
MA-GS integrates a repel-point field and unified loss optimization to achieve collision-free, high-fidelity digital twin reconstructions from multi-view inputs.

Motion-Aware Gaussian Splatting (MA-GS) designates a family of frameworks that generalize static 3D Gaussian Splatting (3DGS) to model and render articulated objects undergoing physically plausible motions. In the context of "Part$^{2}$GS: Part-aware Modeling of Articulated Objects using 3D Gaussian Splatting" (Yu et al., 20 Jun 2025), MA-GS enables the reconstruction of digital twins of multi-part objects (such as robotic arms, furniture, and machines) by encoding structured part decomposition, disentangled part-wise rigid transformations, a canonical physics-guided representation, collision-avoidance mechanisms, and a unified optimization regime. MA-GS is characterized by explicit part-aware parameterization and the incorporation of physical constraints that promote saliency preservation, contact coherence, and articulation realism.

1. Part-Aware 3D Gaussian Representation

MA-GS models an articulated object as one static base plus $K$ rigid moving parts. Each part $\mathcal{G}_k$ consists of $M_k$ anisotropic Gaussians $\{G^k_i\}_{i=1}^{M_k}$, where every Gaussian $G_i$ (part index omitted for brevity) is parameterized as

$G_i = \left(\mu_i \in \mathbb{R}^3,\, r_i \in \mathbb{R}^4,\, s_i \in \mathbb{R}^3,\, \sigma_i \in [0,1],\, h_i,\, \psi_i \in \mathbb{R}^d\right),$

with $\mu_i$ as center, $r_i$ as rotation quaternion (from which the covariance is formed as $\Sigma_i = R_i S_i^2 R_i^T$, with $R_i$ the rotation matrix of $r_i$ and $S_i = \mathrm{diag}(s_i)$), $s_i$ as anisotropic scale, $\sigma_i$ as opacity, $h_i$ as view-dependent color coefficients (spherical harmonics), and $\psi_i$ as a learned part-identity embedding. The inclusion of $\psi_i$ enables both soft and hard assignment of Gaussians to articulated substructures.

This representation ensures that each part's geometry and appearance can be disentangled, parameterized, and manipulated independently, allowing explicit structuring of articulated kinematics.
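As a minimal sketch of the covariance construction $\Sigma_i = R_i S_i^2 R_i^T$ described above, the snippet below builds $\Sigma_i$ from a quaternion and a scale vector. The function names and the $(w, x, y, z)$ quaternion ordering are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix.

    The (w, x, y, z) ordering is an assumption; the quaternion is
    normalized first so that the result is a proper rotation.
    """
    w, x, y, z = np.asarray(q, dtype=float) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])

def gaussian_covariance(r, s):
    """Return Sigma = R S^2 R^T for rotation quaternion r and scale vector s."""
    R = quat_to_rotmat(r)
    S = np.diag(np.asarray(s, dtype=float))
    return R @ S @ S @ R.T
```

For an identity quaternion the covariance is simply $\mathrm{diag}(s_i^2)$; a 90-degree rotation about the z-axis swaps the variances along x and y.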

2. Structured and Disentangled Articulation

Within MA-GS, parts are assigned independent SE(3) rigid transformations $T_k = (R_k \in \mathrm{SO}(3),\, t_k \in \mathbb{R}^3)$, so that every Gaussian $G^k_i$ in part $k$ moves by

$\tilde\mu^k_i = R_k\, \mu^k_i + t_k,$

while its local anisotropic shape and color are maintained explicitly. The remaining attributes of each Gaussian (orientation $R_i$, which determines the covariance, scale $s_i$, SH color $h_i$, and embedding $\psi_i$) are carried along rigidly with the part, preserving sub-part detail during arbitrary part articulation.

This disentangled structure guarantees that part-wise motions are not mixed, enabling joint optimization and robust supervision of multiple articulated degrees-of-freedom. Coupled with part-identity embeddings, this design accommodates both soft and hard kinematic hierarchies within a unified splatting framework.
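The per-part motion can be sketched as follows. This is an illustrative example rather than the authors' implementation: it assumes that Gaussian orientations compose with the part rotation as $R_k R_i$ (the usual rigid-body rule), leaves scales untouched, and omits any rotation of spherical-harmonic coefficients. The function name `transform_part` is hypothetical.

```python
import numpy as np

def transform_part(mu, R_i, R_k, t_k):
    """Apply one part's rigid transform (R_k, t_k) to its Gaussians.

    mu  : (M, 3) Gaussian centers of the part
    R_i : (M, 3, 3) per-Gaussian rotation matrices
    R_k : (3, 3) part rotation, t_k : (3,) part translation
    Returns the moved centers and re-oriented rotations.
    """
    mu_new = mu @ R_k.T + t_k   # row-vector form of R_k @ mu_i + t_k
    R_new = R_k @ R_i           # assumed composition; broadcasts over M
    return mu_new, R_new
```

Because scales are unchanged and orientations are only left-multiplied by $R_k$, each covariance becomes $R_k \Sigma_i R_k^T$, so the local shape of every Gaussian is preserved.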

3. Physics-Guided Canonical Prior and Constraints

To ensure that learned part articulation reflects actual physical constraints, MA-GS forms a canonical configuration by interpolating between the two observed states via

$\mu^c_i = (1-\beta) \mu^0_i + \beta \mu^1_i,$

where $\beta \in [0,1]$ biases the interpolation toward the visually or structurally motion-rich state. Over the canonical set, three physical losses are imposed:

Contact enforcement: To prevent physically implausible interpenetration of parts or loss of contact,

$\mathcal{L}_{\rm contact} = \frac{1}{|\mathcal{G}_k|}\sum_{i\in\mathcal{G}_k} \max(0, -\cos\varphi_i),$

where $\cos\varphi_i$ measures deviation between part-base and centroid directions.

Velocity consistency: To enforce rigidity within each part,

$\mathcal{L}_{\rm velocity} = \sum_{k=1}^K \mathrm{Var}\{\Delta\mu_i : i \in \mathcal{G}_k\},$

penalizing intra-part motion non-uniformity.

Vector-field alignment: To align per-part rigid transforms with observed centroid motion,

$\mathcal{L}_{\rm vector} = \sum_{k=1}^K \sum_{i\in\mathcal{G}_k} \|R_k\mu^0_i + t_k - \mu^1_i\|^2.$

These constraints regularize the geometry and enforce physically plausible, contact-preserving, non-colliding, and symmetric articulation paths.
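The three losses above can be sketched in NumPy as below. Several details are assumptions for illustration: $\cos\varphi_i$ is taken as a precomputed input because its exact construction is not given here, $\Delta\mu_i$ is taken as $\mu^1_i - \mu^0_i$, and the variance of a set of 3-vectors is taken as the sum of per-axis variances. All function and argument names are hypothetical.

```python
import numpy as np

def contact_loss(cos_phi):
    """Mean hinge penalty max(0, -cos(phi_i)) over one part's Gaussians."""
    return float(np.mean(np.maximum(0.0, -np.asarray(cos_phi, dtype=float))))

def velocity_loss(mu0, mu1, part_ids, K):
    """Sum over parts of the variance of per-Gaussian displacements.

    Assumption: Var of 3-vectors = sum of per-axis (population) variances.
    """
    delta = mu1 - mu0
    total = 0.0
    for k in range(K):
        d = delta[part_ids == k]
        if len(d) > 0:
            total += float(d.var(axis=0).sum())
    return total

def vector_field_loss(mu0, mu1, part_ids, R, t):
    """Sum of squared residuals ||R_k mu0_i + t_k - mu1_i||^2 over all parts."""
    total = 0.0
    for k in range(len(R)):
        mask = part_ids == k
        pred = mu0[mask] @ R[k].T + t[k]
        total += float(np.sum((pred - mu1[mask]) ** 2))
    return total
```

When every Gaussian in a part undergoes the same pure translation, the velocity loss is zero, and the vector-field loss also vanishes when $(R_k, t_k)$ reproduces that motion exactly.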

4. Repel-Point Field for Collision Avoidance

A field of $N_R$ repel points $\{r_j\}$ is introduced in regions where movable parts approach the static base. Each such point exerts an inverse-square force on each Gaussian in the movable part according to

$F^k_{{\rm repel}, i} = \sum_{j=1}^{N_R} k_r \frac{r_j - \mu^k_i}{\|r_j - \mu^k_i\|^3},$

where $k_r$ is a repulsion coefficient. During forward articulation, after applying the part transform $T_k$, each Gaussian is displaced by its repel force in screen space. This mechanism dynamically prevents collisions and keeps the articulation path temporally stable during optimization, significantly improving motion coherence relative to unconstrained baselines.
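The force expression can be evaluated for all Gaussians of a part at once, as in the sketch below. It implements the formula exactly as written, so with a positive $k_r$ the resulting vector points from the Gaussian toward each repel point; how the sign of $k_r$ or of the subsequent displacement is chosen so that Gaussians are pushed away is not spelled out here and is left to the caller. The function name and the small `eps` stabilizer are assumptions.

```python
import numpy as np

def repel_force(mu, repel_pts, k_r=1.0, eps=1e-8):
    """Evaluate F_i = sum_j k_r (r_j - mu_i) / ||r_j - mu_i||^3.

    mu        : (M, 3) Gaussian centers of one movable part
    repel_pts : (N_R, 3) repel points
    eps       : hypothetical stabilizer to avoid division by zero
    Returns an (M, 3) array; each term has magnitude k_r / distance^2.
    """
    diff = repel_pts[None, :, :] - mu[:, None, :]          # (M, N_R, 3)
    dist = np.linalg.norm(diff, axis=-1, keepdims=True)    # (M, N_R, 1)
    return k_r * np.sum(diff / (dist ** 3 + eps), axis=1)
```

A single repel point at distance 2 contributes a vector of magnitude $k_r / 4$, which is the inverse-square behavior described above.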

5. Unified Loss Formulation and Optimization

The total construction-stage loss aggregates geometry reconstruction, part clustering, and physical consistency:

$\mathcal{L}_{\rm construct} = \mathcal{L}_{\rm render} + \lambda_{\rm part}\,\mathcal{L}_{\rm part} + \lambda_{\rm phys}\,\mathcal{L}_{\rm phys},$

where

$\mathcal{L}_{\rm render}$ is a combination of $L_1$ and DSSIM loss between rendered and target images,
$\mathcal{L}_{\rm part}$ is a KL divergence-based part clustering regularizer informed by each Gaussian's part-embedding distribution,
$\mathcal{L}_{\rm phys}$ is the sum of contact, velocity, and vector-field-based physics constraints.

During the articulation phase, an additional articulation loss penalizes deviation from target transformed positions under repulsion, including a rotation consistency term. Optimization alternates over Gaussian attributes, per-part embeddings, and transforms. Because rendering is differentiable, all parameters can be updated by gradient descent.
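The aggregation of the construction-stage loss can be sketched as a plain weighted sum. The individual terms are passed in as precomputed scalars; the weights `lam_part` and `lam_phys`, and the $L_1$/DSSIM mixing weight `lam_dssim` (set to the value commonly used in the original 3DGS formulation), are illustrative assumptions rather than values reported for MA-GS.

```python
def render_loss(l1, dssim, lam_dssim=0.2):
    """Blend L1 and DSSIM terms; the 0.2 weight is an assumed default."""
    return (1.0 - lam_dssim) * l1 + lam_dssim * dssim

def construct_loss(l_render, l_part, l_contact, l_velocity, l_vector,
                   lam_part=0.1, lam_phys=0.1):
    """L_construct = L_render + lam_part * L_part + lam_phys * L_phys.

    L_phys is the sum of the contact, velocity, and vector-field terms;
    both lambda weights are hypothetical placeholders.
    """
    l_phys = l_contact + l_velocity + l_vector
    return l_render + lam_part * l_part + lam_phys * l_phys
```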

6. Training and Inference Pipeline

The canonical MA-GS pipeline proceeds as follows:

Input multi-view images at two articulation states.
Independently fit two single-state sets of Gaussians by minimizing $\mathcal{L}_{\rm render}$ (coarse 3DGS).
Match and canonicalize: use Hungarian matching to align centers and then interpolate to form the canonical $\mathcal{G}^c$.
Initialize part-ID embeddings and cluster centers, then optimize over these in the canonical stage.
Minimize the overall loss function for physical and geometric fidelity.
Deploy repel points dynamically via sampling in tight proximity regions.
For each articulation, initialize part transforms randomly, apply rigid transformation and repel adjustment, then minimize articulation and contact losses.
Output the final optimized Gaussian set with associated per-part rigid transforms.

This procedure yields a part-aware, physically and geometrically consistent digital twin capable of high-fidelity, collision-free, and temporally coherent motion simulation. The framework is applicable to both synthetic and real-world multi-part datasets and demonstrates empirical superiority in geometric and articulation reconstruction metrics, notably achieving up to $10\times$ improvement in Chamfer Distance for movable parts relative to non-structured baselines (Yu et al., 20 Jun 2025).
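The match-and-canonicalize step of the pipeline can be sketched with SciPy's Hungarian solver. This is a simplified illustration: it assumes the matching cost is the Euclidean distance between Gaussian centers of the two states, and it builds a dense cost matrix, which is only practical for small point sets. The function name `canonicalize` and the default `beta` are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def canonicalize(mu0, mu1, beta=0.5):
    """Match centers across two states and interpolate canonical centers.

    mu0 : (N0, 3) centers from state 0, mu1 : (N1, 3) centers from state 1
    Returns canonical centers (1 - beta) * mu0 + beta * mu1 for matched
    pairs, plus the matched row and column indices.
    """
    cost = cdist(mu0, mu1)                     # assumed Euclidean cost
    rows, cols = linear_sum_assignment(cost)   # minimum-cost matching
    mu_c = (1.0 - beta) * mu0[rows] + beta * mu1[cols]
    return mu_c, rows, cols
```

With `beta = 0` the canonical centers coincide with the matched state-0 centers, and with `beta = 1` they coincide with the matched state-1 centers.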


References

Yu et al. Part$^{2}$GS: Part-aware Modeling of Articulated Objects using 3D Gaussian Splatting (20 Jun 2025)
