Deformable Gaussian Splatting
- Deformable Gaussian Splatting is a point-based 3D representation where dynamic Gaussians capture scene deformations and complex motions.
- It employs diverse deformation strategies—such as basis function time-warping, mesh-driven transforms, and anchor-based corrections—to achieve robust temporal modeling.
- Applications in VR, surgical reconstruction, and filmmaking benefit from its ability to deliver high-quality, real-time rendering and intuitively editable 3D content.
Deformable Gaussian Splatting (GS) encompasses a family of explicit, point-based scene representations wherein 3D Gaussians act as spatial kernels with analytic support, and their parameters are dynamically modulated to express deformation, motion, or topological change. Deformable GS aims to preserve the rendering efficiency and fidelity of static splatting, while enabling high-quality dynamic reconstruction, mesh-driven editing, and temporally-aware modeling in applications ranging from virtual reality to surgical scene reconstruction.
1. Mathematical Foundations and Static Representation
At its core, standard 3D Gaussian Splatting parameterizes a scene as a set

$$\mathcal{G} = \{(\mu_i, \Sigma_i, \alpha_i, c_i)\}_{i=1}^{N},$$

where $\mu_i \in \mathbb{R}^3$ is the kernel center, $\Sigma_i$ is the (generally anisotropic) covariance, $\alpha_i$ controls opacity, and $c_i$ encodes color, often via spherical harmonics. The 3D density function for each splat is

$$G_i(x) = \exp\!\left(-\tfrac{1}{2}(x - \mu_i)^\top \Sigma_i^{-1} (x - \mu_i)\right).$$

For rendering, each Gaussian is projected to the image plane according to the camera pose and projection Jacobian, giving a 2D Gaussian whose footprint, combined with depth sorting, drives alpha compositing:

$$C(p) = \sum_{i=1}^{N} c_i\, \tilde{\alpha}_i(p) \prod_{j < i} \bigl(1 - \tilde{\alpha}_j(p)\bigr),$$

with $\tilde{\alpha}_i(p) = \alpha_i\, G^{\mathrm{2D}}_i(p)$, where $G^{\mathrm{2D}}_i(p)$ is the projected density at pixel $p$.
This explicit, differentiable construction supports backpropagation of image reconstruction losses to all Gaussian parameters, enabling gradient-based optimization of scene geometry and appearance.
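To make the static formulation concrete, the following NumPy sketch evaluates projected 2D Gaussian footprints and composites them front-to-back over a single pixel. It is illustrative only: the brute-force per-splat loop and the variable names are not taken from any cited implementation, and real systems rely on tile-based GPU rasterization.

```python
# Illustrative NumPy sketch of the splatting equations above: each projected
# Gaussian contributes an alpha weighted by its 2D density, composited
# front-to-back in depth order. Names and the per-pixel loop are for clarity.
import numpy as np

def gaussian_2d_density(p, mu2d, cov2d):
    """Unnormalized projected Gaussian footprint G^2D_i evaluated at pixel p."""
    d = p - mu2d
    return np.exp(-0.5 * d @ np.linalg.inv(cov2d) @ d)

def composite_pixel(p, splats):
    """Front-to-back alpha compositing; `splats` is assumed pre-sorted by depth.

    Each splat is a tuple (mu2d, cov2d, opacity, rgb).
    """
    color = np.zeros(3)
    transmittance = 1.0
    for mu2d, cov2d, opacity, rgb in splats:
        alpha = opacity * gaussian_2d_density(p, mu2d, cov2d)
        color += transmittance * alpha * rgb
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:  # early termination once the pixel saturates
            break
    return color

# Toy usage: a nearer red splat partially occluding a blue one.
splats = [
    (np.array([5.0, 5.0]), 2.0 * np.eye(2), 0.8, np.array([1.0, 0.0, 0.0])),
    (np.array([5.5, 5.0]), 3.0 * np.eye(2), 0.9, np.array([0.0, 0.0, 1.0])),
]
print(composite_pixel(np.array([5.0, 5.0]), splats))
```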
2. Deformation Parameterizations
Deformable GS generalizes the classic GS paradigm by allowing each Gaussian’s spatial and/or appearance parameters to vary over time, animation phase, or according to an editable control structure. Several principal approaches to parameterizing and learning these deformations are found in the literature:
- Basis Function Time-Warping: Each Gaussian's parameters (position, rotation, scale, sometimes opacity) are expressed as linear combinations of learnable temporal basis functions; a minimal code sketch appears directly after this list. For example, in EH-SurGS, motion is modeled as

$$\theta_i(t) = \theta_i^{0} + \sum_{j=1}^{B} w_{ij}\, b_j(t), \qquad b_j(t) = \exp\!\left(-\frac{(t - \tau_j)^2}{2\sigma_j^2}\right),$$

for each warped parameter $\theta \in \{\mu, r, s\}$, where the $b_j$ are Gaussian basis functions with learnable centers $\tau_j$ and variances $\sigma_j^2$ (Shan et al., 2 Jan 2025).
- Mesh-Driven Deformation: Gaussians are strictly bound to mesh structures (usually triangles or other explicit surface elements), so that mesh deformations unambiguously dictate the transformation of the attached splats. In GaMeS, the center of each Gaussian on a face with vertices $(v_1, v_2, v_3)$ is parameterized as

$$\mu_i = w_1 v_1 + w_2 v_2 + w_3 v_3,$$

where $(w_1, w_2, w_3)$ are barycentric weights on the triangle (Waczyńska et al., 2 Feb 2024); mesh vertex motion thus induces a well-defined affine mapping of the splats.
- Anchor-Driven and Coarse-to-Fine Schemes: In ADC-GS, primitives are organized hierarchically: canonical anchors in 3D space perform coarse, rigid-like transformations via MLPs, while residual per-splat corrections are modeled with additional lightweight MLPs (Huang et al., 13 May 2025). Anchor refinement (growing/pruning) matches dynamic capacity to regions with significant motion.
- Per-Gaussian Embedding Deformation: Instead of a continuous deformation field, each Gaussian is equipped with its own learned latent embedding, and an MLP maps the tuple (per-Gaussian embedding, temporal embedding) to parameter increments; see the MLP sketch after this list. A parallel pair of MLPs can separately model coarse (slow, low-frequency) and fine (rapid, high-frequency) motion (Bae et al., 4 Apr 2024).
- Cage-Based Deformations: External low-dimensional "cages" defined as control meshes induce deformations in the GS point set, with each splat mapped via mean-value coordinates or similar interpolants; local Jacobians are used to update individual splat covariances, thereby preserving textural fidelity post-warp (Tong et al., 17 Apr 2025, Xie et al., 19 Nov 2024).
- View-Conditioned and Hybrid Fields: For applications sensitive to camera pose (e.g., thermal infrared), deformation fields may be conditioned not just on time, but on view direction, position, or even semantics (Nam et al., 25 May 2025).
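As referenced in the basis-function bullet above, the following sketch warps splat centers with learnable Gaussian temporal bases. All shapes, names, and values are illustrative assumptions rather than EH-SurGS internals.

```python
# Hedged sketch of Gaussian-basis time-warping: per-splat offsets are linear
# combinations of temporal Gaussian basis functions with centers tau and
# widths sigma (both learnable in a real system).
import numpy as np

def temporal_basis(t, centers, widths):
    """Evaluate the B Gaussian basis functions b_j(t) at time t."""
    return np.exp(-0.5 * ((t - centers) / widths) ** 2)      # shape (B,)

def warp_positions(mu0, weights, t, centers, widths):
    """mu0: (N, 3) canonical centers; weights: (N, 3, B) per-splat coefficients."""
    b = temporal_basis(t, centers, widths)                    # (B,)
    delta = np.einsum('ncb,b->nc', weights, b)                # (N, 3) offsets
    return mu0 + delta

# Toy usage with 4 splats and 8 basis functions spread over the unit interval.
rng = np.random.default_rng(0)
mu0 = rng.normal(size=(4, 3))
weights = 0.01 * rng.normal(size=(4, 3, 8))
centers = np.linspace(0.0, 1.0, 8)
widths = np.full(8, 0.1)
print(warp_positions(mu0, weights, t=0.37, centers=centers, widths=widths))
```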
The choice of deformation parameterization is driven by requirements for editability, scalability, modeling stochastic or irreversible changes (such as topological tearing), and domain-specific constraints.
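For the per-Gaussian embedding approach referenced above, a PyTorch sketch might look as follows. The layer sizes, the sinusoidal time embedding, and the use of a single shared MLP (rather than the coarse/fine pair) are assumptions for illustration only.

```python
# Hedged sketch of per-Gaussian embedding deformation: an MLP maps
# (per-splat latent code, time embedding) to increments of position,
# rotation (quaternion), and scale.
import torch
import torch.nn as nn

class EmbeddingDeform(nn.Module):
    def __init__(self, num_splats, latent_dim=32, time_dim=8, hidden=128):
        super().__init__()
        self.time_dim = time_dim
        self.latents = nn.Embedding(num_splats, latent_dim)   # one code per Gaussian
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim + time_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3 + 4 + 3),                      # d_pos, d_rot, d_scale
        )

    def time_embed(self, t):
        freqs = 2.0 ** torch.arange(self.time_dim // 2)
        return torch.cat([torch.sin(freqs * t), torch.cos(freqs * t)])

    def forward(self, splat_ids, t):
        z = self.latents(splat_ids)                            # (N, latent_dim)
        te = self.time_embed(t).expand(z.shape[0], -1)         # broadcast time code
        out = self.mlp(torch.cat([z, te], dim=-1))
        return out.split([3, 4, 3], dim=-1)                    # parameter increments

model = EmbeddingDeform(num_splats=1000)
d_pos, d_rot, d_scale = model(torch.arange(1000), torch.tensor(0.25))
print(d_pos.shape, d_rot.shape, d_scale.shape)
```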
3. Mesh- and Cage-Based Editable Representations
Mesh-based Gaussian Splatting methods (e.g., GaMeS (Waczyńska et al., 2 Feb 2024); (Gao et al., 7 Feb 2024)) introduce explicit topological structure by associating each Gaussian with a mesh element (usually a triangular face). This enables several critical capabilities:
- Topology-Aware Editing: Deforming mesh vertices propagates affine updates to all attached Gaussians, preserving local connectivity and preventing artifacts (e.g., "holes" or drift) under large, nonrigid deformations.
- Adaptive Refinement: Mesh subdivision (e.g., face splitting) is tightly coupled with Gaussian splitting, enabling controlled densification in high-curvature regions while maintaining uniform coverage.
- Regularization: Mesh regularity acts as a prior on splat quality, suppressing degenerate or misaligned ellipsoids and promoting physically plausible reconstructions.
- Real-Time Animation: Because the parameterization uses barycentric or mean-value coordinates, mesh animation induces immediate updates to all relevant splat parameters with minimal computational overhead.
Cage-based systems amplify these features by using functional maps (e.g., harmonic coordinates) defined on low-resolution surface cages, often regularized via Neural Jacobian Fields to preserve local detail and semantic plausibility under extreme deformations (Tong et al., 17 Apr 2025, Xie et al., 19 Nov 2024). This framework further facilitates intuitive editing (e.g., sketch-guided deformation) and alignment to abstract target shapes (e.g., via image, point cloud, or text-derived cues).
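A minimal sketch of the mesh-driven binding described above: a splat center is stored as fixed barycentric weights on a triangle, so editing the vertices moves the splat deterministically. Names and values are illustrative, not taken from GaMeS code, and covariance updates from the face's local frame are omitted.

```python
# Minimal sketch of mesh-bound splats: fixed barycentric weights tie a splat
# center to its triangle, so any vertex edit propagates to the splat.
import numpy as np

def splat_center_from_face(vertices, bary):
    """vertices: (3, 3) triangle vertex positions; bary: (3,) convex weights."""
    return bary @ vertices

bary = np.array([0.5, 0.3, 0.2])                 # splat bound near the centroid
v_rest = np.array([[0.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
v_deformed = v_rest + np.array([0.0, 0.0, 0.5])  # lift the whole face

print(splat_center_from_face(v_rest, bary))      # rest position
print(splat_center_from_face(v_deformed, bary))  # follows the mesh edit
```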
4. Dynamic Scene Modeling and Temporal Deformations
For dynamic reconstruction, especially in scenarios with severe nonrigidity or intermittent topology (as in surgical scenes), deformable GS incorporates temporal modeling at the kernel level:
- Temporal Basis and Life Cycle Modeling: EH-SurGS augments each Gaussian with a learnable temporal existence function, allowing the system to "phase out" splats during irreversible changes such as tissue cutting (Shan et al., 2 Jan 2025); a minimal gating sketch appears at the end of this section.
- Hierarchical Adaptive Motion Processing: By segmenting the scene or image into hierarchical blocks (static vs. deformable), only the necessary subset of Gaussians is dynamically modulated per frame, reducing computational cost and increasing rendering speed by 10–15% (Shan et al., 2 Jan 2025).
- Combination of Static and Deformable Fields: ForestSplats (Park et al., 8 Mar 2025) maintains a static splatting field for the background/static regions and a deformable, per-view field for transient scene elements, dynamically blending them using learned superpixel-aware masks.
These mechanisms enable fine-grained representation of both reversible (e.g., periodic movement) and irreversible (e.g., structural resection) changes.
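As a concrete illustration of the lifecycle idea referenced in the first bullet above, the gate below multiplies a splat's opacity and smoothly drives it to zero after a learnable "death" time. The sigmoid form and parameter names are assumptions for illustration, not EH-SurGS internals.

```python
# Hedged sketch of a temporal existence (lifecycle) gate: ~1 before t_death,
# ~0 after, and differentiable so t_death can be optimized alongside the
# other splat parameters.
import numpy as np

def existence_gate(t, t_death, sharpness=50.0):
    return 1.0 / (1.0 + np.exp(sharpness * (t - t_death)))

def effective_opacity(base_opacity, t, t_death):
    return base_opacity * existence_gate(t, t_death)

# A splat that is removed (e.g., resected tissue) around t_death = 0.5.
for t in (0.2, 0.5, 0.8):
    print(t, effective_opacity(0.9, t, t_death=0.5))
```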
5. Training Objectives, Regularization, and Optimization
The principal learning targets for deformable GS are photometric reconstruction, geometric consistency, and deformation stability. Common components include the following; a sketch composing such an objective follows the list:
- Photometric Losses: Minimize color differences between rendered and observed images (commonly $L_1$ and SSIM-based terms), possibly weighted by visibility, occlusion, or importance masks (Shan et al., 2 Jan 2025, Bae et al., 4 Apr 2024, Zhu et al., 21 Jan 2024).
- Depth and Geometry Constraints: Depth supervision via stereo/monocular estimation or SDF/normal regularizers aligns Gaussian density distributions to 3D surface data (Zhu et al., 21 Jan 2024).
- Deformation and Smoothness Priors: Temporal motion and neighborhood smoothness losses resist implausible motion and foster locally coherent deformations (Lu et al., 9 Apr 2024).
- Mask- or Motion-Aware Adaptive Densification: In dual- or hierarchical-field settings, densification is selectively applied (e.g., uncertainty-aware splitting avoids occluded regions (Park et al., 8 Mar 2025)).
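A hedged PyTorch sketch composing the objectives listed above; the weights, the neighbor graph, and the exact terms are placeholders that differ across the cited methods.

```python
# Illustrative composite objective: photometric L1, masked depth supervision,
# and a local motion-smoothness prior over neighboring splats.
import torch

def photometric_loss(rendered, target):
    return torch.abs(rendered - target).mean()          # L1; SSIM is often added

def depth_loss(rendered_depth, target_depth, mask):
    return (torch.abs(rendered_depth - target_depth) * mask).mean()

def motion_smoothness(offsets, neighbor_idx):
    """Penalize disagreement between each splat's offset and its neighbors'."""
    return (offsets.unsqueeze(1) - offsets[neighbor_idx]).pow(2).mean()

def total_loss(rendered, target, rendered_depth, target_depth, mask,
               offsets, neighbor_idx, w_depth=0.1, w_smooth=0.01):
    return (photometric_loss(rendered, target)
            + w_depth * depth_loss(rendered_depth, target_depth, mask)
            + w_smooth * motion_smoothness(offsets, neighbor_idx))

# Toy call with random tensors, just to exercise the shapes.
img, tgt = torch.rand(3, 64, 64), torch.rand(3, 64, 64)
dep, dep_tgt, m = torch.rand(64, 64), torch.rand(64, 64), torch.ones(64, 64)
off = torch.randn(100, 3)
nbr = torch.randint(0, 100, (100, 5))
print(total_loss(img, tgt, dep, dep_tgt, m, off, nbr))
```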
Optimization proceeds via stochastic gradient descent (commonly Adam), with possible two-stage schedules: static GS initialization followed by end-to-end deformable joint optimization. Datasets are typically multi-view or monocular videos with auxiliary (possibly sparse) mask or depth supervision as dictated by the domain.
6. Applications, Empirical Results, and Limitations
Deformable GS methods are applied to a spectrum of practical domains:
- Editable 3D Content and Animation: Mesh- and cage-based GS support real-time, artifact-free animation and direct user-driven editing of point-based 3D representations, with performance comparable to (or exceeding) static GS in PSNR/SSIM and real-time frame rates (up to $160$ FPS depending on scene complexity) (Waczyńska et al., 2 Feb 2024, Gao et al., 7 Feb 2024, Tong et al., 17 Apr 2025).
- Dynamic Scene and Surgical Reconstruction: Advanced methods (Deform3DGS, EH-SurGS, EndoGS) achieve high reconstruction fidelity, fast rendering ($300$+ FPS), and sub-2-minute training times for complex surgical scenes with strong topological changes (Shan et al., 2 Jan 2025, Yang et al., 28 May 2024, Zhu et al., 21 Jan 2024).
- Sparse-View Filmmaking: Methods such as Splatography decouple foreground and background deformation, allowing robust reconstruction from sparse multi-view video with sharp segmentation and half the model footprint of conventional 4D-GS (Azzarelli et al., 7 Nov 2025).
- Thermal/Multimodal Novel-View Synthesis: Veta-GS combines view-dependent fields and tailored "thermal feature extractors" to outperform prior TIR synthesis models by up to 1 dB PSNR, while maintaining real-time rendering (Nam et al., 25 May 2025).
Limitations persist, notably the need for accurate mesh or pose priors, the challenge of fully modeling topological change (e.g., explicit birth/death of splats, tearing beyond simple deactivation), and the computational intensity of densification or fine-grained deformation at very large scales. Some hybrid methods trade off memory or speed for fidelity; others are domain-specific (e.g., surgical, VR, TIR imaging) (Gao et al., 7 Feb 2024, Yang et al., 28 May 2024).
7. Algorithmic Summary and Contemporary Directions
The canonical deformable GS algorithmic loop comprises the following steps; a toy sketch follows the list:
- Initialization: Static GS fitting or mesh/cage construction; possible anchor, embedding, or mask assignment.
- Per-Frame Update: For each frame (or view/time), update splat parameters (via basis/MLP/mesh warp), mask assignment, and field segmentation as appropriate.
- Rendering: Project and rasterize each Gaussian with updated parameters, compositing as per splatting equations.
- Optimization: Compute joint losses, backpropagate to all relevant parameters (splats, mesh, network weights), and apply adaptive densification/growing/pruning.
- Specialized Steps: Application-specific components such as physics simulation (VR-GS), superpixel-driven masking (ForestSplats), or user-guided deformation (sketch/cage-based GS).
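As referenced above, here is a toy but runnable instance of this loop, reduced to optimizing temporal-basis weights against synthetic target trajectories; rendering, masking, and densification are omitted, and all shapes and hyperparameters are illustrative.

```python
# Toy optimization skeleton for deformable GS: initialize canonical centers,
# then fit per-splat temporal-basis weights so warped centers match targets.
import torch

N, B, T = 16, 8, 20                                   # splats, bases, frames
mu0 = torch.randn(N, 3)                               # canonical centers (fixed here)
weights = torch.zeros(N, 3, B, requires_grad=True)    # learnable warp coefficients
centers = torch.linspace(0.0, 1.0, B)                 # basis centers
widths = torch.full((B,), 0.15)                       # basis widths
times = torch.linspace(0.0, 1.0, T)
# Synthetic "ground truth": centers oscillating over time, shape (T, N, 3).
targets = mu0.unsqueeze(0) + 0.3 * torch.sin(2 * torch.pi * times)[:, None, None]

optimizer = torch.optim.Adam([weights], lr=1e-2)
for it in range(500):
    t = times[it % T]
    basis = torch.exp(-0.5 * ((t - centers) / widths) ** 2)     # (B,)
    warped = mu0 + torch.einsum('ncb,b->nc', weights, basis)    # per-frame update
    loss = (warped - targets[it % T]).pow(2).mean()             # stand-in for image loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
print(float(loss))                                    # should be near zero after fitting
```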
Empirical ablations consistently demonstrate that the inclusion of mesh/cage constraints, temporal basis modeling (with explicit on/off lifecycle), and hierarchical or anchor-driven parameterizations yield measurable improvements in speed, fidelity, and editability (Shan et al., 2 Jan 2025, Huang et al., 13 May 2025, Tong et al., 17 Apr 2025).
Current research foregrounds the value of explicit topological scaffolds, hierarchical motion modeling, dynamic mask management, memory-efficient transient field design, and domain-adaptive feature extraction. Future research aims include explicit handling of topological birth/death, more expressive learning of deformation priors (e.g., via neural Jacobian fields or learned motion bases), and broader integration with non-photorealistic modalities and interactive editing pipelines.