
Dynamic 4D Gaussian Splatting

Updated 13 March 2026
  • The paper introduces explicit 4D Gaussian primitives combined with temporal deformation models to achieve real-time, high-fidelity rendering of dynamic scenes.
  • It leverages mathematical parameterizations, optimal-transport regularization, and state-space modeling to ensure temporal consistency and spatial accuracy.
  • Practical results demonstrate enhanced novel view and time synthesis, efficient compression, and integration with SLAM for dynamic scene reconstruction.


Dynamic 4D Gaussian Splatting (4DGS) is a family of methods for explicit, real-time representation and differentiable rendering of dynamic scenes as collections of time-evolving anisotropic Gaussian primitives in four-dimensional (spatio-temporal) space. These approaches unify spatial geometry and temporal evolution within a coherent framework, enabling novel view and novel time synthesis with high efficiency and fidelity. The primary models leverage either native 4D Gaussian parameterizations or sophisticated temporal deformation fields, often enhanced by regularization, geometric priors, or a mixture of explicit and neural architectures. This article reviews foundational and state-of-the-art techniques, organizing their advances via representation, deformation modeling, optimization, regularization, and practical implications.

1. Mathematical Foundations and 4D Gaussian Representation

The core of 4D Gaussian Splatting is the explicit parameterization of a dynamic scene as a collection of $N$ 4D Gaussian primitives. Each primitive $i$ consists of

  • A spatio-temporal center $\boldsymbol\mu_i = (\mu_{x,i}, \mu_{y,i}, \mu_{z,i}, \mu_{t,i})^\top \in \mathbb{R}^4$
  • A $4\times4$ positive-definite covariance matrix $\Sigma_i = R_i S_i^2 R_i^\top$ with anisotropic scales $S_i = \mathrm{diag}(s_{x,i}, s_{y,i}, s_{z,i}, s_{t,i})$ and a general $R_i \in SO(4)$, often parameterized by dual unit quaternions
  • Opacity $\alpha_i$ and color (commonly modeled with 4D spherindrical harmonics or memory-efficient forms)
  • View-dependent and temporally adaptive appearance and radiance properties
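As a concrete sketch of this parameterization, the snippet below builds a $4\times4$ covariance $\Sigma = R S^2 R^\top$ from a pair of unit quaternions (the dual-quaternion parameterization of $SO(4)$ noted above, realized as left/right isoclinic rotations) and anisotropic scales. Function names are illustrative, not taken from any released codebase.

```python
import numpy as np

def left_isoclinic(q):
    """4x4 left-isoclinic rotation from a unit quaternion (a, b, c, d)."""
    a, b, c, d = q
    return np.array([[a, -b, -c, -d],
                     [b,  a, -d,  c],
                     [c,  d,  a, -b],
                     [d, -c,  b,  a]])

def right_isoclinic(q):
    """4x4 right-isoclinic rotation from a unit quaternion (p, q, r, s)."""
    p, q1, r, s = q
    return np.array([[p, -q1, -r, -s],
                     [q1,  p,  s, -r],
                     [r,  -s,  p,  q1],
                     [s,   r, -q1,  p]])

def covariance_4d(q_left, q_right, scales):
    """Sigma = R S^2 R^T with R in SO(4) built from two unit quaternions."""
    q_left = np.asarray(q_left, float) / np.linalg.norm(q_left)
    q_right = np.asarray(q_right, float) / np.linalg.norm(q_right)
    R = left_isoclinic(q_left) @ right_isoclinic(q_right)
    return R @ np.diag(np.asarray(scales, float) ** 2) @ R.T
```

Because $R$ is orthogonal, the eigenvalues of the resulting $\Sigma$ are exactly the squared scales, which is what makes this factorization convenient for gradient-based optimization of shape and orientation.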

The unnormalized 4D Gaussian density at spacetime point $\mathbf{p} = (x, y, z, t)^\top$ is

$$G_i(\mathbf{p}) = w_i \exp\left(-\tfrac12 (\mathbf{p} - \boldsymbol\mu_i)^\top \Sigma_i^{-1} (\mathbf{p} - \boldsymbol\mu_i)\right)$$

where $w_i$ is a primitive weight.

To render a 2D image at time $t$, the 4D Gaussian is conditionally sliced to a 3D spatial Gaussian using multivariate conditioning:

$$\mu_{xyz|t} = \mu_{xyz} + \Sigma_{1:3,4}\,\Sigma_{4,4}^{-1}(t - \mu_t), \qquad \Sigma_{xyz|t} = \Sigma_{1:3,1:3} - \Sigma_{1:3,4}\,\Sigma_{4,4}^{-1}\,\Sigma_{4,1:3}$$

The conditional 3D Gaussian $(\mu_{xyz|t}, \Sigma_{xyz|t})$ is then projected through the camera, producing a 2D elliptical “splat” for rasterization (Yang et al., 2024, Yang et al., 2023, Duan et al., 2024).
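The conditional slicing above is standard Gaussian conditioning on the time coordinate; a minimal numpy sketch (helper name is illustrative):

```python
import numpy as np

def slice_at_time(mu4, Sigma4, t):
    """Condition a 4D Gaussian (mu4, Sigma4) on time t, returning the
    3D spatial mean and covariance of the sliced Gaussian."""
    mu_xyz, mu_t = mu4[:3], mu4[3]
    S_xx = Sigma4[:3, :3]   # spatial block     Sigma_{1:3,1:3}
    S_xt = Sigma4[:3, 3]    # space-time column Sigma_{1:3,4}
    s_tt = Sigma4[3, 3]     # temporal variance Sigma_{4,4}
    mu_cond = mu_xyz + S_xt / s_tt * (t - mu_t)
    Sigma_cond = S_xx - np.outer(S_xt, S_xt) / s_tt
    return mu_cond, Sigma_cond
```

With a diagonal covariance the slice reduces to the static spatial Gaussian; a nonzero space-time covariance makes the sliced center move linearly in $t$, which is how a single 4D primitive encodes motion.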

2. Temporal Deformation, State-Space, and Canonical-to-Dynamic Modeling

A central challenge is capturing physically plausible, temporally consistent Gaussian motion and deformation.

Native 4D Parameterization

Some methods directly optimize all 4D Gaussian parameters end to end over space and time, favoring minimal motion assumptions and maximal flexibility (Yang et al., 2024, Yang et al., 2023). However, the resulting parameter redundancy can incur high memory use, computational overhead, and overfitting.

Deformation-Driven Approaches

Others initialize canonical 3D or 4D Gaussians (typically from structure-from-motion, e.g., COLMAP) and use learned neural networks (MLPs, HexPlanes, or K-Planes) to predict per-Gaussian spatial/shape/appearance deformations as a function of time, view, or both (Wu et al., 2023, Deng et al., 2024, Oh et al., 19 May 2025, Wu et al., 1 Nov 2025). These deformation fields are lightweight, can be hybrid (combining 3D/static Gaussians for the stationary background and 4D/dynamic Gaussians for foreground motion), and may be regularized using temporal smoothness priors such as Haar wavelet transforms (Lee et al., 23 Jul 2025) or total variation.

State-Space and Optimal-Transport Regularization

Recent works impose explicit dynamical models: state-space transitions are applied to each Gaussian’s parameters (mean, covariance), using a constant-velocity assumption in the (Gaussian) parameter manifold. State predictions are merged with neural or data-driven observations via a Kalman-like "State Consistency Filter," yielding enhanced temporal coherence and robustness (Deng et al., 2024).
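As an illustration of the idea (a simplified stand-in, not the paper's actual State Consistency Filter), a constant-velocity prediction in parameter space blended with an observation could look like:

```python
import numpy as np

def predict_constant_velocity(mu_prev, mu_prev2):
    """Constant-velocity prediction: extrapolate the last parameter step."""
    return mu_prev + (mu_prev - mu_prev2)

def fuse(pred, obs, gain):
    """Kalman-style correction: move the prediction toward the observation
    by a scalar gain in [0, 1] (a stand-in for the full filter gain)."""
    return pred + gain * (obs - pred)
```

The same predict/correct pattern applies to covariances, where the "distance" being reduced is the Wasserstein alignment term below rather than a Euclidean residual.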

Wasserstein geometry is leveraged for both temporal regularization and alignment: the squared 2-Wasserstein distance between Gaussians,

$$W_2^2(G_1, G_2) = \|\mu_1 - \mu_2\|^2 + \mathrm{Tr}\!\left(\Sigma_1 + \Sigma_2 - 2\left(\Sigma_1^{1/2}\,\Sigma_2\,\Sigma_1^{1/2}\right)^{1/2}\right)$$

serves both as an alignment term between prediction and observation, and as a temporal smoothness prior over consecutive frames, facilitating optimal-transport-guided update trajectories (Deng et al., 2024).
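Because covariances are symmetric positive (semi-)definite, the matrix square roots in $W_2^2$ can be computed with symmetric eigendecompositions; a numpy-only sketch:

```python
import numpy as np

def sqrtm_spd(A):
    """Principal square root of a symmetric PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def w2_squared(mu1, S1, mu2, S2):
    """Squared 2-Wasserstein distance between Gaussians N(mu1,S1), N(mu2,S2)."""
    root = sqrtm_spd(S1)
    cross = sqrtm_spd(root @ S2 @ root)
    bures = np.trace(S1) + np.trace(S2) - 2.0 * np.trace(cross)
    return float(np.sum((np.asarray(mu1) - np.asarray(mu2)) ** 2) + bures)
```

For equal covariances the distance reduces to the squared mean offset, so it penalizes both displacement and shape change between consecutive frames in a single scalar.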

3. Differentiable Splatting Rasterization and Photorealistic Rendering

For rendering, each per-frame 3D Gaussian is projected to the image plane and rasterized as a 2D elliptical kernel. The contributions from all (visible) Gaussians along each camera ray are alpha-blended front-to-back:

$$C(u,v,t) = \sum_{i=1}^{N} \alpha_i\, p_i(u,v,t)\, c_i(\theta_i, \phi_i, t) \prod_{j<i} \bigl(1 - \alpha_j\, p_j(u,v,t)\bigr)$$

where $p_i(u,v,t)$ is the projected kernel weight, $c_i$ is the view- and time-dependent color, and $\alpha_i$ is the opacity.
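A toy per-ray version of this compositing rule (assuming contributions are already depth-sorted; names are illustrative):

```python
import numpy as np

def composite_ray(alphas, weights, colors):
    """Front-to-back alpha blending of depth-sorted splat contributions
    along one camera ray (alphas: opacities, weights: projected kernel
    values p_i, colors: RGB per splat)."""
    color = np.zeros(3)
    transmittance = 1.0
    for a, w, c in zip(alphas, weights, colors):
        contrib = a * w
        color += transmittance * contrib * np.asarray(c, float)
        transmittance *= 1.0 - contrib
        if transmittance < 1e-4:  # early termination, as in 3DGS rasterizers
            break
    return color
```

The early-termination test is one reason tile-based GPU rasterizers reach real-time rates: once transmittance is negligible, the remaining splats on the ray are skipped.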

This splatting process is implemented efficiently in CUDA, often supporting real-time training and inference (tens to thousands of FPS) for high-resolution frames (Duan et al., 2024, Yang et al., 2024). GPU memory and speed are further optimized via frustum culling, compact data layouts (SoA), and depth-ordered tile-based compositing.

Recent work on anti-aliasing and adaptive filtering for 4DGS proposes 4D scale-adaptive filters: to avoid spurious artifacts and redundant micro-Gaussians, the maximum frequency of each Gaussian (derived from camera focal/depth and the Nyquist criterion) governs the minimum allowable kernel support. This, combined with regularization, robustly prevents aliasing across scale and zoom (Chen et al., 23 Nov 2025).
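A heavily simplified reading of the scale-adaptive idea: a Gaussian whose world-space support is narrower than the pixel footprint at its depth cannot be sampled without aliasing, so its scale is clamped from below. The constants and helper names here are illustrative, not those of Chen et al.:

```python
def min_world_scale(depth, focal_px):
    """Nyquist-style lower bound on a splat's world-space extent: one pixel
    at distance `depth` subtends depth/focal_px world units, and features
    smaller than half of that fall beyond the sampling limit."""
    return 0.5 * depth / focal_px

def clamp_scale(scale, depth, focal_px):
    """Clamp a Gaussian's spatial scale to the sampling limit at its depth."""
    return max(scale, min_world_scale(depth, focal_px))
```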

4. Regularization, Priors, and Hybridization

To constrain the large solution space and improve reconstruction fidelity:

  • Temporal Smoothness: Enforced via sequential Wasserstein loss (Deng et al., 2024), wavelet sparsification (Lee et al., 23 Jul 2025), or explicit total variation on deformation fields (Wu et al., 1 Nov 2025, Yu et al., 27 Mar 2025).
  • Geometry Priors: Multi-view stereo (MVS) and monocular depth estimates impose structure and depth regularization, with dynamic consistency checks for robust temporally coherent geometry under sparse input (Li et al., 28 Nov 2025).
  • Static-Dynamic Hybrid Models: Static regions are efficiently assigned time-invariant (3D) Gaussians, while dynamic regions use full 4D representation, leading to major memory and compute savings without loss in visual quality (Oh et al., 19 May 2025, Sun et al., 12 Mar 2025).
  • Memory Efficiency and Compression: Practical frameworks aggressively compress parameters via pruning, quantization, codebook vector quantization, and explicit temporal transforms. For example, MEGA reduces color storage from 144 SH parameters to 3+MLP and achieves up to 190× storage reduction without quality drop (Zhang et al., 2024). Bit-level, entropy-constrained compression with wavelet-coded motion achieves up to 91× reduction (Lee et al., 23 Jul 2025).
  • SLAM and Tracking Integration: Temporal consistency is exploited for robust camera localization and mapping in dynamic environments, with explicit dynamic-vs-static Gaussian splitting and (optionally) optical flow-based supervision (Li et al., 20 Mar 2025, Sun et al., 7 Apr 2025).
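To make the codebook-vector-quantization idea from the compression bullet concrete, here is a toy k-means quantizer that replaces $N$ parameter vectors with $k$ codewords plus a small index per primitive (illustrative only; real pipelines such as MEGA combine this with pruning, low-bit quantization, and entropy coding):

```python
import numpy as np

def vector_quantize(params, k, iters=10, seed=0):
    """Toy codebook VQ: compress N parameter vectors to k codewords plus
    an index per vector, via a few rounds of k-means."""
    rng = np.random.default_rng(seed)
    codebook = params[rng.choice(len(params), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # assign each vector to its nearest codeword
        dists = np.linalg.norm(params[:, None, :] - codebook[None, :, :], axis=-1)
        idx = dists.argmin(axis=1)
        # move each codeword to the mean of its assigned members
        for j in range(k):
            members = params[idx == j]
            if len(members):
                codebook[j] = members.mean(axis=0)
    return codebook, idx
```

Storage drops from $N \cdot d$ floats to $k \cdot d$ floats plus $N$ integer indices, which is where the large reduction factors reported above come from.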

5. Applications and Experimental Benchmarks

Dynamic 4D Gaussian Splatting has been applied to a wide variety of tasks, including novel view and novel time synthesis, dynamic scene reconstruction, memory-efficient compression, and SLAM in dynamic environments.

Benchmark results indicate that methods such as Wasserstein-constrained 4DGS (Deng et al., 2024) and hybrid/static-dynamic approaches (Oh et al., 19 May 2025) capably balance temporal consistency, geometry, and memory demands. Ablation studies consistently show substantial gains from filter-based temporal smoothing and explicit Gaussian splitting.

6. Limitations, Open Challenges, and Directions

Despite major advances, notable limitations remain:

  • Memory and Scalability: Native 4DGS can demand large memory footprints ($>1$ GB for unconstrained representations), though recent methods mitigate this via hybridization and quantization (Zhang et al., 2024).
  • Overfitting and Flicker: Under sparse inputs or inadequate temporal regularization, 4DGS can overfit dynamic regions or exhibit spatial/temporal flicker. Imposing priors or hybrid geometries mitigates these effects (Li et al., 28 Nov 2025, Oh et al., 19 May 2025).
  • Sparse Input and Non-Rigid Topologies: Highly non-rigid, topologically evolving scenes or those captured with very sparse frames remain challenging. Texture-aware regularization and self-supervised depth cues alleviate but do not fully resolve ill-posed settings (Shi et al., 10 Nov 2025).
  • Motion Ambiguity: Flow-based supervision (e.g., GaussianFlow) and explicit state-space models counteract ambiguity in motion estimation (Gao et al., 2024, Deng et al., 2024).

Open research directions include learnable dynamic-vs-static classification, adaptive Gaussian splitting/merging, further compression optimizations, and tighter integration with geometric/physics simulators and higher-level scene priors. Optimal-transport and Wasserstein-geometry-based modeling are likely to further enhance both theoretical and empirical performance, especially for scenes exhibiting complex, temporally coherent deformation.


