Dynamic 4D Gaussian Splatting
- The paper introduces explicit 4D Gaussian primitives combined with temporal deformation models to achieve real-time, high-fidelity rendering of dynamic scenes.
- It leverages mathematical parameterizations, optimal-transport regularization, and state-space modeling to ensure temporal consistency and spatial accuracy.
- Practical results demonstrate enhanced novel view and time synthesis, efficient compression, and integration with SLAM for dynamic scene reconstruction.
Dynamic 4D Gaussian Splatting (4DGS) is a family of methods for explicit, real-time representation and differentiable rendering of dynamic scenes as collections of time-evolving anisotropic Gaussian primitives in four-dimensional (spatio-temporal) space. These approaches unify spatial geometry and temporal evolution within a coherent framework, enabling novel view and novel time synthesis with high efficiency and fidelity. The primary models leverage either native 4D Gaussian parameterizations or sophisticated temporal deformation fields, often enhanced by regularization, geometric priors, or a mixture of explicit and neural architectures. This article reviews foundational and state-of-the-art techniques, organizing their advances via representation, deformation modeling, optimization, regularization, and practical implications.
1. Mathematical Foundations and 4D Gaussian Representation
The core of 4D Gaussian Splatting is the explicit parameterization of a dynamic scene as a collection of 4D Gaussian primitives. Each primitive consists of
- A spatio-temporal center
- A positive-definite covariance matrix with anisotropic scales and a general 4D rotation, often parameterized by dual unit quaternions
- Opacity and color (commonly modeled with 4D spherindrical harmonics or memory-efficient forms)
- View-dependent and temporally adaptive appearance and radiance properties
The unnormalized 4D Gaussian density at a spacetime point $\mathbf{x} = (x, y, z, t)^\top$ is
$$p(\mathbf{x}) = w \exp\!\left(-\tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^\top \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right),$$
where $w$ is a primitive weight, $\boldsymbol{\mu} \in \mathbb{R}^4$ is the spatio-temporal center, and $\Sigma \in \mathbb{R}^{4 \times 4}$ is the covariance matrix.
To render a 2D image at time $t$, the 4D Gaussian is conditionally sliced to a 3D spatial Gaussian using multivariate conditioning. Partitioning $\boldsymbol{\mu} = (\boldsymbol{\mu}_x, \mu_t)$ and $\Sigma$ into spatial and temporal blocks gives
$$\boldsymbol{\mu}_{x|t} = \boldsymbol{\mu}_x + \Sigma_{xt}\Sigma_{tt}^{-1}(t - \mu_t), \qquad \Sigma_{x|t} = \Sigma_{xx} - \Sigma_{xt}\Sigma_{tt}^{-1}\Sigma_{tx}.$$
The conditional 3D Gaussian is then projected through the camera, producing a 2D elliptical “splat” for rasterization, as in (Yang et al., 2024, Yang et al., 2023, Duan et al., 2024).
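The temporal slicing step can be sketched directly from the conditioning formulas. This is a minimal NumPy illustration (function name and the Gaussian-falloff temporal weight are illustrative conventions, not a specific paper's API):

```python
import numpy as np

def slice_4d_gaussian(mu, Sigma, t):
    """Condition a 4D Gaussian over (x, y, z, t) on a time t, yielding a
    3D spatial Gaussian via standard multivariate Gaussian conditioning.

    mu: (4,) spatio-temporal center; Sigma: (4, 4) covariance, time last.
    """
    mu_x, mu_t = mu[:3], mu[3]
    S_xx = Sigma[:3, :3]   # spatial covariance block
    S_xt = Sigma[:3, 3]    # spatio-temporal cross-covariance
    S_tt = Sigma[3, 3]     # temporal variance (scalar)
    mu_cond = mu_x + S_xt * (t - mu_t) / S_tt
    Sigma_cond = S_xx - np.outer(S_xt, S_xt) / S_tt
    # The marginal temporal density modulates the primitive's effective
    # opacity at time t (it fades away from its temporal center).
    weight = np.exp(-0.5 * (t - mu_t) ** 2 / S_tt)
    return mu_cond, Sigma_cond, weight
```

For a Gaussian with no spatio-temporal correlation, slicing at its temporal center returns its spatial mean and covariance unchanged with full weight.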
2. Temporal Deformation, State-Space, and Canonical-to-Dynamic Modeling
A central challenge is capturing physically plausible, temporally consistent Gaussian motion and deformation.
Native 4D Parameterization
Some methods directly optimize all 4D Gaussian parameters end to end over space and time, favoring minimal motion assumptions and maximal flexibility, as in (Yang et al., 2024, Yang et al., 2023). However, this parameter redundancy can incur memory and computational overhead and is prone to overfitting.
Deformation-Driven Approaches
Others initialize canonical 3D or 4D Gaussians (typically from structure-from-motion or COLMAP) and use learned neural networks (MLPs, HexPlanes, or K-Planes) to predict per-Gaussian spatial/shape/appearance deformations as a function of time, view, or both (Wu et al., 2023, Deng et al., 2024, Oh et al., 19 May 2025, Wu et al., 1 Nov 2025). These deformation fields are lightweight, can be hybrid (combining 3D/static for stationary background and 4D/dynamic for foreground motion), and may be regularized using temporal smoothness priors such as Haar wavelet transforms (Lee et al., 23 Jul 2025) or total-variation.
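A canonical-to-dynamic deformation field of this kind reduces, at its core, to a small network mapping (canonical position, time) to per-Gaussian offsets. A minimal NumPy sketch, with hypothetical names and a zero-initialized output layer so training starts from the identity deformation (a common trick, not tied to any single cited method):

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(in_dim=4, hidden=64, out_dim=3):
    """Tiny two-layer MLP mapping (x, y, z, t) -> positional offset Δx."""
    return {
        "W1": rng.normal(0.0, 0.1, (in_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": np.zeros((hidden, out_dim)),  # zero-init: deformation starts at identity
        "b2": np.zeros(out_dim),
    }

def deform(params, xyz, t):
    """Predict per-Gaussian offsets at time t; deformed position is x + Δx."""
    inp = np.concatenate([xyz, np.full((len(xyz), 1), t)], axis=1)
    h = np.maximum(inp @ params["W1"] + params["b1"], 0.0)  # ReLU hidden layer
    return xyz + (h @ params["W2"] + params["b2"])
```

Real systems replace the raw (x, y, z, t) input with positional encodings or HexPlane/K-Planes features, and predict rotation and scale residuals alongside position.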
State-Space and Optimal-Transport Regularization
Recent works impose explicit dynamical models: state-space transitions are applied to each Gaussian’s parameters (mean, covariance), using a constant-velocity assumption in the (Gaussian) parameter manifold. State predictions are merged with neural or data-driven observations via a Kalman-like "State Consistency Filter," yielding enhanced temporal coherence and robustness (Deng et al., 2024).
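The prediction-then-correction loop can be illustrated in a few lines. This is a deliberately simplified scalar-gain sketch of the idea (the actual State Consistency Filter in Deng et al., 2024 operates on full Gaussian parameters with Wasserstein-informed updates; function names here are illustrative):

```python
import numpy as np

def predict_constant_velocity(mu_prev, vel, dt=1.0):
    """Constant-velocity prior on a Gaussian's mean in parameter space."""
    return mu_prev + vel * dt

def state_consistency_update(mu_pred, mu_obs, gain=0.5):
    """Kalman-style blend of the dynamical prediction with the observed
    (e.g., network-predicted) parameters; `gain` plays the Kalman-gain role:
    gain=0 trusts the motion model, gain=1 trusts the observation."""
    return mu_pred + gain * (mu_obs - mu_pred)
```

The same predict/update pattern extends to covariances, where the blend is naturally expressed as an interpolation along Wasserstein geodesics rather than a linear average.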
Wasserstein geometry is leveraged for both temporal regularization and alignment: the squared 2-Wasserstein distance between Gaussians $\mathcal{N}(\boldsymbol{\mu}_1, \Sigma_1)$ and $\mathcal{N}(\boldsymbol{\mu}_2, \Sigma_2)$,
$$W_2^2 = \|\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2\|^2 + \operatorname{tr}\!\left(\Sigma_1 + \Sigma_2 - 2\left(\Sigma_2^{1/2}\,\Sigma_1\,\Sigma_2^{1/2}\right)^{1/2}\right),$$
serves both as an alignment term between prediction and observation and as a temporal smoothness prior over consecutive frames, facilitating optimal-transport-guided update trajectories (Deng et al., 2024).
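This closed-form distance is straightforward to compute for the small (3×3 or 4×4) covariances used per Gaussian; a minimal NumPy version, using a symmetric-PSD matrix square root via eigendecomposition:

```python
import numpy as np

def psd_sqrt(A):
    """Matrix square root of a symmetric positive-semidefinite matrix."""
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def w2_squared(mu1, S1, mu2, S2):
    """Squared 2-Wasserstein distance between N(mu1, S1) and N(mu2, S2)."""
    S2h = psd_sqrt(S2)
    cross = psd_sqrt(S2h @ S1 @ S2h)
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(S1 + S2 - 2.0 * cross))
```

Used as a loss between a Gaussian at frame t and its counterpart at frame t+1, this penalizes both mean displacement and shape change in a geometrically meaningful way.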
3. Differentiable Splatting Rasterization and Photorealistic Rendering
For rendering, each per-frame 3D Gaussian is projected to the image plane and rasterized as a 2D elliptical kernel. The contributions from all (visible) Gaussians along each camera ray are alpha-blended front to back:
$$C = \sum_{i} T_i\, \alpha_i\, c_i, \qquad T_i = \prod_{j < i}(1 - \alpha_j), \qquad \alpha_i = o_i\, G_i^{2D}(\mathbf{p}),$$
where $G_i^{2D}(\mathbf{p})$ is the projected kernel weight at pixel $\mathbf{p}$, $c_i$ is the view- and time-dependent color, and $o_i$ is opacity.
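The front-to-back compositing rule for a single pixel can be sketched as follows (a reference-style loop; production rasterizers run this per tile in CUDA with the same early-termination test):

```python
import numpy as np

def composite_front_to_back(alphas, colors):
    """Front-to-back alpha blending of depth-sorted splat contributions.

    alphas: (N,) effective opacity per splat at this pixel
            (opacity times the projected 2D kernel weight).
    colors: (N, 3) view-/time-dependent colors, sorted near to far.
    """
    C = np.zeros(3)
    T = 1.0  # accumulated transmittance along the ray
    for a, c in zip(alphas, colors):
        C += T * a * c
        T *= (1.0 - a)
        if T < 1e-4:  # early termination once the pixel is saturated
            break
    return C
```

Because each term is weighted by the transmittance accumulated so far, a fully opaque near splat correctly occludes everything behind it.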
This splatting process is implemented efficiently in CUDA, often supporting real-time training and inference (tens to thousands of FPS) for high-resolution frames (Duan et al., 2024, Yang et al., 2024). GPU memory and speed are further optimized via frustum culling, compact data layouts (SoA), and depth-ordered tile-based compositing.
Recent work on anti-aliasing and adaptive filtering for 4DGS proposes 4D scale-adaptive filters: to avoid spurious artifacts and redundant micro-Gaussians, the maximum frequency of each Gaussian (derived from camera focal/depth and the Nyquist criterion) governs the minimum allowable kernel support. This, combined with regularization, robustly prevents aliasing across scale and zoom (Chen et al., 23 Nov 2025).
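The core of such a sampling-rate constraint can be illustrated with a simplified, spatial-only sketch: clamp each Gaussian's world-space scale to roughly half the pixel footprint back-projected to its depth. This is an assumption-laden toy version (the actual filter in Chen et al., 23 Nov 2025 is 4D and derived more carefully; `pixel_size` and the factor 0.5 here are illustrative):

```python
import numpy as np

def min_world_scale(depth, focal, pixel_size=1.0):
    """Nyquist-motivated lower bound on a Gaussian's world-space extent:
    a Gaussian much narrower than the pixel footprint at its depth cannot
    be sampled by the image grid without aliasing."""
    footprint = pixel_size * depth / focal  # world size of one pixel at this depth
    return 0.5 * footprint

def clamp_scales(scales, depths, focal):
    """Clamp per-Gaussian anisotropic scales (N, 3) from below."""
    lo = np.array([min_world_scale(d, focal) for d in depths])
    return np.maximum(scales, lo[:, None])
```

The clamp both suppresses shimmering micro-Gaussians under zoom-out and removes the incentive to densify with sub-pixel primitives.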
4. Regularization, Priors, and Hybridization
To constrain the large solution space and improve reconstruction fidelity:
- Temporal Smoothness: Enforced via sequential Wasserstein loss (Deng et al., 2024), wavelet sparsification (Lee et al., 23 Jul 2025), or explicit total variation on deformation fields (Wu et al., 1 Nov 2025, Yu et al., 27 Mar 2025).
- Geometry Priors: Multi-view stereo (MVS) and monocular depth estimates impose structure and depth regularization, with dynamic consistency checks for robust temporally coherent geometry under sparse input (Li et al., 28 Nov 2025).
- Static-Dynamic Hybrid Models: Static regions are efficiently assigned time-invariant (3D) Gaussians, while dynamic regions use full 4D representation, leading to major memory and compute savings without loss in visual quality (Oh et al., 19 May 2025, Sun et al., 12 Mar 2025).
- Memory Efficiency and Compression: Practical frameworks aggressively compress parameters via pruning, quantization, codebook vector quantization, and explicit temporal transforms. For example, MEGA reduces color storage from 144 SH parameters to 3+MLP and achieves up to 190× storage reduction without quality drop (Zhang et al., 2024). Bit-level, entropy-constrained compression with wavelet-coded motion achieves up to 91× reduction (Lee et al., 23 Jul 2025).
- SLAM and Tracking Integration: Temporal consistency is exploited for robust camera localization and mapping in dynamic environments, with explicit dynamic-vs-static Gaussian splitting and (optionally) optical flow-based supervision (Li et al., 20 Mar 2025, Sun et al., 7 Apr 2025).
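The codebook vector quantization used by such compression pipelines amounts to clustering per-Gaussian parameter vectors and storing a single index per Gaussian. A minimal k-means sketch under that assumption (not any specific paper's pipeline; `build_codebook` is a hypothetical name):

```python
import numpy as np

def build_codebook(params, k=256, iters=10, seed=0):
    """K-means codebook over per-Gaussian parameter vectors (N, D).

    Returns the (k, D) codebook and a per-Gaussian index array; each
    Gaussian is then stored as one index instead of D floats."""
    rng = np.random.default_rng(seed)
    centers = params[rng.choice(len(params), k, replace=False)]
    for _ in range(iters):
        # Assign each vector to its nearest codeword.
        d = np.linalg.norm(params[:, None] - centers[None], axis=2)
        assign = d.argmin(axis=1)
        # Recompute each codeword as the mean of its assigned vectors.
        for j in range(k):
            mask = assign == j
            if mask.any():
                centers[j] = params[mask].mean(axis=0)
    return centers, assign

# Decoding is a single gather: reconstructed = centers[assign]
```

With k codewords, per-Gaussian storage drops from D floats to one ⌈log2 k⌉-bit index plus the shared codebook, which is where the large compression ratios quoted above originate.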
5. Applications and Experimental Benchmarks
Dynamic 4D Gaussian Splatting has been applied to a wide variety of tasks:
- Novel View and Time Synthesis: Real-time, high-fidelity synthesis on challenging synthetic (D-NeRF), real multi-view (Plenoptic Video, Technicolor), and driving (Waymo, KITTI) datasets. 4DGS models consistently outperform grid- and MLP-based radiance fields by >1 dB in PSNR and operate at real-time frame rates (Yang et al., 2024, Duan et al., 2024, Lee et al., 23 Jul 2025).
- Volumetric Video and Plenoptic Rendering: Efficient modeling of video sequences with complex non-rigid motion (Deng et al., 2024, Li et al., 28 Nov 2025).
- Dynamic CT Reconstruction: Continuous-time 4DGS (X-Gaussian) achieves state-of-the-art 4D medical image reconstruction with learned temporal periodicity (Yu et al., 27 Mar 2025).
- Online SLAM: Simultaneous localization and mapping in dynamic scenes, leveraging Gaussian tracking and deformation (Li et al., 20 Mar 2025, Sun et al., 7 Apr 2025).
- Compression for Resource-Constrained Devices: Aggressive pruning and quantization enable fast rendering and streaming on edge devices, achieving rates of 20 FPS+ on ARM-class GPUs (Lee et al., 23 Jul 2025, Zhang et al., 2024).
- Sparse-Frame and Few-Shot Learning: Specialized pipelines exploit texture-awareness and monocular priors to reconstruct sharp geometry with very sparse (e.g., 10–20) input frames (Shi et al., 10 Nov 2025).
Benchmark results indicate that methods such as Wasserstein-constrained 4DGS (Deng et al., 2024) and hybrid/static-dynamic approaches (Oh et al., 19 May 2025) capably balance temporal consistency, geometry, and memory demands. Ablation studies consistently show substantial gains from filter-based temporal smoothing and explicit Gaussian splitting.
6. Limitations, Open Challenges, and Directions
Despite major advances, notable limitations remain:
- Memory and Scalability: Native 4DGS can demand multi-gigabyte memory footprints for unconstrained representations, though recent methods mitigate this via hybridization and quantization (Zhang et al., 2024).
- Overfitting and Flicker: Under sparse inputs or inadequate temporal regularization, 4DGS can overfit dynamic regions or exhibit spatial/temporal flicker. Imposing priors or hybrid geometries mitigates these effects (Li et al., 28 Nov 2025, Oh et al., 19 May 2025).
- Sparse Input and Non-Rigid Topologies: Highly non-rigid, topologically evolving scenes or those captured with very sparse frames remain challenging. Texture-aware regularization and self-supervised depth cues alleviate but do not fully resolve ill-posed settings (Shi et al., 10 Nov 2025).
- Motion Ambiguity: Supervision via flow (e.g., GaussianFlow) and explicit state-space models counteract ambiguity in motion estimation (Gao et al., 2024, Deng et al., 2024).
Open research directions include learnable dynamic-vs-static classification, adaptive Gaussian splitting/merging, further compression optimizations, and tighter integration with geometric/physics simulators and higher-level scene priors. Optimal-transport and Wasserstein-geometry-based modeling are likely to further enhance both theoretical and empirical performance, especially for scenes exhibiting complex, temporally coherent deformation.
References
- (Deng et al., 2024) Gaussians on their Way: Wasserstein-Constrained 4D Gaussian Splatting with State-Space Modeling
- (Yang et al., 2024) 4D Gaussian Splatting: Modeling Dynamic Scenes with Native 4D Primitives
- (Duan et al., 2024) 4D-Rotor Gaussian Splatting: Towards Efficient Novel View Synthesis for Dynamic Scenes
- (Lee et al., 23 Jul 2025) Temporal Smoothness-Aware Rate-Distortion Optimized 4D Gaussian Splatting
- (Li et al., 28 Nov 2025) Geometry-Consistent 4D Gaussian Splatting for Sparse-Input Dynamic View Synthesis
- (Lee et al., 2024) Fully Explicit Dynamic Gaussian Splatting
- (Oh et al., 19 May 2025) Hybrid 3D-4D Gaussian Splatting for Fast Dynamic Scene Representation
- (Zhang et al., 2024) MEGA: Memory-Efficient 4D Gaussian Splatting for Dynamic Scenes
- (Chen et al., 23 Nov 2025) Alias-free 4D Gaussian Splatting
- (Yu et al., 27 Mar 2025) X-Gaussian: 4D Radiative Gaussian Splatting for Continuous-time Tomographic Reconstruction
- (Li et al., 20 Mar 2025) 4D Gaussian Splatting SLAM
- (Sun et al., 7 Apr 2025) Embracing Dynamics: Dynamics-aware 4D Gaussian Splatting SLAM
- (Feng et al., 28 Mar 2025) Disentangled 4D Gaussian Splatting: Towards Faster and More Efficient Dynamic Scene Rendering
- (Liao et al., 3 Feb 2026) SharpTimeGS: Sharp and Stable Dynamic Gaussian Splatting via Lifespan Modulation
- (Shi et al., 10 Nov 2025) Sparse4DGS: 4D Gaussian Splatting for Sparse-Frame Dynamic Scene Reconstruction
- (Gao et al., 2024) GaussianFlow: Splatting Gaussian Dynamics for 4D Content Creation
- (Wu et al., 2023) 4D Gaussian Splatting for Real-Time Dynamic Scene Rendering
- (Yang et al., 2023) Real-time Photorealistic Dynamic Scene Representation and Rendering with 4D Gaussian Splatting