
4D Gaussian Splatting Representation

Updated 4 December 2025
  • 4D Gaussian Splatting is an explicit spatio-temporal representation that models dynamic scenes using anisotropic Gaussian primitives.
  • It employs a differentiable rendering pipeline with temporal slicing and alpha blending for fast, photorealistic view synthesis.
  • Hybrid designs and compression techniques reduce storage and computational demands, enhancing applications in AR/VR, dynamic SLAM, and style transfer.

A 4D Gaussian Splatting (4DGS) representation is an explicit spatio-temporal scene representation for dynamic view synthesis, generation, style transfer, and scene understanding tasks. Fundamentally, 4DGS models a dynamic scene as a set of anisotropic Gaussian primitives in four-dimensional space-time $(x, y, z, t)$, with each primitive parameterizing both geometric structure and view/time-dependent appearance. The representation enables fast, differentiable, and photorealistic rendering of arbitrary views at arbitrary times, and natively supports temporal coherence and multi-view consistency.

1. Mathematical Definition and Parameterization

A single 4D Gaussian splat is defined by a spatio-temporal mean vector $\mu \in \mathbb{R}^4$, a full positive-definite 4D covariance $\Sigma \in \mathbb{R}^{4 \times 4}$, an opacity $\alpha$, and appearance parameters (either a color or a basis expansion such as 4D spherindrical harmonics). The density function is

$$G(x, y, z, t) = \exp\!\left(-\tfrac{1}{2}(X - \mu)^{\top} \Sigma^{-1} (X - \mu)\right), \qquad X = (x, y, z, t)^{\top}.$$

The covariance is typically factorized as $\Sigma = R S S^{\top} R^{\top}$, where $S = \mathrm{diag}(s_x, s_y, s_z, s_t)$ and $R \in SO(4)$ is a 4D rotation (often parameterized via two quaternions or an 8D rotor). This explicit form captures spatio-temporal anisotropy, allowing each primitive to encode fine-grained motion and deformation patterns (Yang et al., 30 Dec 2024, Duan et al., 5 Feb 2024, Feng et al., 28 Mar 2025).
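The following is a minimal NumPy sketch of the $\Sigma = R S S^{\top} R^{\top}$ factorization, assuming the common two-quaternion (isoclinic) parameterization of $R \in SO(4)$; the function names and example values are illustrative, not taken from any cited implementation.

```python
import numpy as np

def quat_left(q):
    """Left-multiplication matrix of a unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    return np.array([[w, -x, -y, -z],
                     [x,  w, -z,  y],
                     [y,  z,  w, -x],
                     [z, -y,  x,  w]])

def quat_right(q):
    """Right-multiplication matrix of a unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    return np.array([[w, -x, -y, -z],
                     [x,  w,  z, -y],
                     [y, -z,  w,  x],
                     [z,  y, -x,  w]])

def covariance_4d(scales, q_l, q_r):
    """Build Sigma = R S S^T R^T with R = L(q_l) @ R(q_r) in SO(4)."""
    q_l = q_l / np.linalg.norm(q_l)
    q_r = q_r / np.linalg.norm(q_r)
    R = quat_left(q_l) @ quat_right(q_r)   # 4D rotation via isoclinic decomposition
    S = np.diag(scales)                     # (s_x, s_y, s_z, s_t)
    return R @ S @ S.T @ R.T                # positive semi-definite by construction

# Illustrative example: an elongated primitive with a broad temporal extent
Sigma = covariance_4d(np.array([0.05, 0.05, 0.2, 0.5]),
                      np.array([1.0, 0.1, 0.0, 0.0]),
                      np.array([1.0, 0.0, 0.1, 0.0]))
print(np.linalg.eigvalsh(Sigma))   # eigenvalues >= 0 (up to numerical precision)
```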

The appearance model is often a truncated expansion in view/time harmonics:

$$c(d, \Delta t) = \sum_{n, l, m} w_{nlm}\, Z_{nl}^{m}(\Delta t, \theta, \phi), \qquad Z_{nl}^{m}(\Delta t, \theta, \phi) = \cos\!\left(\frac{2\pi n \Delta t}{T}\right) Y_l^{m}(\theta, \phi),$$

which combines a periodic Fourier basis in time with spherical harmonics in the view direction $(\theta, \phi)$; the $w_{nlm}$ are learned per-Gaussian coefficients (Yang et al., 30 Dec 2024, Duan et al., 5 Feb 2024).
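As a rough illustration of this basis, the sketch below evaluates a low-order variant with hand-coded real spherical harmonics up to degree 1 and a small number of temporal frequencies; the coefficient layout, the period T, and the truncation orders are assumptions made for the example, not the settings of the cited papers.

```python
import numpy as np

def real_sh_deg1(d):
    """Real spherical harmonics Y_l^m for l <= 1 at unit view direction d."""
    x, y, z = d / np.linalg.norm(d)
    return np.array([0.28209479,        # Y_0^0
                     0.48860251 * y,    # Y_1^{-1}
                     0.48860251 * z,    # Y_1^{0}
                     0.48860251 * x])   # Y_1^{1}

def spherindrical_color(weights, d, dt, T, n_freq=2):
    """c(d, dt) = sum_{n,l,m} w_{nlm} * cos(2*pi*n*dt/T) * Y_l^m(d).
    weights: shape (n_freq, 4, 3), per-frequency SH coefficients for RGB (illustrative layout)."""
    sh = real_sh_deg1(d)                                        # (4,)
    fourier = np.cos(2 * np.pi * np.arange(n_freq) * dt / T)    # (n_freq,), n=0 is the DC term
    basis = fourier[:, None] * sh[None, :]                      # (n_freq, 4)
    return np.einsum('nk,nkc->c', basis, weights)               # RGB color

# Example with random per-Gaussian coefficients
w = 0.1 * np.random.randn(2, 4, 3)
print(spherindrical_color(w, d=np.array([0.0, 0.0, 1.0]), dt=0.3, T=1.0))
```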

Temporal slicing is achieved via the blockwise structure of $\Sigma$:

$$\mu_{xyz|t} = \mu_{1:3} + \Sigma_{1:3,\,4}\, \Sigma_{4,4}^{-1} (t - \mu_t)$$

$$\Sigma_{xyz|t} = \Sigma_{1:3,\,1:3} - \Sigma_{1:3,\,4}\, \Sigma_{4,4}^{-1}\, \Sigma_{4,\,1:3}$$

Thus, at each time $t$, the 4D Gaussian conditions to a 3D Gaussian in space whose mean and shape evolve continuously, while the 1D marginal in $t$ gates the primitive's temporal visibility (Yang et al., 30 Dec 2024, Duan et al., 5 Feb 2024, Oh et al., 19 May 2025).
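A minimal NumPy sketch of this conditioning step, together with the unnormalized temporal marginal used later in blending; the example covariance values are illustrative.

```python
import numpy as np

def slice_at_time(mu4, Sigma4, t):
    """Condition a 4D Gaussian (mu4, Sigma4) on time t.
    Returns the conditional 3D mean/covariance and the unnormalized temporal marginal p(t)."""
    mu_xyz, mu_t = mu4[:3], mu4[3]
    S_xx = Sigma4[:3, :3]      # spatial block Sigma_{1:3,1:3}
    S_xt = Sigma4[:3, 3]       # space-time coupling Sigma_{1:3,4}
    S_tt = Sigma4[3, 3]        # temporal variance Sigma_{4,4}
    mu_cond = mu_xyz + S_xt * (t - mu_t) / S_tt
    Sigma_cond = S_xx - np.outer(S_xt, S_xt) / S_tt
    p_t = np.exp(-0.5 * (t - mu_t) ** 2 / S_tt)
    return mu_cond, Sigma_cond, p_t

# Illustrative example: an x-t correlation makes the conditional mean drift along x with time
mu4 = np.array([0.0, 0.0, 0.0, 0.5])
Sigma4 = np.diag([0.01, 0.01, 0.01, 0.25])
Sigma4[0, 3] = Sigma4[3, 0] = 0.04
print(slice_at_time(mu4, Sigma4, t=1.0)[0])
```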

2. Rendering Pipeline and Differentiable Splatting

The core rendering operation projects temporally sliced 3D Gaussians into the image plane for the given camera and composites their contributions using front-to-back alpha blending. Each pixel color at time $t$ is accumulated as

$$I(u, v, t) = \sum_{i=1}^{N} p_i(t)\, p_i(u, v \mid t)\, \alpha_i\, c_i(d, \Delta t) \prod_{j<i} \bigl(1 - p_j(t)\, p_j(u, v \mid t)\, \alpha_j \bigr),$$

where $p_i(t)$ is the temporal marginal, $p_i(u, v \mid t)$ is the projected 2D Gaussian evaluated at the pixel, and the product term accounts for transmittance (Yang et al., 30 Dec 2024, Duan et al., 5 Feb 2024, Yang et al., 2023). Fully differentiable GPU kernels are typically used, exploiting bounding ellipses and tile-based culling for efficiency, yielding real-time performance at megapixel resolutions (Duan et al., 5 Feb 2024, Feng et al., 28 Mar 2025, Zhang et al., 17 Oct 2024).
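The per-pixel compositing loop below is a plain-Python sketch of the equation above, assuming the Gaussians have already been depth-sorted and their temporal/spatial densities evaluated at the pixel; the early-termination threshold is an illustrative choice.

```python
import numpy as np

def composite_pixel(temporal_weights, spatial_weights, opacities, colors):
    """Front-to-back alpha blending for one pixel.
    Inputs are per-Gaussian and sorted by depth (nearest first):
      temporal_weights p_i(t), spatial_weights p_i(u,v|t), opacities alpha_i, colors c_i."""
    color = np.zeros(3)
    transmittance = 1.0
    for p_t, p_uv, alpha, c in zip(temporal_weights, spatial_weights, opacities, colors):
        w = p_t * p_uv * alpha                  # effective alpha of this Gaussian at the pixel
        color += transmittance * w * np.asarray(c)
        transmittance *= (1.0 - w)
        if transmittance < 1e-4:                # early termination, as in tile-based kernels
            break
    return color

# Two splats: a nearly opaque red splat in front of a green one
print(composite_pixel([1.0, 0.9], [0.8, 0.7], [0.9, 0.8],
                      [[1, 0, 0], [0, 1, 0]]))
```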

3. Learning and Optimization Objectives

4DGS models are optimized end to end to fit observed multi-view videos. The primary objective is a photometric loss over all space-time pixels:

$$\mathcal{L}_{\mathrm{photometric}} = \sum_{(u, v, t)} \bigl\| \mathcal{I}_{\mathrm{rendered}}(u, v, t) - \mathcal{I}_{\mathrm{gt}}(u, v, t) \bigr\|_1 + \lambda_{\mathrm{dssim}}\, \mathcal{L}_{\mathrm{dssim}}$$

Regularization includes sparsity or mask losses for pruning insignificant Gaussians, temporal-smoothness and motion-sparsity priors for scene dynamics, and spatial/temporal regularizers to avoid degenerate solutions (Yang et al., 30 Dec 2024, Oh et al., 19 May 2025, Feng et al., 28 Mar 2025). For style transfer, additional feature-matching and style losses enforce consistency with reference style distributions in the embedded feature space (Liang et al., 14 Oct 2024).
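A minimal NumPy sketch of this objective, assuming a simplified, window-free SSIM and a per-pixel mean rather than a sum; practical implementations typically use an 11×11 Gaussian-windowed SSIM on GPU tensors.

```python
import numpy as np

def ssim_global(a, b, c1=0.01 ** 2, c2=0.03 ** 2):
    """Simplified, window-free SSIM over whole images in [0, 1] (illustrative only)."""
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / \
           ((mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))

def photometric_loss(rendered, gt, lambda_dssim=0.2):
    """L1 reconstruction term plus a DSSIM term, averaged over pixels."""
    l1 = np.abs(rendered - gt).mean()
    dssim = (1.0 - ssim_global(rendered, gt)) / 2.0
    return l1 + lambda_dssim * dssim

rendered = np.random.rand(64, 64, 3)
print(photometric_loss(rendered, rendered))   # identical images -> ~0 loss
```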

4. Compression, Efficiency, and Hybrid Designs

High-fidelity 4DGS models typically involve millions of Gaussians, challenging storage and runtime efficiency. Recent works develop several strategies:

  • Memory-efficient attribute design: Decompose color into a per-Gaussian "direct current" component and a shared MLP "alternating current" predictor, reducing parameters from >100 per Gaussian to ≈3 (Zhang et al., 17 Oct 2024).
  • Pruning and merging: Use deformation-aware and importance-based pruning schemes to remove non-contributing or redundant primitives (Liu et al., 23 Jun 2024, Lee et al., 4 Oct 2025).
  • Hybrid 3D–4D representation: Partition Gaussians into static (3D-only) and dynamic (4D) sets, converting temporally invariant splats to compact 3D form, reducing parameter count by 60–70% and lowering training time by 3–5× (Oh et al., 19 May 2025); a toy partition heuristic is sketched after this list.
  • Anchor-based, predictive, and quantized models: Generate Gaussians from a compact set of 3D anchors with a deformation MLP and quantize attributes using adaptive quantization/context-based entropy coding (Wang et al., 11 Oct 2025, Huang et al., 13 May 2025).
  • Wavelet and rate–distortion-optimized compression: Employ temporal smoothness priors (e.g., wavelet transforms on trajectories) and learned rate–distortion coders to improve compression by up to 91× with little quality loss (Lee et al., 23 Jul 2025, Huang et al., 13 May 2025).
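As referenced in the hybrid 3D–4D bullet above, here is a toy heuristic for deciding when a 4D Gaussian can be demoted to a static 3D splat; the criteria and thresholds are illustrative assumptions, not the partitioning rule of the cited method.

```python
import numpy as np

def is_static(Sigma4, mu_t, t_start, t_end, coupling_tol=1e-3):
    """Heuristic test: can this 4D Gaussian be stored as a plain 3D (static) splat?
    True if its temporal 1-sigma support spans the whole sequence and the
    space-time covariance coupling (i.e. apparent motion) is negligible.
    All thresholds are illustrative."""
    sigma_t = np.sqrt(Sigma4[3, 3])
    covers_sequence = (mu_t - sigma_t <= t_start) and (mu_t + sigma_t >= t_end)
    no_motion = np.max(np.abs(Sigma4[:3, 3])) < coupling_tol
    return covers_sequence and no_motion

# A temporally flat, uncoupled splat is demoted to 3D; a moving one stays 4D.
static = np.diag([0.01, 0.01, 0.01, 100.0])
moving = static.copy()
moving[0, 3] = moving[3, 0] = 0.5
print(is_static(static, mu_t=0.5, t_start=0.0, t_end=1.0))   # True
print(is_static(moving, mu_t=0.5, t_start=0.0, t_end=1.0))   # False
```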

A comparative summary for key methods:

| Method | Storage Footprint | PSNR | FPS | Key Innovations |
|---|---|---|---|---|
| MEGA (Zhang et al., 17 Oct 2024) | 32 MB (190× ↓) | 33.6 dB | 83 | DC+AC color, entropy pruning, FP16, zip compression |
| Hybrid 3D–4DGS (Oh et al., 19 May 2025) | 273 MB peak | 32.3 dB | – | Adaptive 3D/4D partitioning |
| Disentangled4DGS (Feng et al., 28 Mar 2025) | – | 32.8 dB | 343 | Decoupled spatial/temporal parameters, matrix-free pipeline |
| OMG4 (Lee et al., 4 Oct 2025) | 3.6 MB (99% ↓) | 31.8 dB | 246 | Gradient-based sampling/pruning/merging, 4D SVQ |

5. Extensions and Applications

4D Gaussian Splatting forms the explicit backbone for a range of tasks in neural scene rendering:

  • Dynamic scene reconstruction: Photorealistic dynamic novel-view synthesis, surpassing neural volumetric and mesh-based alternatives in efficiency and consistency (Yang et al., 2023, Duan et al., 5 Feb 2024).
  • 4D style transfer: Zero-shot spatio-temporal transfer by embedding Gaussians into a high-dimensional feature space and matching style/content via linear transformations and reversible feature networks (Liang et al., 14 Oct 2024).
  • Generative 4D content: Fast synthesis of dynamic geometry/texture by driving canonical Gaussian motion with compact factorized deformation fields and diffusion priors (Ren et al., 2023).
  • 4D semantic/linguistic grounding: Embedding language-aligned features in 4D Gaussians to enable open-vocabulary querying, spatio-temporal grounding, and localization (Fiebelman et al., 14 Oct 2024).
  • Dynamic SLAM: State-of-the-art camera tracking and mapping with explicit handling of moving objects, leveraging dynamic/static Gaussian designation and deformation (Li et al., 20 Mar 2025, Sun et al., 7 Apr 2025).
  • Compression for deployment: Practical downstream uses such as mobile AR/VR, remote rendering, and streaming of large-scale dynamic scenes given sub-megabyte model sizes (Wang et al., 11 Oct 2025, Huang et al., 13 May 2025).

6. Technical Challenges and Limitations

Though powerful, classic 4DGS approaches face several technical bottlenecks:

  • Matrix/factorization overhead: Full 4D covariance updates and slicing operations can be memory/compute intensive; recent disentangled parameterizations circumvent this (Feng et al., 28 Mar 2025).
  • Overfitting/over-pruning: High flexibility can lead to redundancy or loss of fine detail if mask losses or anchor strategies are not properly regularized (Yang et al., 30 Dec 2024).
  • Dynamic range limitations: High-frequency or abrupt temporal patterns can induce ringing or blur, motivating hybrid representations and higher-order motion priors (Feng et al., 28 Mar 2025, Deng et al., 30 Nov 2024).
  • Attribute scaling for resource constraints: For domains like surgical scenes, explicit pruning and condensation of feature fields and color bases are required to make deployment feasible (Liu et al., 23 Jun 2024).

7. Future Prospects

Recent work highlights several directions for further advances, including the integration of state-space modeling frameworks (e.g., Kalman filtering, Wasserstein geometry) for smoother motion priors and physically plausible deformations (Deng et al., 30 Nov 2024), adaptive or learned anchor selection for better scalability (Huang et al., 13 May 2025, Fiebelman et al., 14 Oct 2024), and neural language/semantic integration at the 4D primitive level (Fiebelman et al., 14 Oct 2024). 4DGS remains a central representation for both fundamental research and application domains requiring real-time, high-fidelity 4D scene synthesis and analysis.
