
4D Gaussian Splatting for Dynamic Scene Rendering

Updated 17 November 2025
  • 4D Gaussian Splatting is a framework that models dynamic scenes using explicit 4D Gaussian primitives capturing spatial geometry and temporal motion.
  • It employs a splatting-based rendering pipeline that projects 4D Gaussians into 2D for efficient, real-time photorealistic scene reconstruction.
  • End-to-end optimization and modular extensions enable practical applications in video compression, VR/AR, SLAM, and more.

4D Gaussian Splatting (4DGS) is an explicit, real-time volumetric scene representation framework that generalizes 3D Gaussian Splatting to unify spatio-temporal geometry and appearance for dynamic scene modeling, view synthesis, and downstream tasks. It embeds a scene as a collection of anisotropic 4D Gaussian primitives, each parameterized by an (x, y, z, t) mean vector, a full-rank 4×4 positive-definite covariance matrix that enables space–time rotation and stretching, and a view- and time-dependent appearance model, allowing direct, end-to-end optimization of photorealistic dynamic reconstructions from multi-view or monocular video. Its raster-based, splatting-centric rendering and rich explicit parameterization have made 4DGS a leading approach for dynamic radiance field modeling, motion-aware neural rendering, compact real-time video scene representation, and rate–distortion-optimized video compression.

1. Mathematical Formulation and 4D Gaussian Primitives

4D Gaussian Splatting represents dynamic scenes as a sum of explicit Gaussian lobes in $\mathbb{R}^4$, each capturing not just spatial but also temporal support. Given a primitive $i$, its (unnormalized) density is

$$G_i(\mathbf{p}) = \exp\left(-\frac{1}{2} (\mathbf{p} - \mu_i)^T \Sigma_i^{-1} (\mathbf{p} - \mu_i)\right)$$

where $\mathbf{p} = (x, y, z, t)^T \in \mathbb{R}^4$ is a spatio-temporal query, $\mu_i \in \mathbb{R}^4$ is the 4D mean, and $\Sigma_i \in \mathbb{R}^{4 \times 4}$ is a full-rank covariance matrix. To capture arbitrary spatiotemporal anisotropy and orientation, $\Sigma_i$ is parameterized as

$$\Sigma_i = R_i S_i S_i^T R_i^T$$

with $S_i = \mathrm{diag}(s_x, s_y, s_z, s_t)$ and $R_i$ an arbitrary 4D rotation, implemented via two unit quaternions (the product of a left- and a right-isoclinic rotation). This construction allows each Gaussian to form any oriented ellipsoid in space–time, with axes aligned to both spatial shape and motion.
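
As a concrete illustration, the NumPy sketch below (function names are ours, not from any published implementation) builds a general 4D rotation from a left/right pair of unit quaternions and assembles $\Sigma_i = R_i S_i S_i^T R_i^T$:

```python
import numpy as np

def left_isoclinic(q):
    """Matrix of quaternion left-multiplication x -> q * x, q = (a, b, c, d)."""
    a, b, c, d = q
    return np.array([[a, -b, -c, -d],
                     [b,  a, -d,  c],
                     [c,  d,  a, -b],
                     [d, -c,  b,  a]])

def right_isoclinic(q):
    """Matrix of quaternion right-multiplication x -> x * q, q = (p, u, v, w)."""
    p, u, v, w = q
    return np.array([[p, -u, -v, -w],
                     [u,  p,  w, -v],
                     [v, -w,  p,  u],
                     [w,  v, -u,  p]])

def covariance_4d(q_left, q_right, scales):
    """Sigma = R S S^T R^T, with R in SO(4) built from two unit quaternions
    and S = diag(s_x, s_y, s_z, s_t)."""
    q_left = q_left / np.linalg.norm(q_left)     # enforce unit norm
    q_right = q_right / np.linalg.norm(q_right)
    R = left_isoclinic(q_left) @ right_isoclinic(q_right)
    S = np.diag(scales)
    return R @ S @ S.T @ R.T
```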

Marginalizing and conditioning with respect to time yields closed-form formulas for efficient slicing:

$$\begin{align*}
p_i(t) &= \mathcal{N}(t;\, \mu_t, \Sigma_{tt}) \\
p_i(\mathbf{x} \mid t) &= \mathcal{N}(\mathbf{x};\, \mu_{xyz|t}, \Sigma_{xyz|t}) \\
\mu_{xyz|t} &= \mu_{1:3} + \Sigma_{1:3,4}\, \Sigma_{4,4}^{-1}\, (t - \mu_t) \\
\Sigma_{xyz|t} &= \Sigma_{1:3,1:3} - \Sigma_{1:3,4}\, \Sigma_{4,4}^{-1}\, \Sigma_{4,1:3}
\end{align*}$$

These equations admit efficient evaluation and slicing at arbitrary timestamps during rendering or gradient-based optimization.
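
These slicing formulas translate directly into code. A minimal NumPy sketch (illustrative naming; single Gaussian, no batching):

```python
import numpy as np

def slice_gaussian_at_time(mu, Sigma, t):
    """Condition a 4D Gaussian with mean mu = (x, y, z, t) and 4x4 covariance
    Sigma on a query time t, returning p_i(t), mu_{xyz|t}, Sigma_{xyz|t}."""
    mu_xyz, mu_t = mu[:3], mu[3]
    S_xx = Sigma[:3, :3]   # Sigma_{1:3,1:3}, spatial block
    S_xt = Sigma[:3, 3]    # Sigma_{1:3,4}, space-time cross terms
    S_tt = Sigma[3, 3]     # Sigma_{4,4}, temporal variance (scalar)

    # Temporal marginal: p_i(t) = N(t; mu_t, Sigma_tt)
    p_t = np.exp(-0.5 * (t - mu_t) ** 2 / S_tt) / np.sqrt(2 * np.pi * S_tt)

    # Conditional: the mean shifts linearly in (t - mu_t); the covariance
    # shrinks by the Schur complement of the temporal block.
    mu_cond = mu_xyz + S_xt / S_tt * (t - mu_t)
    Sigma_cond = S_xx - np.outer(S_xt, S_xt) / S_tt
    return p_t, mu_cond, Sigma_cond
```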

2. Appearance Modeling via 4D Spherindrical Harmonics

For photorealistic rendering and view/time-dependent effects, each primitive is endowed with a small bank of coefficients for a separable Fourier–spherical-harmonic basis (“4D spherindrical harmonics”):

$$Z_{n\ell}^m(\Delta t, \theta, \phi) = \cos\left(2\pi n \frac{\Delta t}{T}\right) Y_\ell^m(\theta, \phi)$$

where $n$ indexes the temporal Fourier frequency, $Y_\ell^m$ are the angular spherical harmonics, and $(\theta, \phi)$ parameterize the observation direction. Each Gaussian's color is

$$c_i(\mathbf{d}, t) = \sum_{n,\ell,m} c_i^{n\ell m} \, Z_{n\ell}^m(\Delta t, \theta, \phi)$$

capturing radiance that evolves across both viewpoint and time. View- and time-dependent appearance is crucial for faithfully modeling specularity, nontrivial illumination, and nonstationary surface effects.
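
As a sketch of how such a color can be evaluated, the snippet below truncates the basis to spherical harmonics of degree $\ell \le 1$ and two temporal frequencies; the truncation, function names, and coefficient layout are illustrative assumptions rather than fixed choices of the method:

```python
import numpy as np

def real_sh_l01(d):
    """Real spherical harmonics up to degree 1 for a unit direction d."""
    x, y, z = d
    return np.array([
        0.2820947918,       # Y_0^0
        0.4886025119 * y,   # Y_1^{-1}
        0.4886025119 * z,   # Y_1^{0}
        0.4886025119 * x,   # Y_1^{1}
    ])

def color_4dsh(coeffs, d, dt, T, n_freqs=2):
    """c_i(d, t) = sum_{n,l,m} c^{nlm} * cos(2*pi*n*dt/T) * Y_l^m(d).

    coeffs: (n_freqs, 4, 3) array -- one RGB coefficient per temporal
    frequency n and per angular basis function (here l <= 1, so 4 of them).
    """
    Y = real_sh_l01(d)                                         # (4,)
    fourier = np.cos(2 * np.pi * np.arange(n_freqs) * dt / T)  # (n_freqs,)
    # Separable basis: temporal and angular factors contracted
    # against the learned coefficients.
    return np.einsum("n,l,nlc->c", fourier, Y, coeffs)
```

Note that the $n = 0$ term reduces to the static spherical-harmonic appearance used in 3D Gaussian Splatting.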

3. Efficient Splatting-Based Rendering Pipeline

Rendering in 4D Gaussian Splatting follows a real-time splatting paradigm. For a camera $\mathcal{C}$ (extrinsics $E$, intrinsics $K$), pixel $(u, v)$, and query time $t$, the contribution of each Gaussian is computed in three steps:

  1. Temporal Marginalization: Compute the 1D weight $p_i(t)$ aligning the primitive to time $t$.
  2. Spatial Conditional Slicing and Projection: Compute $\mu_{xyz|t}$ and $\Sigma_{xyz|t}$, then project into camera space, obtaining a 2D Gaussian $p_i(u, v \mid t)$.
  3. Splatting and Compositing: Each primitive is rasterized as a 2D elliptical kernel with per-pixel weight $w_i = p_i(t)\, p_i(u, v \mid t)\, \alpha_i$, textured with $c_i$. Depth sorting then enables physically correct front-to-back alpha compositing:

$$I(u, v, t) = \sum_i w_i\, c_i(\mathbf{d}, t), \qquad w_i = p_i(t)\, p_i(u, v \mid t)\, \alpha_i \prod_{j < i} \left(1 - p_j(t)\, p_j(u, v \mid t)\, \alpha_j\right)$$

This explicit 2D splatting, which accumulates only the splats active at each time, yields real-time performance (e.g., 114 FPS at HD resolution) and maps well to GPUs due to its highly parallel structure.
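
For intuition, here is a pure-Python sketch of the per-pixel compositing rule above; a practical renderer rasterizes tiles in parallel on the GPU, and the early-termination threshold is an assumed optimization, not a prescribed constant:

```python
import numpy as np

def composite_pixel(weights, alphas, colors):
    """Front-to-back alpha compositing of depth-sorted splats at one pixel.

    weights: p_i(t) * p_i(u, v | t) per splat, sorted near-to-far.
    alphas:  per-splat opacities alpha_i.
    colors:  per-splat RGB values c_i(d, t), shape (N, 3).
    """
    pixel = np.zeros(3)
    transmittance = 1.0
    for w, a, c in zip(weights, alphas, colors):
        contrib = w * a                       # p_i(t) p_i(u,v|t) alpha_i
        pixel += transmittance * contrib * c  # weighted by prod_{j<i}(1 - ...)
        transmittance *= 1.0 - contrib
        if transmittance < 1e-4:              # early termination (assumed)
            break
    return pixel
```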

4. End-to-End Learning and Optimization Strategies

All 4D Gaussian parameters $\{\mu_i, \Sigma_i, \alpha_i, c_i^{n\ell m}\}$ are optimized jointly via gradient descent, typically minimizing a photometric $L_2$ loss between the rendered and ground-truth images:

$$\mathcal{L} = \left\| I(u, v, t) - I_{\rm gt}(u, v, t) \right\|_2^2$$

Perceptual losses (e.g., LPIPS) and structural-similarity (SSIM) terms may be added for increased fidelity. To ensure temporal coherence, training batches often sample rays across multiple time instants rather than slicing frame by frame.
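
A minimal PyTorch-style sketch of such an objective follows; the SSIM term here is a crude global approximation rather than the standard windowed SSIM, and the weighting is an assumed value:

```python
import torch

def photometric_loss(rendered, target, ssim_weight=0.2):
    """L2 photometric loss plus an optional (1 - SSIM) term.

    rendered, target: (3, H, W) tensors in [0, 1].
    """
    l2 = torch.mean((rendered - target) ** 2)

    # Global (non-windowed) SSIM approximation, for brevity only.
    mu_r, mu_t = rendered.mean(), target.mean()
    var_r, var_t = rendered.var(), target.var()
    cov = ((rendered - mu_r) * (target - mu_t)).mean()
    c1, c2 = 0.01 ** 2, 0.03 ** 2
    ssim = ((2 * mu_r * mu_t + c1) * (2 * cov + c2)) / (
        (mu_r ** 2 + mu_t ** 2 + c1) * (var_r + var_t + c2)
    )
    return l2 + ssim_weight * (1.0 - ssim)
```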

Model complexity is adaptively controlled: spatial and temporal gradients of means are monitored, and underfit regions are densified by spawning new Gaussians, while redundancy is reduced via pruning. No additional deformation networks or explicit motion fields are required; the rotation and scale of each 4D ellipsoid suffice to model scene flow.
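
A schematic of one adaptive-control step is sketched below; the thresholds are borrowed from common 3D Gaussian Splatting practice and are illustrative, not values reported in this article:

```python
import numpy as np

def densify_and_prune(means, mean_grads, opacities,
                      grad_thresh=2e-4, min_alpha=0.005):
    """Clone Gaussians whose space-time mean gradients are large (under-fit
    regions) and drop near-transparent ones (redundancy). Schematic only:
    a full implementation would also update covariances and colors."""
    grad_norm = np.linalg.norm(mean_grads, axis=1)
    keep = opacities > min_alpha          # prune low-opacity splats
    grow = (grad_norm > grad_thresh) & keep

    new_means = np.concatenate([means[keep], means[grow]])
    new_opacities = np.concatenate([opacities[keep], opacities[grow]])
    return new_means, new_opacities
```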

Initialization is commonly performed with $\sim$100K points from a static point cloud (e.g., from COLMAP), spread uniformly in time, with identity rotations and a temporally broad $s_t$ to ensure wide initial coverage.
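
A sketch of this initialization under the stated conventions (the spatial scale and the exact temporal spread are assumed values):

```python
import numpy as np

def init_4d_gaussians(sfm_points, t_max, n_init=100_000, seed=0):
    """Spread static SfM points uniformly over [0, t_max] with identity
    rotations and a broad temporal scale s_t."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(sfm_points), size=min(n_init, len(sfm_points)),
                     replace=False)
    times = rng.uniform(0.0, t_max, (len(idx), 1))
    means = np.hstack([sfm_points[idx], times])                   # (N, 4)
    # Two identity quaternions per Gaussian -> identity 4D rotation.
    quats = np.tile([1.0, 0, 0, 0, 1.0, 0, 0, 0], (len(idx), 1))  # (N, 8)
    # Small spatial scales (assumed) and s_t = t_max for broad coverage.
    scales = np.tile([0.01, 0.01, 0.01, t_max], (len(idx), 1))    # (N, 4)
    return means, quats, scales
```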

5. Performance, Quality, and Trade-offs

The 4DGS framework achieves notable performance/quality trade-offs:

| Dataset | Resolution | PSNR (dB) | LPIPS | DSSIM | FPS |
|---|---|---|---|---|---|
| Plenoptic Video | 800×800 | 32.01 | 0.055 | 0.014 | 114 |
| D-NeRF (monocular) | | 34.09 | | | |

Temporal stability is maintained because, for static regions, $s_t$ grows large during optimization, so the number of active splats per frame remains nearly constant as the total duration $T$ increases. This property ensures scalability to longer videos without linear growth in the number of Gaussians rendered per frame.

Compared to contemporaneous neural field methods (e.g., HexPlane), 4DGS shows improved visual fidelity and rendering that is roughly two orders of magnitude faster (Yang et al., 2023), validating the efficacy of the explicit, splatting-centric approach.

6. Model Variants and Extensions

Variants have been derived that address memory and computational efficiency while maintaining quality.

Such modularity enables 4DGS to be tailored for diverse real-world scenarios, including large-scale video, free-viewpoint VR/AR, and continuous medical tomography (Yu et al., 27 Mar 2025).

7. Impact, Limitations, and Prospects

Treating spacetime as a unified anisotropic domain and parameterizing each primitive with a full 4D Gaussian supplemented by harmonic coefficients allow 4DGS to achieve end-to-end photorealistic scene modeling and real-time dynamic rendering. This explicit spatio-temporal parameterization avoids the need for motion priors, complex deformation fields, or regularization of implicit representations.

Limitations include:

  • Large storage and memory demands in uncompressed form, motivating ongoing compression efforts.
  • The smoothness and support size of Gaussians can limit the representation of high-frequency details and very abrupt occlusions.
  • Absence of explicit long-range correlation mechanisms may hinder modeling in ultra-sparse regimes or highly articulated motion.

Ongoing research addresses these with more expressive basis functions, hierarchical or acceleration-encoded kernels, adaptive spatiotemporal splits, and integration with semantic or language-driven guidance.

4D Gaussian Splatting defines an explicit, optimizable, and splatting-based approach to dynamic scene representation that provides a foundation for real-time photorealistic rendering, highly efficient compression, and downstream applications in graphics, vision, medical imaging, and robotics. Its core mathematical structure and splatting-first rendering paradigm have catalyzed broad interdisciplinary interest and active methodological evolution.
