4D Gaussian Representation
- 4D Gaussian representation is a method that models dynamic scenes as anisotropic Gaussian primitives in continuous space-time, capturing motion, deformation, and evolving appearance.
- It utilizes a 4D mean and covariance to condition spatial Gaussian parameters at query times, enabling high-fidelity rendering and temporally coherent scene synthesis.
- The approach enhances compression and scalability, supporting applications from VR/AR to medical imaging and video restoration through efficient, dynamic modeling.
A 4D Gaussian representation explicitly models dynamic scenes by parameterizing scene elements as anisotropic Gaussian primitives in a continuous 4D spatio-temporal domain, with coordinates (x, y, z, t). This approach extends 3D Gaussian Splatting—widely used in static scene rendering—by adding time as an explicit dimension, enabling high-fidelity dynamic scene synthesis, efficient storage, and real-time rendering. The formulation incorporates mechanisms for motion, deformation, and temporally-evolving appearance, allowing both reconstruction and generation tasks to benefit from temporally coherent, explicit, and scalable representations.
1. Fundamentals of 4D Gaussian Representation
A 4D Gaussian primitive is defined by its 4D mean vector $\mu \in \mathbb{R}^4$ and a covariance matrix $\Sigma \in \mathbb{R}^{4\times 4}$, representing both spatial and temporal extents. The covariance is factorized as

$$\Sigma = R\,S\,S^{\top}R^{\top},$$

where $S = \operatorname{diag}(s_x, s_y, s_z, s_t)$ encodes anisotropic scaling along each axis, and $R \in SO(4)$ is a 4D rotation matrix (parameterized, for example, by pairs of quaternions or 4D rotors), allowing for arbitrary orientation in space-time (Yang et al., 2023, Duan et al., 5 Feb 2024).
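As a concrete illustration, here is a minimal NumPy sketch of this factorization, assuming the pair-of-quaternions (left/right isoclinic) parameterization of $SO(4)$; the function names are illustrative rather than drawn from any released codebase:

```python
import numpy as np

def quat_to_left_iso(q):
    """4x4 matrix of left quaternion multiplication v -> q * v."""
    a, b, c, d = q / np.linalg.norm(q)
    return np.array([[a, -b, -c, -d],
                     [b,  a, -d,  c],
                     [c,  d,  a, -b],
                     [d, -c,  b,  a]])

def quat_to_right_iso(q):
    """4x4 matrix of right quaternion multiplication v -> v * q."""
    a, b, c, d = q / np.linalg.norm(q)
    return np.array([[a, -b, -c, -d],
                     [b,  a,  d, -c],
                     [c, -d,  a,  b],
                     [d,  c, -b,  a]])

def covariance_4d(scales, q_left, q_right):
    """Sigma = R S S^T R^T, with R in SO(4) built from two unit quaternions."""
    R = quat_to_left_iso(q_left) @ quat_to_right_iso(q_right)
    S = np.diag(scales)  # anisotropic scales (s_x, s_y, s_z, s_t)
    return R @ S @ S.T @ R.T
```

Any 4D rotation can be written as $v \mapsto q_l\, v\, q_r$ for a pair of unit quaternions, which is why this parameterization covers arbitrary space-time orientations.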
Sampling the 4D Gaussian at a query time $t$ yields a conditional spatial (3D) Gaussian whose mean and covariance are computed using the standard Gaussian conditioning formulas. Partitioning the mean as $\mu = (\mu_{xyz}, \mu_t)$ and the covariance as $\Sigma = \begin{pmatrix}\Sigma_{xyz} & \Sigma_{xyz,t} \\ \Sigma_{t,xyz} & \Sigma_{tt}\end{pmatrix}$,

$$\mu_{xyz \mid t} = \mu_{xyz} + \Sigma_{xyz,t}\,\Sigma_{tt}^{-1}(t - \mu_t), \qquad \Sigma_{xyz \mid t} = \Sigma_{xyz} - \Sigma_{xyz,t}\,\Sigma_{tt}^{-1}\,\Sigma_{t,xyz}.$$

This encodes not just the instantaneous configuration but the motion trajectory and deformation across time, which is critical for high-fidelity dynamic scene representation.
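A minimal NumPy sketch of this conditioning step follows; the marginal temporal density returned here is what native 4D methods typically use to modulate each Gaussian's opacity at $t$ (the function name is illustrative):

```python
import numpy as np

def condition_on_time(mu4, cov4, t):
    """Slice a 4D Gaussian at time t into a conditional 3D Gaussian.

    mu4: (4,) mean (x, y, z, t); cov4: (4, 4) covariance.
    Returns (conditional mean, conditional covariance, marginal density at t).
    """
    mu_xyz, mu_t = mu4[:3], mu4[3]
    S_xx = cov4[:3, :3]   # spatial block
    S_xt = cov4[:3, 3]    # spatio-temporal cross-covariance
    S_tt = cov4[3, 3]     # temporal variance (scalar)

    mu_cond = mu_xyz + S_xt / S_tt * (t - mu_t)
    cov_cond = S_xx - np.outer(S_xt, S_xt) / S_tt
    # Marginal density over t, often used to fade opacity away from mu_t.
    p_t = np.exp(-0.5 * (t - mu_t) ** 2 / S_tt) / np.sqrt(2 * np.pi * S_tt)
    return mu_cond, cov_cond, p_t
```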
2. Deformation Mechanisms and Temporal Modeling
Early approaches to dynamic scenes built on a canonical 3D Gaussian set with an additional deformation network that predicts a deformation field (translations, rotations, scalings) for each Gaussian at any query time. This field is parameterized as a neural network $\mathcal{D}$ with the Gaussian center $\mu$ and query time $t$ as inputs, producing per-Gaussian offsets $(\Delta\mu, \Delta r, \Delta s)$:

$$(\Delta\mu, \Delta r, \Delta s) = \mathcal{D}(f_d), \qquad f_d = \mathcal{H}(\mu, t),$$

where $\mathcal{H}$ is a spatio-temporal encoder and $f_d$ is the feature vector aggregated from the HexPlane neural voxel decomposition (Wu et al., 2023).
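The decoder half of such a pipeline can be sketched in a few lines of PyTorch; the class name, layer sizes, and head layout below are illustrative assumptions, not the architecture from the paper:

```python
import torch
import torch.nn as nn

class DeformationHead(nn.Module):
    """Illustrative deformation decoder: maps a per-Gaussian spatio-temporal
    feature f_d (e.g., aggregated from a HexPlane encoder) to per-Gaussian
    offsets for position, rotation, and scale."""
    def __init__(self, feat_dim=64, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.d_pos = nn.Linear(hidden, 3)    # translation offset
        self.d_rot = nn.Linear(hidden, 4)    # quaternion offset
        self.d_scale = nn.Linear(hidden, 3)  # log-scale offset

    def forward(self, f_d):
        h = self.trunk(f_d)
        return self.d_pos(h), self.d_rot(h), self.d_scale(h)
```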
Recent "native 4D" methods directly treat Gaussians as static primitives in 4D, with their temporal motion encoded within the covariance and the 4D mean, removing the need for separate per-Gaussian deformation fields (Yang et al., 30 Dec 2024, Duan et al., 5 Feb 2024, Duan et al., 5 Feb 2024).
Temporal smoothness and coherence have been further enhanced by Wasserstein-constrained or state-space regularization, ensuring physically plausible, smooth Gaussian trajectories. A representative constraint penalizes the 2-Wasserstein distance between a Gaussian's states at consecutive time steps:

$$W_2^2\big(\mathcal{N}(\mu_t, \Sigma_t),\, \mathcal{N}(\mu_{t+1}, \Sigma_{t+1})\big) = \lVert \mu_t - \mu_{t+1} \rVert_2^2 + \operatorname{Tr}\!\Big(\Sigma_t + \Sigma_{t+1} - 2\big(\Sigma_{t+1}^{1/2}\,\Sigma_t\,\Sigma_{t+1}^{1/2}\big)^{1/2}\Big).$$

This type of regularization penalizes abrupt, nonphysical changes in both mean and covariance, leading to temporally stable dynamic representations (Deng et al., 30 Nov 2024).
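The closed form above is straightforward to evaluate; here is a minimal sketch using SciPy's matrix square root (an illustration of the penalty, not the paper's implementation):

```python
import numpy as np
from scipy.linalg import sqrtm

def gaussian_w2_sq(mu1, cov1, mu2, cov2):
    """Squared 2-Wasserstein distance between two Gaussian states."""
    s2_half = sqrtm(cov2)                    # Sigma_2^{1/2}
    cross = sqrtm(s2_half @ cov1 @ s2_half)  # (Sigma_2^{1/2} Sigma_1 Sigma_2^{1/2})^{1/2}
    return (np.sum((mu1 - mu2) ** 2)
            + np.trace(cov1 + cov2 - 2.0 * np.real(cross)))
```

Summed over all Gaussians and consecutive frames, this term can be added to the photometric loss as a smoothness regularizer.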
3. Appearance Modeling: 4D Spherindrical Harmonics
Dynamic scenes exhibit not just geometry and motion but also time-varying, view-dependent appearance. The color function attached to each Gaussian is expressed as $c(d, t)$, with view direction $d = (\theta, \phi)$, and expanded in a 4D spherindrical harmonic basis:

$$c(d, t) = \sum_{n,l,m} c_{nlm}\, Z_{nl}^{m}(t, \theta, \phi), \qquad Z_{nl}^{m}(t, \theta, \phi) = \cos\!\Big(\tfrac{2\pi n t}{T}\Big)\, Y_{l}^{m}(\theta, \phi),$$

where $Y_l^m$ are 3D spherical harmonics parameterizing the view direction and the cosine term encodes temporal appearance evolution over a period $T$. This orthonormal basis allows for efficient and compact representation of complex time- and view-varying appearance (Yang et al., 2023, Yang et al., 30 Dec 2024).
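A toy evaluation of this basis, restricted to real spherical harmonics of degree ≤ 1 for brevity (the coefficient layout is an assumption made for illustration):

```python
import numpy as np

def real_sh_deg1(d):
    """Real spherical harmonics up to degree 1 for a unit direction d."""
    x, y, z = d
    return np.array([0.28209479177387814,       # Y_0^0
                     -0.4886025119029199 * y,   # Y_1^{-1}
                      0.4886025119029199 * z,   # Y_1^{0}
                     -0.4886025119029199 * x])  # Y_1^{1}

def spherindrical_color(coeffs, d, t, T=1.0):
    """c(d, t) = sum_{n,k} c_{nk} cos(2 pi n t / T) Y_k(d).

    coeffs: (N, 4, 3) array: N temporal frequencies, 4 SH bases, RGB.
    """
    Y = real_sh_deg1(d)                       # (4,) angular basis
    n = np.arange(coeffs.shape[0])            # temporal frequencies 0..N-1
    temporal = np.cos(2 * np.pi * n * t / T)  # (N,) cosine envelope
    return np.einsum('n,k,nkc->c', temporal, Y, coeffs)
```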
Where memory efficiency is critical, this expansion is sometimes replaced with lightweight color predictors that decompose color into a direct (DC) component and a temporal/view-aware AC component predicted via a small MLP, reducing storage requirements by an order of magnitude (Zhang et al., 17 Oct 2024).
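A sketch of what such a lightweight predictor might look like, under the assumption of a per-Gaussian stored DC color plus a small shared MLP over view direction and time (names and sizes are hypothetical):

```python
import torch
import torch.nn as nn

class CompactColor(nn.Module):
    """Illustrative DC + AC color head: a stored per-Gaussian DC color plus
    a tiny shared MLP predicting a view/time-dependent AC residual."""
    def __init__(self, n_gaussians, hidden=32):
        super().__init__()
        self.dc = nn.Parameter(torch.zeros(n_gaussians, 3))  # stored DC term
        self.ac = nn.Sequential(                             # shared AC predictor
            nn.Linear(4, hidden), nn.ReLU(), nn.Linear(hidden, 3))

    def forward(self, idx, view_dir, t):
        # idx: (B,) Gaussian indices; view_dir: (B, 3) unit directions; t: (B, 1)
        ac_in = torch.cat([view_dir, t], dim=-1)
        return self.dc[idx] + self.ac(ac_in)
```

The storage win comes from replacing per-Gaussian harmonic coefficient tables with one shared network plus three floats per Gaussian.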
4. Rendering Pipeline and Real-Time Synthesis
The rendering process evaluates the conditional 3D Gaussian at the query time, projects it into image space, and splats its contribution using a tile-based alpha-blending routine, compositing depth-sorted Gaussians front to back:

$$C = \sum_{i} c_i\, \alpha_i \prod_{j<i} (1 - \alpha_j),$$

where $c_i$ is the color of the $i$-th Gaussian and $\alpha_i$ its opacity modulated by the projected 2D Gaussian footprint (and, in native 4D variants, by the marginal temporal density at $t$). A perspective projection and appropriate transformation of the mean and covariance are used to obtain the 2D splat. Computation is heavily optimized, frequently leveraging custom CUDA backends to achieve frame rates from roughly 80 to over 500 FPS (depending on resolution, GPU, and compactness of the scene) (Wu et al., 2023, Duan et al., 5 Feb 2024).
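Per pixel, the blending reduces to a short loop; this NumPy sketch mirrors the front-to-back compositing formula above, including the early-termination heuristic common in tile-based splatters:

```python
import numpy as np

def composite_pixel(colors, alphas):
    """Front-to-back compositing: C = sum_i c_i a_i prod_{j<i} (1 - a_j).

    colors: (N, 3) depth-sorted splat colors at a pixel;
    alphas: (N,) opacities already modulated by the 2D Gaussian footprint
    (and, for native 4D variants, by the temporal marginal at t).
    """
    transmittance = 1.0
    out = np.zeros(3)
    for c, a in zip(colors, alphas):
        out += transmittance * a * c
        transmittance *= 1.0 - a
        if transmittance < 1e-4:  # early termination once nearly opaque
            break
    return out
```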
5. Compression, Scalability, and Hybrid Variants
The high-dimensionality of 4D Gaussian representations introduces storage and redundancy challenges. Solutions include:
- Hierarchical Pruning and Deformation-Aware Reduction: Redundant or static Gaussians are detected via deformation-aware scores or temporal scale analysis and are pruned or converted to efficient 3D representations, resulting in hybrid 3D–4D frameworks (Liu et al., 23 Jun 2024, Oh et al., 19 May 2025).
- Lossless and Lossy Compression: Individual Gaussian parameters are quantized (vector quantization, low-bit representations), and redundancy across time is exploited via wavelet transforms, learning-based entropy models, or motion-aware temporal grids; a toy sketch of the vector-quantization idea appears after this list. These enable end-to-end rate-distortion (RD) optimization and up to 91× reduction in model size while maintaining high perceptual quality (Lee et al., 23 Jul 2025, Hu et al., 24 Mar 2025, Zhong et al., 22 Sep 2025).
- Motion Decoupling and Compensation: In highly compressed or streaming settings, a layered decomposition segregates static backgrounds from dynamic foregrounds with explicit, lookahead-based assignment. Motion fields and compensation Gaussians further remedy emergent content and local inaccuracies (Zhong et al., 22 Sep 2025).
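As referenced in the compression bullet above, the following is a toy k-means vector quantizer over per-Gaussian parameter vectors. It illustrates only the core idea of replacing raw floats with codebook indices, not any paper's actual codec:

```python
import numpy as np

def vector_quantize(params, codebook_size=256, iters=10, seed=0):
    """Toy k-means vector quantization of per-Gaussian parameter rows.

    params: (N, D) array, e.g. concatenated scale/rotation/appearance features.
    Returns a (K, D) codebook and per-Gaussian indices; storing indices plus
    the codebook in place of raw floats is the source of the rate savings.
    """
    rng = np.random.default_rng(seed)
    codebook = params[rng.choice(len(params), codebook_size, replace=False)].copy()
    for _ in range(iters):
        # (N, K) squared distances; fine for toy-sized N and K
        d = ((params[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        idx = d.argmin(1)
        for k in range(codebook_size):
            members = params[idx == k]
            if len(members):
                codebook[k] = members.mean(0)  # recenter each code
    return codebook, idx
```

Real systems add low-bit index coding and entropy models on top of this step; the RD-optimized pipelines cited above learn the quantization jointly with reconstruction quality.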
Dedicated architectures have also appeared for streaming and immersive video, leveraging sparse, motion-aware feature streams and learned compression networks tailored for both temporal and spatial redundancy (Li et al., 12 May 2025).
6. Practical Applications
The 4D Gaussian representation underpins a wide array of applications:
- Photorealistic dynamic scene rendering and novel view synthesis: Fast, temporally consistent rendering of scenes from arbitrary viewpoints in real time, essential for VR/AR, interactive games, and film.
- Compression for interactive streaming: Strong RD performance, supporting variable-bitrate, real-time free-viewpoint video with competitive fidelity at orders-of-magnitude lower bitrate compared to earlier methods (Hu et al., 24 Mar 2025, Li et al., 12 May 2025, Zhong et al., 22 Sep 2025).
- Motion analysis and segmentation: Explicit representation allows dynamic object tracking, editing, and temporally coherent segmentation; recent work pioneers per-object, promptable segmentation directly in the 4D Gaussian domain (Ji et al., 5 Jul 2024).
- Medical and technical imaging: Compact, deformation-coupled modeling enables 4D reconstruction from sparse projections for 4D Cone-Beam CT in radiotherapy, achieving fine dynamic detail with high noise/artifact suppression (Fu et al., 7 Jan 2025).
- Video restoration: Motion blur-aware 4D Gaussian models enable high-quality deblurring, frame interpolation, and video stabilization from blurry monocular video (Wu et al., 9 Dec 2024).
7. Implications, Limitations, and Research Directions
The explicit 4D Gaussian paradigm unifies geometry, motion, and appearance in a continuous, differentiable space–time volume, providing transparency, editability, and high performance. Current directions include:
- Further compression and hardware adaptation for edge deployment (Lee et al., 23 Jul 2025, Zhang et al., 17 Oct 2024)
- Deeper integration with semantic and language features for open-vocabulary querying and spatio-temporal grounding (Fiebelman et al., 14 Oct 2024)
- Advancements in hybrid mesh–Gaussian and animation pipelines, explicitly merging strengths of both explicit Gaussian splatting and mesh-based deformation/skinning (Li et al., 9 Oct 2024)
- Scaling to more complex, multi-object, or variable-length scenes via dynamic anchoring and grouping strategies (Yang et al., 30 Dec 2024, Zhong et al., 22 Sep 2025)
- Integration of optimal-transport and state-space filtering for physically plausible motion trajectories and reduced flicker/artifacts (Deng et al., 30 Nov 2024)
Ongoing challenges include managing memory footprint for very large or long-duration scenes, further improving dynamic region segmentation and control, and exploiting uncalibrated or incomplete data for robust, large-scale reconstructions (Luo et al., 1 Oct 2025). As the field advances, 4D Gaussian representation continues to serve as a foundational tool for real-time, scalable, and richly editable dynamic scene synthesis and understanding.