Spatiotemporal Gaussian Splatting

Updated 4 December 2025
  • Spatiotemporal Gaussian Splatting is an explicit volumetric representation that models dynamic 3D/4D scenes using parameterized anisotropic Gaussians in spacetime.
  • It employs deformation fields, keyframe interpolation, and rigid-pose propagation to capture both rigid and non-rigid motion in applications such as surgical imaging and robotics.
  • These methods achieve orders-of-magnitude speedups and compression rates of up to 67×, enabling real-time rendering and sensor calibration in diverse, dynamic environments.

Spatiotemporal Gaussian Splatting is an explicit volumetric representation and rendering paradigm that models 3D and 4D dynamic (time-varying) scenes as a set of parameterized anisotropic Gaussians splatted in spacetime. This family of methods generalizes static 3D Gaussian Splatting by incorporating time as an explicit variable in the scene model, enabling real-time, high-fidelity synthesis, tracking, and calibration in complex dynamic environments, often with orders-of-magnitude speedups over NeRF-style implicit fields in challenging settings such as surgical reconstruction, robotics, and 4D medical imaging.

1. Mathematical Foundations and Representational Structure

The core representational primitive is the spatiotemporal Gaussian

$$\mathcal{G}_i(\mathbf{x}, t) = w_i \exp\left(-\frac{1}{2} \begin{bmatrix}\mathbf{x}-\boldsymbol{\mu}_i \\ t-\tau_i\end{bmatrix}^{\top} \Sigma_i^{-1} \begin{bmatrix}\mathbf{x}-\boldsymbol{\mu}_i \\ t-\tau_i\end{bmatrix}\right),$$

where $\mathbf{x} \in \mathbb{R}^3$ is the spatial coordinate, $t \in \mathbb{R}$ is time, $\boldsymbol{\mu}_i \in \mathbb{R}^3$ is the spatial mean, $\tau_i \in \mathbb{R}$ is the temporal center, and $\Sigma_i \in \mathbb{R}^{4 \times 4}$ encodes both spatial and temporal anisotropy; $w_i$ is a radiance or density strength. Further attributes, such as spherical-harmonic color coefficients, opacity, and view-dependent appearance, can be appended for photo-realistic rendering.
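
As a concrete reading of this definition, here is a minimal NumPy sketch of evaluating one primitive at a space–time query. The function and parameter names are hypothetical, and real renderers evaluate the projected 2D footprint rather than the raw 4D density:

```python
import numpy as np

def eval_spacetime_gaussian(x, t, mu, tau, Sigma, w):
    """Evaluate one spatiotemporal Gaussian G_i at spatial point x and time t.

    x:     (3,) spatial query point
    t:     scalar query time
    mu:    (3,) spatial mean
    tau:   scalar temporal center
    Sigma: (4, 4) joint space-time covariance (symmetric positive definite)
    w:     scalar radiance/density strength
    """
    d = np.concatenate([x - mu, [t - tau]])           # 4D residual [x - mu; t - tau]
    return w * np.exp(-0.5 * d @ np.linalg.solve(Sigma, d))
```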

Some designs decouple static and dynamic elements in the scene. For static backgrounds, purely spatial (3D) Gaussians with time-invariant parameters suffice, while moving objects are modeled with 4D Gaussians whose parameters are time-dependent or explicitly parameterized by deformation networks, SE(3) rigid transformations, or keyframe-interpolated trajectories (Oh et al., 19 May 2025, Lee et al., 21 Oct 2024).

Rendering is realized by “slicing” the 4D (space–time) Gaussians at the query time, projecting to 3D, then splatting the spatial ellipsoids onto the image plane via alpha compositing. This differentiable splatting process enables both photometric loss optimization and gradient-based downstream tasks (Li et al., 28 Nov 2025, Liu et al., 23 Jun 2024).
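
One common way to realize the time slice is Gaussian conditioning: fixing $t$ in the joint space–time Gaussian yields a 3D Gaussian whose mean shifts along the space–time correlation, while the temporal marginal attenuates its weight. A minimal NumPy sketch under that formulation, with hypothetical names (individual papers differ in how they parameterize $\Sigma_i$):

```python
import numpy as np

def slice_at_time(mu, tau, Sigma, w, t):
    """Condition a 4D space-time Gaussian on time t, yielding the 3D Gaussian
    to be projected and splatted, plus a time-dependent weight.

    Sigma is blocked as [[S_xx (3x3), S_xt (3,)],
                         [S_xt^T,     s_tt     ]].
    """
    S_xx = Sigma[:3, :3]
    S_xt = Sigma[:3, 3]
    s_tt = Sigma[3, 3]

    mu_t    = mu + S_xt * (t - tau) / s_tt               # conditional spatial mean
    Sigma_t = S_xx - np.outer(S_xt, S_xt) / s_tt         # conditional spatial covariance
    w_t     = w * np.exp(-0.5 * (t - tau) ** 2 / s_tt)   # temporal marginal attenuates opacity
    return mu_t, Sigma_t, w_t
```

Because the conditional mean moves linearly in $(t - \tau_i)$, a single anisotropic 4D Gaussian can already capture short linear motion before any deformation field is invoked.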

2. Deformation Fields and Temporal Parameterization

Dynamic scenes require encoding non-stationary geometry. Several parameterization strategies exist:

  • Learned 4D deformation fields: Using factorized (HexPlane or K-Planes) spatial–temporal feature fields, a small MLP decodes time-varying offsets to the base Gaussian parameters. For instance, EndoGS uses a HexPlane decomposition, six feature planes spanning pairs of the (x, y, z, t) axes, to interpolate feature vectors, which are mapped to parameter deltas $(\Delta\mu, \Delta s, \Delta r, \Delta\sigma, \Delta \mathrm{sh})$ for each Gaussian (Zhu et al., 21 Jan 2024). This architecture allows expressive non-rigid deformations as needed for surgical tissues or other deformable objects.
  • Keyframe/time-sampled interpolation: Fully explicit dynamic splatting (Ex4DGS) stores positions, orientations, and other attributes of each dynamic Gaussian at a set of keyframes, employing Hermite spline (for positions) and SLERP (for rotations) interpolation between them. Static scene regions use only linear parameterization, significantly reducing memory and computation (Lee et al., 21 Oct 2024); a minimal interpolation sketch appears at the end of this subsection.
  • Rigid SE(3) pose propagation: For rigid body motion across time, as in PMGS and PEGS, a single “canonical” object-centric model is reconstructed, with its instance at each timestep given by an SE(3) transformation. The entire set of Gaussians is then propagated per frame, and trajectory constraints (e.g., Newtonian acceleration consistency) are imposed to couple temporally adjacent frames (Xu et al., 4 Aug 2025, Xu et al., 21 Nov 2025).
  • Hybrid 3D–4D adaptive representations: To optimize computational efficiency, hybrid schemes periodically convert nearly time-invariant 4D Gaussians into purely spatial 3D Gaussians. The conversion criterion is the effective temporal scale parameter exceeding a threshold (Oh et al., 19 May 2025); a one-line version of this test follows this list.
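
The conversion test in the last bullet amounts to a simple threshold check. A sketch, where `sigma_t`, `seq_duration`, and `k` are hypothetical names for the Gaussian's effective temporal scale, the sequence length, and a tuning constant:

```python
def should_convert_to_static(sigma_t: float, seq_duration: float, k: float = 1.0) -> bool:
    # A 4D Gaussian whose effective temporal extent spans the whole sequence
    # is nearly time-invariant and can be replaced by a purely spatial 3D
    # Gaussian (threshold k is a hypothetical hyperparameter).
    return sigma_t > k * seq_duration
```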

This flexibility enables encoding a rich range of motion—rigid, articulated, non-rigid, and partially static—with high temporal fidelity.
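
For the keyframe-interpolated strategy above, the two standard interpolants can be sketched as follows. Keyframe placement and tangent estimation are method-specific (see Lee et al., 21 Oct 2024) and omitted, so treat this as a minimal illustration rather than the Ex4DGS implementation:

```python
import numpy as np

def hermite(p0, p1, m0, m1, u):
    """Cubic Hermite interpolation of a position between two keyframes.
    p0, p1: keyframe positions (3,); m0, m1: keyframe tangents; u in [0, 1]."""
    u2, u3 = u * u, u ** 3
    return ((2*u3 - 3*u2 + 1) * p0 + (u3 - 2*u2 + u) * m0
            + (-2*u3 + 3*u2) * p1 + (u3 - u2) * m1)

def slerp(q0, q1, u):
    """Spherical linear interpolation between unit quaternions q0, q1 (4,)."""
    d = float(np.clip(np.dot(q0, q1), -1.0, 1.0))
    if d < 0.0:                          # take the shorter arc
        q1, d = -q1, -d
    if d > 0.9995:                       # nearly parallel: linear fallback
        q = q0 + u * (q1 - q0)
        return q / np.linalg.norm(q)
    theta = np.arccos(d)
    return (np.sin((1 - u) * theta) * q0 + np.sin(u * theta) * q1) / np.sin(theta)
```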

3. Compression, Pruning, and Computational Scalability

Memory and computational bottlenecks are addressed through structured pruning and quantization:

  • Deformation-aware pruning: Gaussians are scored by their contribution to deformation (via, e.g., spatial–temporal volume change), with low-impact primitives pruned to reduce redundancy (Liu et al., 23 Jun 2024).
  • Temporal consistency pruning: Mask variables per Gaussian and per frame are optimized to ensure only time-relevant Gaussians are retained, as in TC3DGS, where an explicit mask loss and relevance scores are employed (Javed et al., 7 Dec 2024).
  • Attribute dimensionality pruning: Higher-order spherical harmonics can be safely truncated in non-critical regions, and attribute vectors compressed, with distillation losses bridging the representational gap (Liu et al., 23 Jun 2024).
  • Keypoint-based trajectory simplification: For each time-varying parameter, the Ramer–Douglas–Peucker algorithm identifies a minimal set of keyframes, with per-frame attributes reconstructed by linear interpolation, yielding substantial storage reduction (Javed et al., 7 Dec 2024); a code sketch follows this list.
  • Gradient-aware quantization: Mixed-precision quantization allocates more bits to parameters with higher gradient sensitivity, as determined by accumulated magnitude gradients during training (Javed et al., 7 Dec 2024).
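
To make the keypoint-based simplification concrete, here is a minimal sketch of Ramer–Douglas–Peucker applied to one scalar attribute trajectory. Vector attributes and the exact thresholds of the TC3DGS pipeline are simplified away:

```python
import numpy as np

def rdp_keyframes(ts, vs, eps):
    """Ramer-Douglas-Peucker on one scalar attribute trajectory (ts, vs).
    Returns keyframe indices; intermediate frames are recovered by linear
    interpolation."""
    def rec(lo, hi):
        if hi - lo < 2:
            return [lo, hi]
        t0, v0, t1, v1 = ts[lo], vs[lo], ts[hi], vs[hi]
        chord = max(np.hypot(t1 - t0, v1 - v0), 1e-12)
        idx = np.arange(lo + 1, hi)
        # Perpendicular distance of each interior sample from the chord.
        d = np.abs((t1 - t0) * (v0 - vs[idx]) - (t0 - ts[idx]) * (v1 - v0)) / chord
        k = idx[np.argmax(d)]
        if d.max() > eps:                        # split at the worst offender
            return rec(lo, k)[:-1] + rec(k, hi)
        return [lo, hi]                          # chord is a good enough fit
    return rec(0, len(ts) - 1)

# Usage: a smooth trajectory collapses to a handful of keyframes.
ts = np.linspace(0.0, 1.0, 100)
vs = np.sin(2 * np.pi * ts)
print(rdp_keyframes(ts, vs, eps=0.01))
```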

Empirically, combinations of these strategies achieve compression rates up to 67× (TC3DGS), with sub-millisecond per-frame rendering times and negligible perceptual degradation (Javed et al., 7 Dec 2024, Liu et al., 23 Jun 2024).

4. Loss Functions and Optimization Strategies

Effective spatiotemporal Gaussian splatting requires joint optimization of photometric, geometric, and temporal objectives:

  • Weighted photometric losses: Spatial–temporal masks assign higher weight to pixels/voxels obscured by occlusions or underrepresented in time (e.g., instrument occlusions in endoscopy), ensuring recovery of dynamic but intermittently visible scene regions (Zhu et al., 21 Jan 2024).
  • Depth-guided and structure-consistent regularization: When depth supervision is available (e.g., MVS or monocular depth predictions), global–local ranking and variance-reduction constraints yield spatial and temporal geometric coherence, particularly under sparse-input scenarios (Li et al., 28 Nov 2025).
  • Physical priors and dynamics constraints: Explicit acceleration consistency losses enforce Newtonian motion laws on rigid or projectile objects, promoting temporally smooth and physically plausible trajectory estimation (Xu et al., 4 Aug 2025, Xu et al., 21 Nov 2025); a minimal sketch follows this list.
  • Cross-modal and temporal fusion: Kalman filters and cross-modal update rules integrate cues from event, frame, and optical flow modalities, correcting cumulative estimation errors and stabilizing tracking under challenging motion regimes (Xu et al., 4 Aug 2025, Xu et al., 21 Nov 2025).
  • Semantic, behavior, or future prediction regularization: In robotics, the future-consistency loss ties predicted sequences (under robot-chosen actions) to the evolution of spatio-temporal Gaussians, grounding world models and enabling closed-loop manipulation (Lu et al., 13 Mar 2024).
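
As one example from this list, an acceleration-consistency penalty can be written as a finite-difference smoothness term on per-frame object positions. This PyTorch sketch is an illustrative stand-in for the trajectory constraints of PMGS/PEGS, not their exact formulation:

```python
import torch

def acceleration_consistency_loss(positions: torch.Tensor) -> torch.Tensor:
    """Penalize changes in acceleration along a per-frame object trajectory
    of shape (T, 3), assuming uniform frame spacing."""
    vel = positions[1:] - positions[:-1]    # finite-difference velocity,     (T-1, 3)
    acc = vel[1:] - vel[:-1]                # finite-difference acceleration, (T-2, 3)
    jerk = acc[1:] - acc[:-1]               # change in acceleration,         (T-3, 3)
    return (jerk ** 2).sum(dim=-1).mean()   # zero for uniformly accelerated motion
```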

Optimization is typically staged, e.g., initial static fit followed by full dynamic parameter refinement. Dynamic simulated annealing, adaptive schedules, and curriculum-based extensions progressively introduce complexity, accelerating convergence and preventing degenerate solutions (Xu et al., 4 Aug 2025, Xu et al., 21 Nov 2025).

5. Applications Across Domains

Spatiotemporal Gaussian Splatting is deployed in a variety of demanding contexts:

  • Dynamic medical imaging: In endoscopic tissue reconstruction, EndoGS achieves deformation-aware, high-quality 3D modeling of non-rigid surgical scenes with explicit occlusion handling and surface-aligned regularization (Zhu et al., 21 Jan 2024). In 4D CT, X²-Gaussian yields continuous-time, high-resolution tomographic volumes without phase binning, achieving substantial PSNR improvements over prior techniques (Yu et al., 27 Mar 2025). In 4D flow MRI, PINGS-X leverages axes-aligned normalized splatting for efficient and theoretically sound super-resolution (Jo et al., 14 Nov 2025).
  • Autonomous systems and calibration: 3DGS-Calib enables real-time, targetless calibration of multi-modal sensor platforms (e.g., LiDAR–camera), jointly optimizing spatial and temporal offsets with photometric consistency, outperforming implicit NeRF-based calibration by a large margin in both accuracy and training time (Herau et al., 18 Mar 2024).
  • Dynamic scene compression and streaming: TC3DGS and LGS enable highly compressed, yet high-fidelity, representations, scalable to real-time deployment on edge devices, with storage and memory reductions of 9–67×, and rendering rates exceeding 800 fps (Javed et al., 7 Dec 2024, Liu et al., 23 Jun 2024).
  • Robotic manipulation: ManiGaussian demonstrates end-to-end action prediction and scene forecasting, with dynamic Gaussians parameterized by deformation networks coupled to language-conditioned policies, achieving superior multi-task performance over NeRF and perception-based alternatives (Lu et al., 13 Mar 2024).
  • High-dynamic reconstruction and event integration: STD-GS and PEGS explicitly disentangle the spatio-temporal structure of dynamic objects vs. static backgrounds, integrating event and frame streams through hybrid clustering, priors, and regularization, greatly enhancing temporal fidelity under extreme motion (Zhou et al., 29 Jun 2025, Xu et al., 21 Nov 2025).

6. Comparative Performance and Limitations

Empirical studies report the following:

| Method | Compression Ratio | PSNR (dB) | FPS | Unique Features |
|--------|-------------------|-----------|-----|-----------------|
| TC3DGS (Javed et al., 7 Dec 2024) | up to 67× | −0.5 vs. baseline | >890 | Keypoint interpolation, quantization |
| LGS (Liu et al., 23 Jun 2024) | >9× | ≤0.2 drop | 188–195 | DAP, attribute pruning, distillation |
| 3DGS-Calib (Herau et al., 18 Mar 2024) | – | robust | – | Joint SE(3), temporal, photometric opt. |
| EndoGS (Zhu et al., 21 Jan 2024) | – | – | <1 ms/frame | Deformation fields, depth/photo/surface loss |
| ManiGaussian (Lu et al., 13 Mar 2024) | – | – | – | Deformation world model, BC, dynamics loss |

Although spatiotemporal Gaussian splatting is broadly robust, open challenges include managing severe sparsity in input views, modeling strongly aperiodic or non-stationary phenomena, resolving ambiguous occlusions, and handling topological changes in deformable scenes. Some approaches (e.g., Li et al., 28 Nov 2025) introduce dynamic consistency evaluation for geometry priors, and others propose axes alignment or merging to control the parameter explosion in high-dimensional applications (Jo et al., 14 Nov 2025).

7. Active Research Directions

Active development focuses on several themes:

  • Edge deployment and AIoT: Real-time rendering at sub-watt budgets via highly pruned and compressed models (Li et al., 28 Nov 2025).
  • Explicit/implicit hybridization: Adaptive representation—static/dynamic separation, fusion of Gaussian and neural field approaches (Oh et al., 19 May 2025, Lee et al., 21 Oct 2024).
  • Multi-modal and physics-informed supervision: Inclusion of event streams, optical flow, and PDE-based regularization to improve robustness and accuracy under occlusion, blur, and physical constraints (Xu et al., 21 Nov 2025, Xu et al., 4 Aug 2025, Jo et al., 14 Nov 2025).
  • Continuous-time modeling: Removal of phase discretization in medical imaging, learned periodic priors, and data-adaptive temporal parameterizations (Yu et al., 27 Mar 2025).
  • Data-driven regularization: Structured ranking, global–local matching, and context-conditioned fusion for improved geometry from limited or inconsistent depth cues (Li et al., 28 Nov 2025).

Continued work is directed at extending scalability to long, highly dynamic sequences, generalized sensor fusion, real-time medical or industrial feedback loops, and unsupervised or physics-constrained learning in 4D.


Spatiotemporal Gaussian Splatting defines the state-of-the-art explicit representation for dynamic 3D and 4D volumetric data, with mathematical flexibility, computational tractability, and relevance across a spectrum of scientific, industrial, and medical applications (Zhu et al., 21 Jan 2024, Herau et al., 18 Mar 2024, Oh et al., 19 May 2025, Javed et al., 7 Dec 2024, Jo et al., 14 Nov 2025, Li et al., 28 Nov 2025, Lee et al., 21 Oct 2024).
