4D Gaussian Splatting: Dynamic Scene Rendering
- 4D Gaussian Splatting is a representation method that integrates time directly by coupling a canonical set of 3D Gaussians with a learned 4D deformation field for dynamic scene modeling.
- It employs a HexPlane-inspired decomposition and lightweight MLPs to efficiently capture nonrigid, complex motions while maintaining high rendering fidelity.
- The approach achieves real-time performance—up to 82 FPS on commodity GPUs—by optimizing memory use and leveraging differentiable splatting for photorealistic view synthesis.
4D Gaussian Splatting (4DGS) is an explicit, real-time representation and rendering method for dynamic scenes that unifies spatial and temporal modeling using a mixture of canonical 3D Gaussians and a learned deformation field. By leveraging a hybrid architecture of 3D Gaussians parameterized by position, rotation, scale, and appearance, together with a neural field defined over 4D space-time, 4DGS achieves both high rendering efficiency and scene fidelity. The approach maintains a single, canonical 3D Gaussian set and learns a compact, time-dependent deformation field—rather than replicating per-frame Gaussians—enabling rapid, memory-efficient modeling of complex, nonrigid motion. Distinctive design elements include a HexPlane-inspired decomposition of the 4D deformation field and a lightweight MLP for per-Gaussian spatiotemporal deformation, facilitating real-time, photorealistic dynamic scene synthesis on commodity GPUs.
1. Foundations and Motivation
Dynamic scene modeling for view synthesis and reconstruction poses challenges in terms of memory, training efficiency, and motion complexity. Standard 3D Gaussian Splatting (3DGS) methods represent static scenes as point clouds of anisotropic Gaussians, each with spatio-angular appearance coefficients (e.g., spherical harmonics), optimized for differentiable splatting-based rendering. Applying 3DGS per frame in a dynamic sequence leads to prohibitive memory growth of O(T · N), where T is the number of timesteps and N is the Gaussian count.
4DGS (Wu et al., 2023) extends the 3DGS paradigm to dynamic scenes by explicitly integrating time as a fourth dimension. Rather than discretizing time or storing separate sets of Gaussians, 4DGS maintains a canonical set and models temporal evolution via a learned 4D deformation field. This field, queried with spatiotemporal coordinates, dynamically updates each Gaussian’s position, orientation, and scale, thus capturing transitions, deformations, and articulated or nonrigid motion compactly.
This construction addresses difficulties in accurately modeling complex motions without sacrificing training/rendering speed or requiring excessive storage. The approach achieves real-time synthesis (up to 82 FPS on an RTX 3090) while matching or surpassing the quality benchmarks of previous state-of-the-art methods.
2. Core Representation and Neural Deformation Field
The 4DGS explicit representation consists of two tightly coupled components:
- Canonical 3D Gaussians: Each Gaussian is parameterized by a spatial mean (position) X, a rotation represented as a quaternion r, a per-axis scale s, color (often encoded as low-order spherical harmonics), and opacity. This set encodes scene geometry and base appearance.
- 4D Neural Voxels: To govern scene dynamics and deformation, 4DGS defines a learnable neural field on a set of 2D voxel planes. Building on HexPlane principles, six planes cover the coordinate pairs: the spatial pairs (x, y), (x, z), (y, z) and the spatiotemporal pairs (x, t), (y, t), (z, t). Each plane stores features on a 2D grid. For a Gaussian at position (x, y, z) queried at time t, features are extracted from each plane via bilinear interpolation.
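The per-Gaussian plane lookup can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: the function names, grid sizes, and the simple single-resolution concatenation are illustrative choices, not the paper's exact implementation (which uses multi-resolution grids and a learned fusion MLP).

```python
import numpy as np

def bilinear(plane, u, v):
    """Bilinearly interpolate a (H, W, C) feature plane at continuous (u, v)."""
    H, W, _ = plane.shape
    u, v = np.clip(u, 0, H - 1), np.clip(v, 0, W - 1)
    u0, v0 = int(np.floor(u)), int(np.floor(v))
    u1, v1 = min(u0 + 1, H - 1), min(v0 + 1, W - 1)
    du, dv = u - u0, v - v0
    return ((1 - du) * (1 - dv) * plane[u0, v0] + (1 - du) * dv * plane[u0, v1]
            + du * (1 - dv) * plane[u1, v0] + du * dv * plane[u1, v1])

def hexplane_features(planes, x, y, z, t):
    """Gather features for a Gaussian at (x, y, z, t) from the six planes.

    `planes` maps the axis pairs ('xy', 'xz', 'yz', 'xt', 'yt', 'zt') to
    (H, W, C) grids; coordinates are assumed pre-scaled to grid units.
    """
    coords = {'xy': (x, y), 'xz': (x, z), 'yz': (y, z),
              'xt': (x, t), 'yt': (y, t), 'zt': (z, t)}
    # Concatenate the six interpolated feature vectors into one descriptor.
    return np.concatenate([bilinear(planes[k], *coords[k]) for k in coords])
```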
The fusion process is

f_d = φ_d(f_h),

where f_h is the concatenation of the bilinearly interpolated per-plane features and φ_d is a small MLP yielding the fused local spatiotemporal feature.
This feature is input to three separate lightweight MLP heads that predict per-Gaussian residuals for position (ΔX), rotation (Δr), and scale (Δs), which are added to the canonical attributes:

X' = X + ΔX,   r' = r + Δr,   s' = s + Δs.
This decomposition allows fine-scale, flexible adjustment of geometry at arbitrary timepoints.
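The three residual heads can be illustrated with a minimal NumPy sketch. The feature dimension (12), hidden width (16), and random weights are hypothetical placeholders; a real implementation trains these weights end-to-end in a deep-learning framework.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_head(f, W1, b1, W2, b2):
    """One lightweight two-layer head mapping a fused feature to a residual."""
    h = np.maximum(W1 @ f + b1, 0.0)   # ReLU hidden layer
    return W2 @ h + b2

# Hypothetical sizes: fused feature dim 12, hidden dim 16.
F_DIM, H_DIM = 12, 16
heads = {name: (rng.normal(size=(H_DIM, F_DIM)) * 0.1, np.zeros(H_DIM),
                rng.normal(size=(out, H_DIM)) * 0.1, np.zeros(out))
         for name, out in [('pos', 3), ('rot', 4), ('scale', 3)]}

f = rng.normal(size=F_DIM)             # fused spatiotemporal feature f_d
dX = mlp_head(f, *heads['pos'])        # position residual, shape (3,)
dr = mlp_head(f, *heads['rot'])        # rotation residual, shape (4,)
ds = mlp_head(f, *heads['scale'])      # scale residual, shape (3,)

# Deformed position at time t: canonical position plus residual.
X_t = np.zeros(3) + dX
```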
Underlying the geometric modeling is the standard anisotropic Gaussian used for splatting,

G(x) = exp(−(1/2) (x − X)^T Σ^{−1} (x − X)),   with Σ = R S S^T R^T,

where R is the rotation matrix derived from the quaternion r and S = diag(s) is the scale matrix. The rotated, anisotropic Gaussian can thus capture elongated and oriented structures in 3D and, via deformation, across time.
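The covariance construction and density evaluation can be written out directly. This NumPy sketch assumes a scalar-first (w, x, y, z) quaternion convention; function names are illustrative.

```python
import numpy as np

def quat_to_rot(q):
    """Rotation matrix from a unit quaternion in (w, x, y, z) order."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def covariance(q, s):
    """Sigma = R S S^T R^T with per-axis scales s along the Gaussian's axes."""
    R, S = quat_to_rot(q), np.diag(s)
    return R @ S @ S.T @ R.T

def gaussian_density(x, mu, q, s):
    """Unnormalized anisotropic Gaussian G(x) = exp(-0.5 d^T Sigma^-1 d)."""
    d = x - mu
    return np.exp(-0.5 * d @ np.linalg.inv(covariance(q, s)) @ d)
```

With the identity rotation, the covariance reduces to a diagonal matrix of squared scales, which is what makes the factorized (quaternion, scale) parameterization easy to optimize while keeping Σ positive semi-definite.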
3. Splatting-based Rendering and Optimization
Rendering with 4DGS proceeds by projecting the deformed 3D Gaussians at any queried time onto the image plane and blending their contributions via differentiable alpha compositing using the ordering of depths.
Given the canonical parameters and per-Gaussian deformations at timestamp , attributes are updated and splatted using the standard 3DGS equation. The time-conditional Gaussian update ensures motions and deformations are expressed directly, without recourse to per-frame explicit tracking. This formulation is compatible with rasterization-based acceleration and benefits from the locality and continuity of Gaussian support.
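The per-pixel blending step can be sketched as follows; this minimal NumPy version assumes per-Gaussian colors and effective opacities (after the 2D Gaussian falloff) have already been produced by the splatting step, and uses a simple depth sort rather than a tile-based rasterizer.

```python
import numpy as np

def composite(colors, alphas, depths):
    """Front-to-back alpha blending of per-Gaussian contributions at a pixel.

    colors: (N, 3) splatted colors; alphas: (N,) effective opacities;
    depths: (N,) camera-space depths used only for ordering.
    """
    order = np.argsort(depths)             # nearest Gaussians first
    out, transmittance = np.zeros(3), 1.0
    for i in order:
        out += transmittance * alphas[i] * colors[i]
        transmittance *= 1.0 - alphas[i]   # light remaining after this splat
    return out
```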
Optimization is performed to minimize a photometric reconstruction loss (typically a weighted sum of L1 and SSIM terms) between synthesized and ground-truth images, jointly learning the static Gaussian set and the parameters of the hexplane neural field and MLP heads.
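The objective can be illustrated with a simplified sketch. The single-window (global) SSIM and the weighting lam=0.2 below are assumptions for brevity; practical implementations use a windowed, Gaussian-filtered SSIM.

```python
import numpy as np

def ssim_global(a, b, c1=0.01**2, c2=0.03**2):
    """Simplified single-window SSIM over whole images in [0, 1]."""
    mu_a, mu_b = a.mean(), b.mean()
    va, vb = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2*mu_a*mu_b + c1) * (2*cov + c2)) / \
           ((mu_a**2 + mu_b**2 + c1) * (va + vb + c2))

def photometric_loss(pred, gt, lam=0.2):
    """(1 - lam) * L1 + lam * (1 - SSIM), the common 3DGS-style objective."""
    l1 = np.abs(pred - gt).mean()
    return (1 - lam) * l1 + lam * (1.0 - ssim_global(pred, gt))
```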
4. Computational Performance and Resource Efficiency
4DGS achieves real-time dynamic scene rendering, attaining up to 82 FPS on an RTX 3090 for synthetic scenes and around 30 FPS for real-world captures. Memory efficiency is substantially improved over baselines, since storage scales as O(N + F) (N = number of Gaussians, F = deformation-field parameters), in contrast to O(T · N) for full per-frame representations.
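The scaling difference can be made concrete with illustrative numbers; the Gaussian count, frame count, and parameter budgets below are assumptions for a back-of-the-envelope comparison, not measurements from the paper.

```python
# Hypothetical scene: N Gaussians, T frames, F deformation-field parameters.
N, T = 100_000, 300
# Rough 3DGS per-Gaussian float count (assumption):
# 3 position + 4 rotation + 3 scale + 1 opacity + 48 degree-3 SH coeffs.
PER_GAUSSIAN = 3 + 4 + 3 + 1 + 48
F = 5_000_000                                # hypothetical hexplane + MLP size

per_frame_floats = T * N * PER_GAUSSIAN      # O(T * N): one Gaussian set per frame
canonical_floats = N * PER_GAUSSIAN + F      # O(N + F): one set + deformation field

ratio = per_frame_floats / canonical_floats  # how much storage 4DGS saves here
```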
Benchmark results indicate high-quality view synthesis: strong PSNR and SSIM and low LPIPS values are reported, matching or surpassing approaches such as TiNeuVox, K-Planes, and other dynamic NeRF extensions, with marked improvements in both training convergence and real-time inference speed.
5. Limitations and Prospective Enhancements
Several limiting factors are identified:
- Modeling very large, nonrigid motions and dramatic scene changes can pose difficulties, particularly with monocular input data.
- Distinguishing dynamic and static scene components in a fully unsupervised manner remains a challenge, sometimes leading to deformation leakage or background artifacts.
- Scaling to urban or very large environments is limited, as larger numbers of Gaussians and deformation field queries result in higher computational and storage costs.
Future research directions suggested include the integration of additional priors (e.g., depth, optical flow), more robust instance-level separation within the deformation network, and algorithmic strategies for scaling the architecture to longer sequences or larger, more complex environments.
6. Practical Applications and Impact
4DGS’s blend of explicit, low-latency dynamic modeling and high-quality view synthesis is well-suited for a variety of applications:
- Real-time rendering of dynamic scenes for VR/AR, interactive editing, and simulation of real-world environments
- Movie production and post-processing, offering frame-accurate temporal manipulation and high-fidelity compositing
- Sports analysis, tracking, and replay with dynamic 3D reconstructions from video
- 3D object tracking, motion editing, and spatiotemporal reconstruction in dynamic scenarios
The explicit, manipulable Gaussian representation also holds promise for advanced editing, tracking, and real-time control applications relevant to robotics and dynamic environment understanding.
7. Resources
Additional demonstrations, datasets, and source code implementing the 4DGS method for real-time dynamic scene rendering are provided at https://guanjunwu.github.io/4dgs/ (Wu et al., 2023). This resource enables reproducibility and practical integration into research workflows.
4D Gaussian Splatting constitutes a unified, explicit, and highly efficient framework for real-time dynamic scene modeling, advancing the field beyond static or frame-based approaches by directly leveraging space-time correlations in a canonical set of deformable Gaussian primitives. Its innovations in representation, feature decomposition, and lightweight temporal deformation yield state-of-the-art rendering speed and memory efficiency for dynamic scene reconstruction and synthesis.