3D/4D Gaussian Splatting in Neural Rendering
- 3D/4D Gaussian Splatting is an explicit scene representation technique that models dynamic scenes with anisotropic Gaussian primitives encoding geometry, color, and motion.
- It enables differentiable and real-time neural rendering, supporting advanced applications such as SLAM, multi-modal fusion, and dynamic volumetric reconstruction.
- Innovations like deformation networks, keyframe interpolation, and entropy-based compression achieve significant memory reduction while maintaining high rendering fidelity (PSNR, SSIM).
3D/4D Gaussian Splatting (GS) is a class of explicit scene representations central to modern high-speed neural rendering and dynamic volumetric reconstruction. In both its canonical 3D form and as temporally extended "4D" variants, GS models scenes as clouds of anisotropic Gaussian primitives. Each primitive encodes spatial geometry, color, opacity, and, in the dynamic case, deformation or motion over time—establishing a compact, GPU/rasterizer-friendly alternative to dense voxels or implicit neural fields. The technique supports differentiable rendering, real-time synthesis, and compression, extending to multi-modal domains (e.g., radar, sonar) and complex SLAM pipelines.
1. Mathematical Foundations: Gaussian Primitives and Volumetric Rendering
A 3D Gaussian primitive is parameterized by a spatial mean $\mu \in \mathbb{R}^3$, a diagonal or full covariance $\Sigma$, color $c$, and opacity $\alpha$. Its density at a point $x$ is $G(x) = \exp\!\left(-\tfrac{1}{2}(x-\mu)^{\top}\Sigma^{-1}(x-\mu)\right)$ (Xiao et al., 20 Nov 2025). Rendering proceeds via splatting, where each 3D Gaussian is projected onto the image as a 2D ellipse using camera intrinsics and local Jacobians (Matias et al., 20 Oct 2025). The final rendered color $C$ for a pixel is produced by compositing in front-to-back order:

$$C = \sum_{i} c_i \, \alpha_i \prod_{j<i} (1 - \alpha_j),$$

where $c_i$ is the color and $\alpha_i$ the effective opacity of the $i$-th Gaussian at the pixel after 2D projection. For dynamic (4D) models, time or additional physical modalities (e.g., Doppler) are encoded in each primitive via deformation fields or extended covariances.
Volumetric rendering is fully differentiable with respect to all Gaussian parameters, supporting gradient-based scene optimization and analytic backpropagation (Xiao et al., 20 Nov 2025, Wu et al., 2023).
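As a concrete illustration, the density and front-to-back compositing formulas above can be sketched in a few lines of NumPy. This is an illustrative reference implementation, not the optimized CUDA rasterizer used in practice:

```python
import numpy as np

def gaussian_density(x, mu, cov):
    """Unnormalized 3D Gaussian density G(x) = exp(-0.5 (x-mu)^T Sigma^-1 (x-mu))."""
    d = x - mu
    return float(np.exp(-0.5 * d @ np.linalg.inv(cov) @ d))

def composite_front_to_back(colors, alphas):
    """Alpha-composite per-pixel contributions sorted front-to-back:
    C = sum_i c_i * alpha_i * prod_{j<i} (1 - alpha_j)."""
    pixel = np.zeros(3)
    transmittance = 1.0  # accumulated prod_{j<i} (1 - alpha_j)
    for c, a in zip(colors, alphas):
        pixel += transmittance * a * np.asarray(c, dtype=float)
        transmittance *= (1.0 - a)
    return pixel

# Two splats at one pixel: a semi-opaque red one in front of a green one.
colors = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
alphas = [0.6, 0.5]
print(composite_front_to_back(colors, alphas))  # composited pixel: 0.6 red + 0.2 green
```

In real pipelines the same accumulation runs per tile on the GPU, with Gaussians depth-sorted before blending; differentiability follows because every operation above is smooth in the Gaussian parameters.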
2. Extensions to 4D: Temporal and Physical Augmentation
4D Gaussian Splatting encompasses several classes:
- Dynamic deformation networks: Each canonical 3D Gaussian receives deformations via a time-conditioned neural field (an MLP or factorized HexPlane), outputting $\Delta\mu(t)$, $\Delta r(t)$, and $\Delta s(t)$ for per-frame location, orientation, and scale (Wu et al., 2023, Ren et al., 2023).
- Keyframe interpolation: Explicit 4DGS (Ex4DGS) promotes static/dynamic separation, storing positions/rotations at sparse temporal keyframes for major motion components and interpolating spatio-temporal trajectories using cubic Hermite splines (CHip) and spherical linear interpolation (Slerp) (Lee et al., 2024). Opacity dynamics are modeled by two-component Gaussian mixtures.
- Intrinsic 4D Gaussians: In some formalisms, primitives maintain a genuine 4D mean $\mu$ and covariance $\Sigma$, with dynamic regions sliced at each timestamp into a transient 3DGS (Oh et al., 19 May 2025).
- Radar and multi-modal fusion: Rad-GS augments the standard pipeline with 4D radar input (position, angular coordinates, Doppler velocity), using statistical tests on velocity residuals to mask dynamic objects and fuse radar point clouds by Bayesian updates (Xiao et al., 20 Nov 2025). The "fourth" dimension in these models is context-dependent: temporal for video, velocity for radar, etc.
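The keyframe-interpolation scheme above — cubic Hermite splines for positions, Slerp for rotations — can be sketched as follows. The function names and tangent handling are illustrative, not taken from the Ex4DGS codebase:

```python
import numpy as np

def hermite(p0, p1, m0, m1, t):
    """Cubic Hermite spline between keyframe positions p0, p1 with tangents m0, m1;
    t in [0, 1] spans the interval between the two keyframes."""
    t2, t3 = t * t, t * t * t
    return ((2 * t3 - 3 * t2 + 1) * p0 + (t3 - 2 * t2 + t) * m0
            + (-2 * t3 + 3 * t2) * p1 + (t3 - t2) * m1)

def slerp(q0, q1, t, eps=1e-8):
    """Spherical linear interpolation between unit quaternions q0, q1."""
    q0, q1 = q0 / np.linalg.norm(q0), q1 / np.linalg.norm(q1)
    dot = np.clip(np.dot(q0, q1), -1.0, 1.0)
    if dot < 0.0:              # flip to take the shorter arc
        q1, dot = -q1, -dot
    theta = np.arccos(dot)
    if theta < eps:            # nearly identical rotations: q0 is fine
        return q0
    return (np.sin((1 - t) * theta) * q0 + np.sin(t * theta) * q1) / np.sin(theta)
```

Smooth position and rotation trajectories between sparse keyframes are what let Ex4DGS store far fewer temporal samples than per-frame 4D representations.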
3. Compression, Pruning, and Memory Efficiency
Efficient GS has developed along two principal axes:
- Parameter compression techniques automatically remove low-significance Gaussians, truncate spherical harmonics, and quantize/entropy-code model attributes. Methods such as spatio-temporal significance pruning (Liu et al., 18 Mar 2025), deformation-aware metric pruning (Liu et al., 2024), and uncertainty/Hessian-based sensitivity scoring (Hanson et al., 2024) yield substantial model-size reduction without perceptible loss in PSNR or SSIM.
- Restructuring compression incorporates hierarchical anchors, neural attribute decoding (shared context models), and latent plane condensation to further compress higher-dimensional embeddings (Liu et al., 18 Mar 2025, Youn et al., 8 Dec 2025). Deep context models exploit inter- and intra-scale redundancy for adaptive quantization.
Notably, MEGA achieves large memory reductions by factorizing color into per-Gaussian DC terms and a global AC neural predictor, along with entropy-constrained Gaussian deformation regularization (Zhang et al., 2024). Light4GS couples significance pruning with entropy-constrained SH compression for strong end-to-end model compression (Liu et al., 18 Mar 2025). Wavelet transforms provide temporal smoothness priors enabling RD-tunable bitrate allocation in 4DGS (Lee et al., 23 Jul 2025).
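A minimal sketch of significance-based pruning, assuming a simple opacity-times-volume score. Published pruners use view- and time-aware statistics (and Hessian-based sensitivity), so this heuristic is only a stand-in:

```python
import numpy as np

def significance_scores(opacities, scales):
    """Heuristic significance: opacity times a volume proxy built from the
    per-axis scales of each Gaussian's covariance. Illustrative only."""
    return opacities * np.prod(scales, axis=1)

def prune(opacities, scales, keep_ratio=0.5):
    """Return indices of the top keep_ratio fraction of Gaussians by significance."""
    s = significance_scores(opacities, scales)
    k = max(1, int(len(s) * keep_ratio))
    keep = np.argsort(s)[::-1][:k]   # highest scores first
    return np.sort(keep)             # stable index order for downstream use

ops = np.array([0.9, 0.01, 0.5, 0.05])   # two strong, two near-transparent Gaussians
scl = np.ones((4, 3))
print(prune(ops, scl, keep_ratio=0.5))   # -> [0 2]
```

Real systems follow pruning with SH truncation and entropy coding of the surviving attributes; the ordering-then-threshold pattern above is the common core.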
4. Large-Scale Scene Reconstruction and SLAM Integration
Rad-GS demonstrates kilometer-scale outdoor SLAM by integrating raw radar point clouds (with Doppler) and images into a unified octree-managed 3D Gaussian map (Xiao et al., 20 Nov 2025). Dynamic object removal is accomplished using single-frame Doppler masks, propagated and grown via octrees, followed by region-constrained segmentation for robust masking before 3D reconstruction. Gaussians are merged or split adaptively, with memory growth held in check, enabling real-time performance and sublinear storage per frame.
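The Doppler-based dynamic masking step can be sketched as a residual test: for a static point, the measured radial speed should equal the projection of the (negated) ego-velocity onto the line of sight. The fixed threshold below is an illustrative placeholder; Rad-GS describes a statistical test on velocity residuals:

```python
import numpy as np

def dynamic_mask(points, doppler, ego_velocity, threshold=0.5):
    """Flag radar returns whose Doppler disagrees with the static-world prediction.

    points       : (N, 3) sensor-frame positions
    doppler      : (N,) measured radial velocities (m/s)
    ego_velocity : (3,) sensor velocity in the same frame
    For a static point, doppler ~ -(unit_dir . ego_velocity), so the residual
    doppler + unit_dir . ego_velocity is ~0; large residuals indicate motion."""
    dirs = points / np.linalg.norm(points, axis=1, keepdims=True)
    residual = doppler + dirs @ ego_velocity
    return np.abs(residual) > threshold
```

Masked points are excluded from mapping, and in the full pipeline the per-frame masks are propagated and grown through the octree before segmentation refines them.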
For distributed scientific workflows, multi-GPU GS training supports tens of millions of Gaussians for high-resolution isosurface visualization, with linear or near-linear scaling and in situ/post hoc integration for large-scale datasets (Han et al., 5 Sep 2025).
5. Optimization Strategies and Loss Terms
Scene optimization proceeds via alternating front-end pose tracking and global back-end refinement:
- Photometric reprojection: computes per-pixel errors between observed and rendered images, supporting geometry and texture fitting.
- Geometric loss: integrates depth maps (e.g., radar-derived in Rad-GS) as supervision for spatial accuracy.
- Shape/adaptation regularization: enforces roughness and anisotropy constraints on Gaussian covariances, typically parameterized by the ratio of scale eigenvalues and cross-terms (with thresholds/hyperparameters set per dataset).
- Knowledge distillation: LGS and similar lightweight compressive methods use a dual loss — a distillation term guides the compressed model to mimic a teacher, while a data term ensures fidelity to observed measurements (Liu et al., 2024).
Typical weightings place equal or near-equal emphasis on photometric and geometric terms, with roughness/compression regularizers tuned for domain needs. Pruning thresholds, entropy bottlenecks, and quantization step sizes are learned or empirically calibrated (Youn et al., 8 Dec 2025).
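A hedged sketch of how these terms might be combined; the weights and the anisotropy cap are illustrative placeholders, not values from any cited paper:

```python
import numpy as np

def total_loss(rendered, observed, depth_pred, depth_meas, scales,
               w_photo=1.0, w_geo=1.0, w_shape=0.1, max_anisotropy=10.0):
    """Weighted sum of the loss terms described above (illustrative weights).
    - photometric: mean absolute error between rendered and observed pixels
    - geometric : L1 between predicted and sensor (e.g., radar-derived) depth
    - shape     : penalize Gaussians whose scale-eigenvalue ratio exceeds a cap."""
    l_photo = np.mean(np.abs(rendered - observed))
    l_geo = np.mean(np.abs(depth_pred - depth_meas))
    ratio = scales.max(axis=1) / (scales.min(axis=1) + 1e-9)
    l_shape = np.mean(np.maximum(ratio - max_anisotropy, 0.0))
    return w_photo * l_photo + w_geo * l_geo + w_shape * l_shape
```

In an actual pipeline `rendered` and `depth_pred` are differentiable functions of the Gaussian parameters, so this scalar is backpropagated through the rasterizer during both front-end tracking and back-end refinement.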
6. Experimental Validation and Benchmark Comparison
Quantitative results on standard benchmarks (e.g., NTU4DRadLM, Neural 3D Video, Technicolor, SCARED, ENDONERF) show that GS pipelines consistently achieve competitive or superior metrics:
- Rad-GS: PSNR $23.6$–$23.9$ dB, SSIM $0.80$, LPIPS $0.39$ on outdoor loops, with a measurable PSNR gain from dynamic masking versus prior SLS/T-3DGS (Xiao et al., 20 Nov 2025).
- Ex4DGS: PSNR $32.11$ dB, competitive SSIM, and LPIPS $0.048$ on complex 4D scenes at $62$ fps (Lee et al., 2024).
- Hybrid 3D-4DGS: over $3\times$ faster training, lower memory, and improved PSNR ($32.25$ dB on N3V) compared to full 4DGS (Oh et al., 19 May 2025).
- Compression methods: MEGA and Light4GS yield large storage reductions with sub-decibel PSNR drops and substantial FPS gains (Zhang et al., 2024, Liu et al., 18 Mar 2025). RD optimization achieves high compression ratios while maintaining SSIM in dynamic scenes (Lee et al., 23 Jul 2025).
Across surgical, urban, and scientific scenes, these frameworks exhibit real-time rendering at up to $194$ fps (Liu et al., 2024), seamless integration with downstream tasks such as SLAM and object detection (Bai et al., 26 Jul 2025), and have set new standards for both fidelity and operational efficiency.
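For reference, the PSNR figures quoted above follow the standard definition $\mathrm{PSNR} = 10\log_{10}(\mathrm{MAX}^2/\mathrm{MSE})$, which can be computed as:

```python
import numpy as np

def psnr(img_a, img_b, max_val=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE).
    Images are assumed to share shape and the same value range [0, max_val]."""
    mse = np.mean((np.asarray(img_a, float) - np.asarray(img_b, float)) ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

a = np.zeros((4, 4))
b = np.full((4, 4), 0.1)      # uniform error of 0.1 -> MSE = 0.01
print(round(psnr(a, b), 1))   # 20.0
```

SSIM and LPIPS, by contrast, are structural and learned perceptual metrics respectively, which is why papers report all three.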
7. Limitations, Current Challenges, and Future Directions
Despite rapid advances, challenges remain:
- Long-sequence dynamics: Most methods operate on short clips; efficient temporal encoding for persistent scenes is limited.
- Semantic compression: Compression is largely uniform; allocating model size by semantic region or task remains open (Youn et al., 8 Dec 2025).
- Generalization: Most compression and segmentation schemes are per-scene; zero-shot or few-shot transfer models are nascent.
- Mobile deployment: Hardware constraints (memory footprint, FP precision) pose problems for AR/VR and embedded edge devices.
- Uncertainty and robustness: Quantitative safety and higher-budget encoding for critical regions are under-explored.
Ongoing research addresses hierarchical temporal anchors, adaptive rate-distortion, hybrid Gaussian/MLP representations, and integration with radar/sonar for multi-modal SLAM (Xiao et al., 20 Nov 2025, Qu et al., 2024). Improvements in pruning, masking, and distributed storage will further extend GS's applicability to autonomous driving, medical robotics, and real-time generative content. Extensions continue in semantic instance-aware GS (Su et al., 10 Nov 2025), image-domain residual modeling (Nguyen et al., 18 Nov 2025), and high-performance scientific visualization (Han et al., 5 Sep 2025).