
P-4DGS: Predictive 4D Gaussian Splatting

Updated 15 October 2025
  • P-4DGS is a dynamic 3D scene representation framework that uses predictive coding and anchor-based spatial grouping to compress time-varying scenes.
  • It employs a spatial-temporal prediction module with anchor points and deformation fields to achieve state-of-the-art reconstruction quality and high compression ratios.
  • Adaptive quantization and context-based entropy coding ensure efficient storage (~1MB) and real-time rendering speeds (>260 FPS) on modern GPUs.

P-4DGS (Predictive 4D Gaussian Splatting) is a dynamic 3D scene representation and compression framework designed to address the excessive memory and storage requirements of existing 4D Gaussian Splatting (4DGS) methods for dynamic and time-varying scenes. It introduces predictive, video-inspired coding strategies—specifically intra- and inter-frame prediction—by leveraging spatial anchor-based grouping, temporal deformation modeling, adaptive quantization, and context-based entropy coding. P-4DGS achieves state-of-the-art reconstruction quality and the fastest rendering speeds among dynamic 3DGS representations, while drastically reducing the required storage footprint to around 1MB on average and achieving up to 90× compression on real-world datasets (Wang et al., 11 Oct 2025).

1. Motivation and Conceptual Foundations

Existing 4DGS approaches provide photorealistic and real-time rendering for dynamic 3D scene reconstruction but are hindered by substantial temporal and spatial redundancies that render them impractical for storage and deployment. The foundational insight of P-4DGS is to adapt intra/inter prediction from traditional video compression paradigms to 4DGS, exploiting spatial and temporal coherence to remove redundancies among 3D Gaussian primitives over space and time. The result is a compact dynamic 3DGS representation that preserves high-fidelity rendering yet occupies orders of magnitude less storage.

2. Spatial-Temporal Prediction Module

The spatial-temporal prediction module of P-4DGS is composed of two principal components:

a. Spatial Prediction via Anchor Points

Rather than encoding each Gaussian primitive independently, P-4DGS groups local Gaussians under canonical 3D anchor points. Each anchor carries:

  • Position $x_a \in \mathbb{R}^3$
  • Scale $s_a \in \mathbb{R}^3$
  • Offset scaling $l_a \in \mathbb{R}^3$
  • Learnable offsets $O_a \in \mathbb{R}^{k \times 3}$ (for $k$ associated Gaussians)
  • Feature vector $f_a \in \mathbb{R}^d$

For each anchor, the positions of associated Gaussians are generated as:

$$x_i = x_a + O_i \cdot l_a, \quad i = 0, \ldots, k-1$$

MLPs predict opacity, color, rotation, and residual scale for each primitive based on anchor features, relative camera distance, and normalized viewing direction.
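
As an illustration of this anchor decoding step, the following Python sketch (using PyTorch) generates Gaussian positions from a set of anchors and predicts per-Gaussian attributes with a single MLP head. The tensor shapes, layer widths, and the shared head are illustrative assumptions, not the paper's implementation:

```python
import torch
import torch.nn as nn

k, d, n_anchors = 10, 32, 4              # Gaussians per anchor, feature dim, anchor count

# Per-anchor attributes (randomly initialized stand-ins for learned parameters)
x_a = torch.randn(n_anchors, 3)          # anchor positions
l_a = torch.rand(n_anchors, 3)           # offset scaling
O_a = torch.randn(n_anchors, k, 3)       # learnable offsets
f_a = torch.randn(n_anchors, d)          # anchor features

# Gaussian positions: x_i = x_a + O_i * l_a (element-wise scaled offsets)
x_i = x_a[:, None, :] + O_a * l_a[:, None, :]      # (n_anchors, k, 3)

# One MLP head predicting opacity, color, rotation (quaternion), and residual
# scale for each of the k Gaussians, conditioned on anchor features, camera
# distance, and normalized viewing direction.
head = nn.Sequential(nn.Linear(d + 1 + 3, 64), nn.ReLU(),
                     nn.Linear(64, k * (1 + 3 + 4 + 3)))
cam = torch.zeros(3)                               # hypothetical camera position
dist = (x_a - cam).norm(dim=-1, keepdim=True)      # relative camera distance
view = (x_a - cam) / dist                          # normalized viewing direction
out = head(torch.cat([f_a, dist, view], dim=-1)).view(n_anchors, k, 11)
opacity, color, rotation, d_scale = out.split([1, 3, 4, 3], dim=-1)
```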

b. Temporal Prediction via Deformation Field

To model scene dynamics, a deformation MLP is applied to each canonical Gaussian’s spatial-temporal embedding:

$$(\Delta x, \Delta s, \Delta r) = \psi_d\big([\mathcal{E}(x), \mathcal{E}(t)]\big)$$

where $\mathcal{E}(\cdot)$ is a positional encoding. These deformations are applied additively:

$$(x', s', r') = (x + \Delta x,\ s + \Delta s,\ r + \Delta r)$$

This structure enables the system to exploit spatial and temporal correlations, generating dynamic Gaussian parameters for every frame from compressed canonical representations.
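
A minimal sketch of this deformation step, assuming a standard sinusoidal positional encoding and an illustrative MLP width (neither is specified above):

```python
import torch
import torch.nn as nn

def pos_enc(v, n_freq=4):
    """E(v) = [sin(2^j * pi * v), cos(2^j * pi * v)] for j = 0..n_freq-1."""
    freqs = (2.0 ** torch.arange(n_freq)) * torch.pi
    ang = v[..., None] * freqs                     # (..., dim, n_freq)
    return torch.cat([ang.sin(), ang.cos()], dim=-1).flatten(-2)

n = 100
x = torch.randn(n, 3)                              # canonical Gaussian positions
t = torch.full((n, 1), 0.25)                       # normalized frame time

# Deformation MLP psi_d over the spatial-temporal embedding [E(x), E(t)]
emb = torch.cat([pos_enc(x), pos_enc(t)], dim=-1)
psi_d = nn.Sequential(nn.Linear(emb.shape[-1], 64), nn.ReLU(),
                      nn.Linear(64, 3 + 3 + 4))    # dx, ds, dr (quaternion)
dx, ds, dr = psi_d(emb).split([3, 3, 4], dim=-1)

# Deformations are applied additively to the canonical parameters
s, r = torch.rand(n, 3), torch.randn(n, 4)
x_t, s_t, r_t = x + dx, s + ds, r + dr
```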

3. Adaptive Quantization and Entropy Coding

To substantially reduce the model size post-training, P-4DGS applies:

a. Adaptive Quantization

All anchor attributes (features, scales, offsets) are quantized. During training, uniform noise is injected to emulate quantization error and improve robustness. The quantization step size $q$ is derived from spatial context features $h$, queried from a binary hash grid:

$$q = Q_0 \cdot \big(1 + \tanh(\psi_q(h))\big)$$

This allows adaptive precision in quantization according to local variability, preserving quality under high compression.
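
The following sketch illustrates the two quantization regimes, with noise injection during training and hard rounding at inference; $Q_0$, the hash-grid lookup, and the $\psi_q$ head are placeholder assumptions:

```python
import torch
import torch.nn as nn

Q0 = 1.0                                  # base quantization step (assumed)
psi_q = nn.Linear(16, 1)                  # tiny head over context features

f_a = torch.randn(8, 32)                  # anchor features to be quantized
h = torch.randn(8, 16)                    # context features (hash-grid query stub)

# Adaptive step size: q = Q0 * (1 + tanh(psi_q(h))), one step per anchor
q = Q0 * (1.0 + torch.tanh(psi_q(h)))     # (8, 1)

training = True
if training:
    # Training: uniform noise in [-q/2, q/2] emulates quantization error
    f_tilde = f_a + (torch.rand_like(f_a) - 0.5) * q
else:
    # Inference: hard rounding to the adaptive grid
    f_tilde = torch.round(f_a / q) * q
```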

b. Context-Based Entropy Coding

Once quantized, context-based entropy models estimate coding cost. Each quantized anchor feature $\tilde{f}_a$ is entropy coded using a Gaussian prior:

$$p(\tilde{f}_a) = \Phi_{\mu, \sigma}\!\left(\tilde{f}_a + \tfrac{1}{2}q\right) - \Phi_{\mu, \sigma}\!\left(\tilde{f}_a - \tfrac{1}{2}q\right)$$

where $\mu$ and $\sigma$ are predicted by an MLP conditioned on $h$. Bit allocation is thus jointly optimized with rendering fidelity in the loss function via a rate-distortion term weighted by $\lambda_{\text{rate}}$.
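
A hedged sketch of the rate estimate under this Gaussian prior, with placeholder $\mu$, $\sigma$, and $\lambda_{\text{rate}}$ values standing in for the learned context model:

```python
import torch
from torch.distributions import Normal

f_tilde = torch.randn(8, 32)              # quantized anchor features
q = torch.full((8, 1), 0.8)               # adaptive step sizes
mu = torch.zeros_like(f_tilde)            # would be predicted by an MLP on h
sigma = torch.ones_like(f_tilde)          # likewise

prior = Normal(mu, sigma)
# p(f~) = Phi(f~ + q/2) - Phi(f~ - q/2): probability mass of each bin
p = prior.cdf(f_tilde + q / 2) - prior.cdf(f_tilde - q / 2)
bits = -torch.log2(p.clamp_min(1e-9)).sum()   # differentiable bit-cost estimate

# Rate-distortion objective: rendering loss plus weighted rate term
lambda_rate = 1e-3                        # assumed weight
loss = 0.0 + lambda_rate * bits           # 0.0 stands in for the rendering loss
```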

4. Experimental Performance

Extensive experiments on both synthetic (D-NeRF) and real-world (NeRF-DS) datasets demonstrate:

  • High compression ratios: ~40× on synthetic data, up to 90× on real-world data
  • Small storage footprint: Approximately 1 MB per dynamic scene
  • Real-time rendering: >260 FPS on an NVIDIA RTX 4090
  • Superior rate-distortion tradeoff: Higher PSNR, SSIM, and lower LPIPS than competitive dynamic 3DGS/4DGS baselines across all bitrates
  • Ablation results: Each core module—anchor-based spatial prediction, compact deformation MLP, and adaptive quantization—contributes significantly to rate-distortion performance and storage reduction

A summary of the comparative performance:

| Dataset | Compression Ratio | Avg. Storage | Rendering Speed | Quality Advantage |
|---|---|---|---|---|
| D-NeRF (synthetic) | ~40× | ~1 MB | High (260+ FPS) | Comparable/superior PSNR/SSIM |
| NeRF-DS (real-world) | ~90× | ~1 MB | High | Best rate-distortion metrics |

5. Applications and Implications

P-4DGS is directly applicable to domains requiring efficient and high-fidelity dynamic scene representation, such as:

  • Augmented and virtual reality (AR/VR): Enables compact and fast streaming or deployment of dynamic scenes on mobile or resource-constrained headsets.
  • Robotics and autonomous systems: Facilitates efficient, updateable 3D scene mapping with temporal coherence.
  • Digital heritage: Permits interaction with animated digital twins at minimal storage cost.
  • Telepresence/video streaming: Allows transmission of complex dynamic 3D geometries under severe bandwidth constraints.

This suggests that P-4DGS could become foundational wherever both dynamic fidelity and storage/computational efficiency are at a premium.

6. Limitations and Future Research Directions

While P-4DGS achieves substantial compression and speed benefits, the architecture imposes a fixed overhead for the learned deformation MLP, which cannot be further scaled down at extremely low bitrates. This sets a minimum storage threshold and limits adaptability for ultra-low-rate scenarios. Plausible directions for future work include:

  • Further miniaturization or learnability of temporal representation modules, possibly via alternative coding paradigms or structured pruning.
  • Development of advanced entropy coding or context modeling to extract residual redundancy from anchor-based representations.
  • Integration with hardware inference acceleration techniques for even greater real-time speed and mobile viability.
  • Exploration of adaptive, frame-wise code allocation strategies for scene-acuity-focused compression.

7. Summary

P-4DGS constitutes a significant advance in predictive, highly compressed 4D Gaussian Splatting. By integrating 3D anchor-based spatial prediction, temporal deformation coding, adaptive quantization, and context-aware entropy modeling, P-4DGS achieves robust, high-fidelity dynamic scene reconstruction and real-time rendering at up to 90× compression on real-world data. It provides a compelling solution for large-scale dynamic scene representation, with clear impact for real-time graphics, AR/VR deployment, robotic vision, and transmission of dynamic 3D environments (Wang et al., 11 Oct 2025).
