TED-4DGS: Dynamic 4D Gaussian Splatting
- TED-4DGS is a dynamic 4D compression scheme that extends anchor-based 3D Gaussian Splatting with temporal activation to model real-world dynamic scenes.
- It integrates learnable temporal gating and a shared deformation bank to modulate Gaussian primitives, ensuring smooth transitions and enhanced feature fidelity.
- Its combination of an INR hyperprior and autoregressive coding achieves significant bitrate savings (up to 63%) while maintaining competitive PSNR on benchmark datasets.
TED-4DGS (Temporally Activated and Embedding-based Deformation for 4DGS Compression) is a dynamic extension and compression scheme for 4D Gaussian Splatting (4DGS) representations of dynamic scenes. It addresses the challenge of building compact, temporally controllable representations of dynamic 3D scenes from real video data, unifies the advantages of the canonical-anchor-based and explicit space-time 4DGS paradigms, and is designed for rate–distortion (R–D)-optimized compression on real-world dynamic scene benchmarks (Ho et al., 5 Dec 2025).
1. Anchor-Based 3DGS Foundation and Dynamic Extension
TED-4DGS builds upon Scaffold-GS, a static anchor-based 3D Gaussian Splatting (3DGS) model. Scaffold-GS uses a sparse set of anchor points placed on a 3D grid, each anchor x_a carrying a feature vector f_a that parameterizes a small set of local Gaussian primitives. An MLP decoder predicts per-Gaussian offsets O_k and a scaling vector l_a so that each Gaussian mean is μ_k = x_a + O_k ⊙ l_a. Additional MLP heads output per-Gaussian scale, rotation, color, and opacity. Rendering alpha-composites these Gaussians in canonical space.
TED-4DGS extends this formulation to the dynamic 4D setting by:
- Injecting per-anchor, learnable temporal-activation parameters for gating appearance/disappearance over time.
- Introducing per-anchor, low-dimensional temporal embeddings e_a, mapped via a shared global "deformation bank" to produce anchor-specific position and feature deformation fields.
- Integrating an implicit neural representation (INR)-based hyperprior and a channel-wise autoregressive model for entropy-aware, rate–distortion-optimized attribute compression.
2. Temporal Activation Mechanism
To enable explicit, learnable control over dynamic object occlusion and disocclusion, TED-4DGS introduces a temporal-activation function for each anchor:
- Each anchor learns four scalars t_s, t_e, β_s, β_e, where t_s and t_e mark the "soft" start and end frames and β_s, β_e control activation/deactivation smoothness.
- The temporal activation takes the form of a product of two sigmoid gates, a(t) = σ((t − t_s)/β_s) · σ((t_e − t)/β_e), rising smoothly near t_s and decaying near t_e.
- At render time, anchor opacities are modulated as α_a(t) = a(t) · α_a. This construction allows Gaussians to fade in and out without resorting to spatial deformation for invisibility, sharply reducing parameter count and improving temporal realism.
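As a concrete illustration, the sigmoid-product gating described above can be sketched in NumPy. This is a minimal sketch, not the paper's implementation; the parameter names (`t_start`, `t_end`, `beta_*`) and the chosen values are illustrative only.

```python
import numpy as np

def temporal_activation(t, t_start, t_end, beta_start, beta_end):
    """Soft per-anchor gate: rises near t_start, falls near t_end.

    beta_* control the steepness of each sigmoid edge (illustrative
    names; TED-4DGS learns four such scalars per anchor).
    """
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    return sigmoid((t - t_start) / beta_start) * sigmoid((t_end - t) / beta_end)

# Modulate a base opacity over a 60-frame clip for one anchor.
frames = np.arange(60, dtype=np.float64)
base_opacity = 0.9
gate = temporal_activation(frames, t_start=10.0, t_end=40.0,
                           beta_start=2.0, beta_end=2.0)
opacity = base_opacity * gate  # fades in around frame 10, out around frame 40
```

The gate stays near zero outside [t_start, t_end] and saturates near one inside it, so the anchor's Gaussians appear and disappear smoothly without any spatial motion.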
3. Embedding-Based Deformation via Shared Deformation Bank
TED-4DGS represents nonrigid per-anchor, per-time motion with a lightweight embedding mechanism:
- A global deformation bank stores one d-dimensional code vector per two frames, shared across all anchors.
- Each anchor stores a low-dimensional temporal embedding e_a. At time t, linear interpolation between the two nearest bank entries yields a per-time code b(t).
- A compact MLP maps the elementwise product e_a ⊙ b(t) to anchor-specific deformations Δx_a(t) and Δf_a(t).
- Anchor position and feature are time-modulated as x_a(t) = x_a + Δx_a(t) and f_a(t) = f_a + Δf_a(t).
Empirically, this multiplicative query design yields a 0.9 dB PSNR gain over concatenation at equal model size (Ho et al., 5 Dec 2025).
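The embedding-plus-bank query can be sketched as follows. This is a toy NumPy version under assumed dimensions; the single linear layers stand in for the paper's compact MLP decoder.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16          # embedding / bank code dimension (assumed)
T = 60          # frames; the bank stores one code per two frames
bank = rng.normal(size=(T // 2, D))      # shared global deformation bank

def bank_code(t):
    """Linearly interpolate between the two nearest bank entries at time t."""
    pos = np.clip(t / 2.0, 0, bank.shape[0] - 1)
    lo = int(np.floor(pos))
    hi = min(lo + 1, bank.shape[0] - 1)
    w = pos - lo
    return (1 - w) * bank[lo] + w * bank[hi]

# Toy decoder: single linear maps standing in for the real compact MLP.
W_pos = rng.normal(size=(D, 3)) * 0.01    # -> position offset
W_feat = rng.normal(size=(D, 32)) * 0.01  # -> feature offset

def deform(anchor_embed, t):
    """Multiplicative query: elementwise product of the anchor embedding
    and the interpolated bank code, then a shared decoder."""
    q = anchor_embed * bank_code(t)
    return q @ W_pos, q @ W_feat

e_a = rng.normal(size=D)                  # per-anchor temporal embedding
dx, df = deform(e_a, t=17.3)              # anchor-specific deformation at t
```

Because the bank and decoder are shared, per-anchor storage reduces to the single embedding e_a, which is what makes the mechanism cheap to compress.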
4. Rate–Distortion-Optimized Compression Framework
TED-4DGS aims for joint optimization of perceptual quality and bitrate by:
- Minimizing a Lagrangian objective of the form L = L_render + λ_rate·R + λ_mask·L_mask + λ_scale·L_scale + λ_bank·L_bank, where L_render is the sum of L1 and (1 − SSIM) losses over renderings, R is the average bits-per-anchor measured via the entropy model, L_mask is a sparsity regularizer for anchor pruning, L_scale promotes scale consistency, and L_bank regularizes the temporal deformation bank (the λ-weights balance the terms).
- An INR hyperprior models each anchor attribute's probability as a Gaussian whose mean μ and standard deviation σ are output by an MLP applied to sinusoidal positional encodings of the anchor position. Each quantized attribute ŷ (step q) is assigned the probability mass of its quantization bin, P(ŷ) = Φ((ŷ + q/2 − μ)/σ) − Φ((ŷ − q/2 − μ)/σ).
- Channel-wise autoregressive modeling of the anchor feature vectors f_a, with each channel decoded conditioned on previously decoded channels via a masked MLP.
- The quantization step is learnable per attribute.
- Arithmetic coding is performed over the learned probabilistic model; model hyperparameters and MLP weights are transmitted once.
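The per-attribute rate under such a Gaussian entropy model can be illustrated with the standard learned-compression formulation (a sketch; TED-4DGS's exact parameterization may differ):

```python
import math

def gaussian_cdf(x, mu, sigma):
    """Normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def symbol_probability(y_hat, mu, sigma, q):
    """Probability mass the Gaussian prior assigns to quantized value
    y_hat with step q: P = Phi(y_hat + q/2) - Phi(y_hat - q/2)."""
    return (gaussian_cdf(y_hat + q / 2, mu, sigma)
            - gaussian_cdf(y_hat - q / 2, mu, sigma))

def rate_bits(y_hat, mu, sigma, q):
    """Estimated code length in bits under the entropy model."""
    return -math.log2(max(symbol_probability(y_hat, mu, sigma, q), 1e-12))

# A symbol far from the predicted mean costs many more bits than one near it,
# which is why a well-fit hyperprior directly lowers the bitstream size.
cheap = rate_bits(0.0, mu=0.0, sigma=1.0, q=0.5)
costly = rate_bits(4.0, mu=0.0, sigma=1.0, q=0.5)
```

The arithmetic coder then realizes (approximately) these code lengths, so minimizing the summed rate term during training minimizes the final file size.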
Ablations show that the INR hyperprior achieves a 20% BD-rate saving over a factorized prior and outperforms triplane/hash-grid hyperpriors by 12%.
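The causal structure of the channel-wise autoregressive model can be sketched as follows. Only the decode-order conditioning is illustrated; the running mean is a toy stand-in for the masked MLP, and the channel count is assumed.

```python
import numpy as np

rng = np.random.default_rng(1)
C = 8  # channels per anchor feature vector (assumed)

def ar_params(decoded_prefix):
    """Toy autoregressive predictor: the entropy parameters of channel c
    depend only on already-decoded channels 0..c-1 (a running mean here,
    standing in for the masked MLP)."""
    if decoded_prefix.size == 0:
        return 0.0, 1.0          # unconditional prior for the first channel
    return float(decoded_prefix.mean()), 1.0

f = rng.normal(size=C)           # one anchor's feature vector
mus = []
for c in range(C):
    mu, sigma = ar_params(f[:c]) # conditioning respects decode order
    mus.append(mu)
```

The masking guarantees the decoder can reproduce each channel's entropy parameters from channels it has already recovered, which is what makes the code decodable.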
5. Training Pipeline and Implementation
The TED-4DGS compression workflow involves:
- Uniform quantization of each Gaussian attribute with a learned quantization step.
- Entropy coding using arithmetic coders against the INR hyperprior/auto-regressive model.
- Anchor coordinates stored in FP16, neural network weights and deformation bank in FP32.
- Progressive training with a 20k-iteration warm-up delay, which stabilizes mask pruning and deformation learning; removing this delay causes ≈20% higher bitrates.
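The uniform quantization round trip in the first step can be sketched as follows (the step value is illustrative; in TED-4DGS the step is learned per attribute during training):

```python
import numpy as np

def quantize(x, step):
    """Uniform quantization with a per-attribute step: map to integer
    symbols suitable for entropy coding."""
    return np.round(x / step)

def dequantize(symbols, step):
    """Recover attribute values from integer symbols."""
    return symbols * step

attr = np.array([0.13, -0.41, 0.88])  # toy attribute values
step = 0.05                           # illustrative quantization step
sym = quantize(attr, step)
rec = dequantize(sym, step)
# The reconstruction error is bounded by half the quantization step,
# which is the distortion that the learned step trades against rate.
err = np.max(np.abs(rec - attr))
```

A smaller step lowers distortion but spreads mass over more symbols (higher rate), which is exactly the trade-off the R–D objective optimizes.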
Typical training converges after 1M iterations per scene (∼12 hours with PyTorch on an RTX 3090). Rendering achieves about 70 fps for novel-view synthesis at the evaluated resolution (Ho et al., 5 Dec 2025).
6. Empirical Results and Comparisons
TED-4DGS is evaluated on Neu3D and HyperNeRF dynamic benchmarks, outperforming Light4GS and ADC-GS in rate–distortion metrics. For example:
| Method | PSNR (Neu3D) | Size (MB, Neu3D) | PSNR (HyperNeRF) | Size (MB, HyperNeRF) |
|---|---|---|---|---|
| Light4GS (low) | 31.48 | 3.77 | 25.35 | 5.15 |
| ADC-GS (low) | 31.41 | 4.04 | 25.42 | 4.02 |
| TED-4DGS (low) | 31.63 | 1.73 | 25.22 | 2.36 |
| Light4GS (high) | 31.69 | 5.46 | 25.55 | 8.87 |
| ADC-GS (high) | 31.67 | 6.57 | 25.68 | 6.67 |
| TED-4DGS (high) | 32.25 | 2.26 | 25.67 | 3.72 |
TED-4DGS achieves average bitrate savings of up to 63% on Neu3D and 45% on HyperNeRF relative to ADC-GS at matched PSNR, and its rate–distortion curve dominates prior art across all tested operating points.
7. Ablation Studies and Implementation Nuances
Key findings from ablation and implementation studies:
- Temporal-activation module substantially improves compression, reducing size by ≈9% and avoiding unnatural anchor deformations.
- Use of multiplicative deformation queries is measurably superior to concatenation.
- Color correction via a per-camera MLP mitigates cross-view color bias, improving PSNR by ∼0.3 dB.
- For slow-motion scenes, most anchors are long-lived (about 97% remain active for nearly the entire sequence); fast-motion scenes see a larger fraction of short-lived anchors (about 18%).
- Extremely large feature dimensions or very fast motion marginally inflate the deformation bank, but a modest number of bank entries suffices for most sequences.
- Currently, the number of Gaussians per anchor is fixed; adapting it per anchor would require additional signaling infrastructure in the compressed stream.
TED-4DGS introduces a temporally activated extension to sparse anchor-based 3DGS, a compact per-anchor dynamic deformation mechanism, and the first INR+autoregressive coding strategy for true rate–distortion-optimized dynamic 4DGS compression. Its combination of temporal gating, shared low-rank deformation, and targeted entropy modeling yields leading compression rates with high fidelity on challenging real-world dynamic scene data (Ho et al., 5 Dec 2025).