
TED-4DGS: Dynamic 4D Gaussian Splatting

Updated 12 December 2025
  • TED-4DGS is a dynamic 4D compression scheme that extends anchor-based 3D Gaussian Splatting with temporal activation to model real-world dynamic scenes.
  • It integrates learnable temporal gating and a shared deformation bank to modulate Gaussian primitives, ensuring smooth transitions and enhanced feature fidelity.
  • Its combination of an INR hyperprior and autoregressive coding achieves significant bitrate savings (up to 63%) while maintaining competitive PSNR on benchmark datasets.

TED-4DGS, or Temporally Activated and Embedding-based Deformation for 4DGS Compression, is a dynamic extension and compression scheme for 4D Gaussian Splatting (4DGS) representations in dynamic scene modeling. TED-4DGS addresses the challenge of building compact and temporally controllable representations of dynamic 3D scenes driven by real video data, unifying advantages of canonical-anchor-based and explicit space-time 4DGS paradigms, and is designed for rate–distortion (R–D)-optimized compression on real-world dynamic scene benchmarks (Ho et al., 5 Dec 2025).

1. Anchor-Based 3DGS Foundation and Dynamic Extension

TED-4DGS builds upon ScaffoldGS, a static anchor-based 3D Gaussian Splatting (3DGS) model. ScaffoldGS uses a sparse set of anchor points $\{x_a\}$ placed on a 3D grid, with each anchor $a$ associated with a feature vector $f_a \in \mathbb{R}^F$ that parameterizes $K$ local Gaussian primitives. An MLP decoder predicts offsets $\{O_{a,i}\}_{i=0}^{K-1}$ and a scale vector $l_a$ so that each Gaussian mean is $\mu_{a,i} = x_a + l_a \odot O_{a,i}$. Additional MLP heads parameterize per-Gaussian scale $s_{a,i}$, rotation $r_{a,i}$, color $c_{a,i}$, and opacity $\alpha_{a,i}$. Rendering alpha-composites these Gaussians in canonical space.
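The anchor-to-Gaussian mean decoding above can be sketched in a few lines. This is a minimal NumPy illustration; the names (`decode_anchor_gaussians`, `offset_mlp`) are assumptions, not the paper's API.

```python
import numpy as np

def decode_anchor_gaussians(x_a, f_a, l_a, offset_mlp, K=8):
    """Sketch of ScaffoldGS-style mean decoding: mu_{a,i} = x_a + l_a ⊙ O_{a,i}.

    x_a: (N, 3) anchor positions; f_a: (N, F) anchor features;
    l_a: (N, 3) per-anchor scale vectors; offset_mlp: (N, F) -> (N, K*3).
    """
    offsets = offset_mlp(f_a).reshape(-1, K, 3)       # K offsets O_{a,i} per anchor
    mu = x_a[:, None, :] + l_a[:, None, :] * offsets  # (N, K, 3) Gaussian means
    return mu
```

In the full model, further MLP heads would analogously predict per-Gaussian scale, rotation, color, and opacity from the same anchor features.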

TED-4DGS extends this formulation to the dynamic 4D setting by:

  • Injecting per-anchor, learnable temporal-activation parameters $\tau_a(t)$ that gate appearance and disappearance over time.
  • Introducing per-anchor low-dimensional temporal embeddings $\phi_a$, mapped via a shared global "deformation bank" $Z$ to produce anchor-specific deformation fields $\Delta x_a(t)$ and $\Delta f_a(t)$.
  • Integrating an implicit neural representation (INR)-based hyperprior and a channel-wise autoregressive model for entropy-aware, rate–distortion-optimized attribute compression.

2. Temporal Activation Mechanism

To enable explicit, learnable control over dynamic object occlusion and disocclusion, TED-4DGS introduces a temporal-activation function for each anchor:

  • Each anchor $a$ learns four scalars $(a_s, b_s; a_f, b_f)$, where $a_s, a_f \in [0, T]$ mark "soft" start and end frames and $b_s, b_f$ control activation/deactivation smoothness.
  • The temporal activation is defined as:

$$\tau_a(t) = \begin{cases} \exp\!\left[-\left((t-a_s)/b_s\right)^2\right], & t < a_s \\ 1, & a_s \le t \le a_f \\ \exp\!\left[-\left((t-a_f)/b_f\right)^2\right], & t > a_f \end{cases}$$

  • At render time, anchor opacities are modulated as $\alpha_{a,i}(t) = \alpha_{a,i} \cdot \tau_a(t)$. This lets Gaussians fade in and out without resorting to spatial deformation for invisibility, sharply reducing parameter count and improving temporal realism.
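The piecewise-Gaussian gate is straightforward to implement. A minimal NumPy sketch (function name and array handling are illustrative):

```python
import numpy as np

def temporal_activation(t, a_s, b_s, a_f, b_f):
    """Piecewise-Gaussian temporal gate tau_a(t).

    a_s, a_f: soft start/end frames; b_s, b_f: ramp widths controlling
    how smoothly the anchor activates and deactivates.
    """
    t = np.asarray(t, dtype=float)
    tau = np.ones_like(t)                 # plateau: a_s <= t <= a_f
    before = t < a_s
    after = t > a_f
    tau[before] = np.exp(-((t[before] - a_s) / b_s) ** 2)
    tau[after] = np.exp(-((t[after] - a_f) / b_f) ** 2)
    return tau
```

At render time each anchor's opacity would simply be multiplied by this gate.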

3. Embedding-Based Deformation via Shared Deformation Bank

TED-4DGS represents nonrigid per-anchor, per-time motion with a lightweight embedding mechanism:

  • A global deformation bank $Z \in \mathbb{R}^{F/2 \times D}$ stores one $D$-dimensional vector per two frames.
  • Each anchor has a temporal embedding $\phi_a \in \mathbb{R}^d$. At time $t$, linear interpolation in $Z$ yields $z^{(t)} \in \mathbb{R}^D$. A small MLP $F_{\text{proj}}$ maps $\phi_a$ to $w_a \in \mathbb{R}^D$; the product $w_a \odot z^{(t)}$ then serves as the anchor-specific deformation query.
  • A compact MLP $F_{\text{deform}}$ maps this elementwise product to $(\Delta f_a(t), \Delta x_a(t))$:

$$(\Delta f_a, \Delta x_a) = F_{\text{deform}}(w_a \odot z^{(t)})$$

  • Anchor feature and position are time-modulated as $f_a(t) = f_a + \Delta f_a(t)$ and $x_a(t) = x_a + \Delta x_a(t)$.

Empirically, this multiplicative query design yields a ≈0.9 dB PSNR gain over concatenation at equivalent model size (Ho et al., 5 Dec 2025).
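Assuming simple linear-interpolation indexing into the bank (the exact index scheme is not spelled out in this summary), the deformation query can be sketched as:

```python
import numpy as np

def query_deformation(phi_a, Z, t, T, proj_mlp, deform_mlp):
    """Sketch of the embedding-based deformation query.

    phi_a: (d,) per-anchor temporal embedding; Z: (M, D) deformation bank;
    t in [0, T]. proj_mlp: (d,) -> (D,); deform_mlp: (D,) -> (F + 3,).
    All names and the interpolation scheme are illustrative assumptions.
    """
    # Linearly interpolate bank rows to get the per-time code z^(t).
    pos = (t / T) * (Z.shape[0] - 1)
    i0 = int(np.floor(pos))
    i1 = min(i0 + 1, Z.shape[0] - 1)
    w = pos - i0
    z_t = (1 - w) * Z[i0] + w * Z[i1]   # (D,)
    # Multiplicative query: anchor weights gate the shared temporal code.
    w_a = proj_mlp(phi_a)               # (D,)
    out = deform_mlp(w_a * z_t)         # (F + 3,)
    delta_f, delta_x = out[:-3], out[-3:]
    return delta_f, delta_x
```

The modulated anchor is then obtained by adding `delta_f` to the anchor feature and `delta_x` to the anchor position.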

4. Rate–Distortion-Optimized Compression Framework

TED-4DGS aims for joint optimization of perceptual quality and bitrate by:

  • Minimizing:

$$\mathcal{L} = D + \lambda R + \lambda_{\text{offset}} L_{\text{offset}} + \lambda_{\text{temp}} L_{\text{temp}} + \lambda_{\text{vol}} L_{\text{vol}} + \lambda_{\text{tv}} L_{\text{tv}}$$

where $D$ is the sum of L1 and (1 − SSIM) losses over renderings, $R$ is the average bits-per-anchor measured via the entropy model, $L_{\text{offset}}$ and $L_{\text{temp}}$ are sparsity regularizers for pruning, $L_{\text{vol}}$ promotes scale consistency, and $L_{\text{tv}}$ regularizes the temporal deformation bank.

  • An INR hyperprior models each anchor attribute's probability as a Gaussian whose mean $\mu_h$ and standard deviation $\sigma_h$ are predicted by an MLP $H$ applied to sinusoidal positional encodings $\gamma(x_a)$. Each quantized attribute $\hat{a}$ is assigned:

$$p(\hat{a} \mid x_a) = \int_{\hat{a}-q/2}^{\hat{a}+q/2} \mathcal{N}(a;\, \mu_h, \sigma_h)\, da$$

  • Channel-wise autoregressive modeling of anchor feature vectors $f_a$: each channel $f_{a,c}$ is decoded conditioned on prior channels using a masked MLP.
  • The quantization step $q$ is learnable per attribute.
  • Arithmetic coding is performed over the learned probabilistic model; model hyperparameters and MLP weights are transmitted once.
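The discretized-Gaussian likelihood that drives the rate term can be sketched as follows. `rate_bits` is an illustrative name, and a real arithmetic coder would consume these probabilities rather than merely summing bits.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)  # elementwise error function

def gaussian_cdf(x, mu, sigma):
    """Gaussian CDF evaluated elementwise via the error function."""
    return 0.5 * (1.0 + _erf((x - mu) / (sigma * math.sqrt(2.0))))

def rate_bits(a_hat, mu_h, sigma_h, q):
    """Bits to code each quantized attribute under the Gaussian prior:
    p(a_hat | x_a) = CDF(a_hat + q/2) - CDF(a_hat - q/2)."""
    p = (gaussian_cdf(a_hat + q / 2.0, mu_h, sigma_h)
         - gaussian_cdf(a_hat - q / 2.0, mu_h, sigma_h))
    return -np.log2(np.maximum(p, 1e-12))  # clamp avoids log(0)
```

Attributes far from the predicted mean receive low probability and thus cost more bits, which is exactly what the rate term $R$ penalizes during training.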

Ablations show that the INR hyperprior achieves a 20% BD-Rate saving over a factorized prior, outperforming triplane- and hash-based hyperpriors by ≈12%.

5. Training Pipeline and Implementation

The TED-4DGS compression workflow involves:

  • Uniform quantization of each Gaussian attribute with a learned quantization step.
  • Entropy coding using arithmetic coders against the INR hyperprior/auto-regressive model.
  • Anchor coordinates stored in FP16, neural network weights and deformation bank in FP32.
  • Progressive training with a 20k-iteration delay before learning $(a_s, a_f)$, which stabilizes mask pruning and deformation learning; removing this delay causes ≈20% higher bitrates.
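The uniform quantization step from this workflow can be sketched as below. The training-time noise surrogate is a common practice from learned compression and an assumption here, since this summary does not state the exact differentiable proxy used.

```python
import numpy as np

def uniform_quantize(a, q, training=False, rng=None):
    """Uniform quantization with a (per-attribute) learnable step q.

    At inference, round to the nearest multiple of q. At training time,
    a common differentiable surrogate (assumed here, not confirmed by the
    source) replaces rounding with additive uniform noise in [-q/2, q/2].
    """
    if training:
        rng = rng or np.random.default_rng()
        return a + rng.uniform(-q / 2.0, q / 2.0, size=np.shape(a))
    return q * np.round(a / q)
```

The rounded values are what the arithmetic coder entropy-codes against the learned prior.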

Typical training converges after ≈1M iterations per scene (≈12 hours in PyTorch on an RTX 3090). Rendering achieves ≈70 fps at $536 \times 960$ resolution in novel-view synthesis (Ho et al., 5 Dec 2025).

6. Empirical Results and Comparisons

TED-4DGS is evaluated on Neu3D and HyperNeRF dynamic benchmarks, outperforming Light4GS and ADC-GS in rate–distortion metrics. For example:

| Method | PSNR (Neu3D) | Size (MB, Neu3D) | PSNR (HyperNeRF) | Size (MB, HyperNeRF) |
|---|---|---|---|---|
| Light4GS (low) | 31.48 | 3.77 | 25.35 | 5.15 |
| ADC-GS (low) | 31.41 | 4.04 | 25.42 | 4.02 |
| TED-4DGS (low) | 31.63 | 1.73 | 25.22 | 2.36 |
| Light4GS (high) | 31.69 | 5.46 | 25.55 | 8.87 |
| ADC-GS (high) | 31.67 | 6.57 | 25.68 | 6.67 |
| TED-4DGS (high) | 32.25 | 2.26 | 25.67 | 3.72 |

TED-4DGS achieves up to –63% average bitrate savings on Neu3D and –45% on HyperNeRF relative to ADC-GS at matched PSNR. Rate–distortion performance strictly dominates prior art over all tested operating points.

7. Ablation Studies and Implementation Nuances

Key findings from ablation and implementation studies:

  • Temporal-activation module substantially improves compression, reducing size by ≈9% and avoiding unnatural anchor deformations.
  • Use of multiplicative deformation queries is measurably superior to concatenation.
  • Color correction via a per-camera MLP mitigates cross-view color bias, improving PSNR by ≈0.3 dB.
  • For slow-motion scenes, most anchors have long-lived activation ($\Delta\tau > 0.8$ for 97% of anchors); fast-motion scenes see a larger fraction of short-lived anchors (18% with $\Delta\tau < 0.2$).
  • Extremely large feature dimensions $F$ or very fast motions marginally inflate the size of the deformation bank $Z$, but $F/2$ bank entries suffice for most sequences.
  • The number of Gaussians per anchor is currently fixed at $K = 8$; adapting $K$ per anchor would require additional signaling infrastructure in the compressed stream.

TED-4DGS introduces a temporally activated extension to sparse anchor-based 3DGS, a compact per-anchor dynamic deformation mechanism, and the first INR+autoregressive coding strategy for true rate–distortion-optimized dynamic 4DGS compression. Its combination of temporal gating, shared low-rank deformation, and targeted entropy modeling yields leading compression rates with high fidelity on challenging real-world dynamic scene data (Ho et al., 5 Dec 2025).
