
Wasserstein-Constrained 4D Gaussian Splatting

Updated 17 June 2026
  • The paper demonstrates a dynamic state-space modeling framework using neural 4DGS with Wasserstein regularization to ensure smooth and coherent Gaussian trajectories.
  • It integrates a Kalman-inspired fusion filter that combines neural predictions with dynamical priors to suppress flicker and improve optical-flow consistency.
  • Empirical evaluations indicate significant PSNR improvements and reduced training time, validating its effectiveness for photorealistic dynamic scene rendering.

Wasserstein-Constrained 4DGS (Four-Dimensional Gaussian Splatting) is a methodology for dynamic scene rendering that jointly models the smooth translation and deformation of 3D Gaussian primitives over time. By embedding state-space modeling within a neural 4DGS pipeline and regularizing trajectories via the 2-Wasserstein distance, the approach promotes temporal coherence and physical plausibility while keeping optimization efficient, producing photorealistic, temporally consistent renderings of dynamic scenes (Deng et al., 2024).

1. Dynamic 4DGS Pipeline and Problem Formulation

The core pipeline ingests a sparse point cloud generated by Structure-from-Motion (SfM), from which it parameterizes a set of canonical 3D Gaussians. Each Gaussian primitive $i$ is described by a mean $\mu^{c(i)} \in \mathbb{R}^3$, a rotation $R^{c(i)} \in \mathrm{SO}(3)$, and a scale $S^{c(i)} \in \mathbb{R}^{3 \times 3}$. The covariance is $\Sigma^{c(i)} = R^{c(i)} S^{c(i)} S^{c(i)\top} R^{c(i)\top}$. For each time step $t$, a deformation network $f_\theta$, parameterized as an MLP, predicts a time-dependent “observation” Gaussian

$$\mathcal{N}_t^{\mathrm{Ob}(i)} = f_\theta\left(\mathcal{N}^{c(i)},\,t\right) = \left(\mu_t^{\mathrm{Ob}(i)},\,\Sigma_t^{\mathrm{Ob}(i)}\right).$$

A state-space model, comprising a predictor and a filter, merges network predictions with a dynamical prior to promote smooth and physically plausible temporal trajectories. The resulting filtered Gaussians $\hat{\mathcal{N}}_t^{(i)}$ are rendered by a differentiable splatting scheme to generate each RGB frame at time $t$.
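As an illustration of the canonical covariance parameterization, the NumPy sketch below assembles $\Sigma = R S S^\top R^\top$ from a unit quaternion and per-axis scales. Encoding the rotation as a quaternion and taking $S$ diagonal follow common 3D Gaussian Splatting practice and are assumptions here; the helper names are hypothetical.

```python
import numpy as np

def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix in SO(3)."""
    w, x, y, z = np.asarray(q, dtype=float) / np.linalg.norm(q)  # normalize first
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

def covariance(q, scales):
    """Hypothetical helper: Sigma = R S S^T R^T with S = diag(scales)."""
    R = quat_to_rotmat(q)
    S = np.diag(scales)
    return R @ S @ S.T @ R.T

# A Gaussian elongated along x, rotated by 90 degrees about z (x-axis -> y-axis).
q = [np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)]
Sigma = covariance(q, [0.3, 0.1, 0.05])
```

Because $\Sigma$ is built from a rotation and squared scales, it is symmetric positive semi-definite by construction, and positive definite whenever all scales are nonzero.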

Key methodological challenges identified include:

  • Suppression of abrupt jumps or flicker in means $\mu_t^{(i)}$ and covariances $\Sigma_t^{(i)}$ across frames.
  • Unified modeling of both translation and shape deformation with a geometrically faithful metric.
  • Real-time or near-real-time temporal optimization within high-dimensional 4DGS systems (Deng et al., 2024).

2. State-Space Modeling for Temporal Coherence

Each dynamic Gaussian’s state at time $t$ is $x_t^{(i)} = \left(\mu_t^{(i)}, \Sigma_t^{(i)}\right)$, a 9-dimensional vector: 3 mean coordinates plus 6 covariance entries, since only the independent entries of the symmetric $\Sigma_t^{(i)}$ are stored. The transition (predictor) step can follow either a simple Euclidean rule or optimal transport geometry:

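  • Euclidean baseline: the mean and the stored covariance entries are propagated with a simple rule in flat Euclidean coordinates.

  • Wasserstein-geometry version: the prediction follows the 2-Wasserstein (optimal transport) geometry of Gaussians, with covariance updates computed through logarithmic and exponential maps on the manifold of symmetric positive definite (SPD) matrices.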
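The two predictor variants can be contrasted with a minimal NumPy sketch. It assumes a constant-velocity prior that extrapolates from the two previous states; this transition and the function names are illustrative assumptions, not the paper's exact rule. The Euclidean version extrapolates the mean and covariance entries linearly, while the Wasserstein version continues the 2-Wasserstein geodesic (McCann interpolation) between the two previous Gaussians one step further.

```python
import numpy as np

def spd_sqrt(A):
    """Symmetric square root of an SPD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(w)) @ V.T

def ot_map(Sa, Sb):
    """Symmetric optimal transport map T with T @ Sa @ T == Sb (zero-mean Gaussians)."""
    Ra = spd_sqrt(Sa)
    Ra_inv = np.linalg.inv(Ra)
    return Ra_inv @ spd_sqrt(Ra @ Sb @ Ra) @ Ra_inv

def predict_euclidean(mu_a, S_a, mu_b, S_b):
    """Constant-velocity extrapolation from t-2 (mu_a, S_a) and t-1 (mu_b, S_b) in flat coordinates."""
    return 2 * mu_b - mu_a, 2 * S_b - S_a

def predict_wasserstein(mu_a, S_a, mu_b, S_b):
    """Continue the W2 geodesic from t-2 (s = 0) through t-1 (s = 1) to s = 2.

    On this geodesic Sigma(s) = M(s) S_a M(s) with M(s) = (1 - s) I + s T, and the mean
    moves linearly. Past s = 1 it remains a W2 geodesic only while 2T - I is positive
    semi-definite, which holds when consecutive covariances are close.
    """
    M = 2 * ot_map(S_a, S_b) - np.eye(3)
    return 2 * mu_b - mu_a, M @ S_a @ M

# The x-extent shrinks from 1.0 to 0.3 between t-2 and t-1.
mu_a, S_a = np.zeros(3), np.eye(3)
mu_b, S_b = np.array([0.1, 0.0, 0.0]), np.diag([0.3, 1.0, 1.0])
_, S_euc = predict_euclidean(mu_a, S_a, mu_b, S_b)
_, S_w2 = predict_wasserstein(mu_a, S_a, mu_b, S_b)
print(np.linalg.eigvalsh(S_euc).min())  # negative: no longer a valid covariance
print(np.linalg.eigvalsh(S_w2).min())   # positive: still SPD
```

In this example the Euclidean extrapolation produces a negative variance along x, whereas the geodesic extrapolation shrinks that axis but keeps the covariance positive definite.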

Observations at each time point are provided by the deformation network as noisy measurements of the ground-truth state.

This state-space abstraction enables the filtering and regularization of temporal Gaussian trajectories in a mathematically structured manner, supporting the integration of both neural observations and a physically inspired dynamical prior (Deng et al., 2024).
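A state vector of this kind can be packed and unpacked as sketched below, assuming it stacks the 3 mean coordinates with the 6 independent entries of the symmetric covariance, taken from the upper triangle in row-major order. The ordering and the function names are hypothetical.

```python
import numpy as np

_IU = np.triu_indices(3)  # (row, col) indices of the 6 upper-triangular entries

def pack_state(mu, Sigma):
    """Stack the mean (3 values) and the upper triangle of Sigma (6 values) into R^9."""
    return np.concatenate([np.asarray(mu, dtype=float), np.asarray(Sigma, dtype=float)[_IU]])

def unpack_state(x):
    """Recover (mu, Sigma) from a 9-vector, mirroring the upper triangle to restore symmetry."""
    x = np.asarray(x, dtype=float)
    mu = x[:3]
    Sigma = np.zeros((3, 3))
    Sigma[_IU] = x[3:]
    Sigma = Sigma + Sigma.T - np.diag(np.diag(Sigma))  # diagonal was added twice; remove once
    return mu, Sigma

x = pack_state([0.0, 1.0, 2.0], np.diag([0.04, 0.01, 0.0025]))  # shape (9,)
```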

3. State Consistency Filtering via Kalman-like Fusion

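Temporal consistency is enforced through a Kalman filter-inspired state update that combines the prior (predicted) state $\bar{x}_t^{(i)}$ with the network observation $x_t^{\mathrm{Ob}(i)}$. In the standard Kalman form, with an identity observation model, prior uncertainty $P_t$, and observation-noise covariance $\Gamma_t$:

$$K_t = P_t \left(P_t + \Gamma_t\right)^{-1},$$

$$\hat{x}_t^{(i)} = \bar{x}_t^{(i)} + K_t \left(x_t^{\mathrm{Ob}(i)} - \bar{x}_t^{(i)}\right).$$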

This filter update suppresses erratic MLP-driven changes (“flicker”) by optimally weighting neural predictions against the dynamical prior. A plausible implication is improved stability in the resulting Gaussian trajectories and reduced image-space optical-flow artifacts.
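A single fusion step on a 9-dimensional Gaussian state (3 mean coordinates plus the 6 independent covariance entries) can be sketched as follows, assuming diagonal prior and observation-noise covariances so that the Kalman gain reduces to an elementwise ratio. The function name and noise values are hypothetical.

```python
import numpy as np

def fuse(x_prior, P_prior, x_obs, noise_obs):
    """Kalman update with an identity observation model and diagonal covariances.

    x_prior, x_obs: 9-vectors (mean + upper-triangular covariance entries).
    P_prior, noise_obs: per-entry variances of the prior and of the network observation.
    """
    K = P_prior / (P_prior + noise_obs)       # gain in [0, 1] for each state entry
    x_post = x_prior + K * (x_obs - x_prior)  # move the prior toward the observation
    P_post = (1.0 - K) * P_prior              # fused estimate is more certain than the prior
    return x_post, P_post

x_prior = np.zeros(9)
x_obs = np.full(9, 1.0)  # an observation that "jumps" by 1.0 in every entry
x_post, P_post = fuse(x_prior, np.full(9, 0.01), x_obs, np.full(9, 0.04))
```

Here the observation noise is four times the prior variance, so only a fifth of the jump reaches the fused state. Using one shared scalar gain for all six covariance entries makes the fused covariance a convex combination of two SPD matrices and therefore SPD; per-entry gains, as in this sketch, carry no such guarantee.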

4. Wasserstein Geometry and Regularization

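The framework leverages Wasserstein geometry, specifically the 2-Wasserstein distance, to regularize both state estimation and trajectory evolution. For Gaussians $\mathcal{N}_1 = \mathcal{N}(\mu_1, \Sigma_1)$ and $\mathcal{N}_2 = \mathcal{N}(\mu_2, \Sigma_2)$, the 2-Wasserstein metric is:

$$W_2^2(\mathcal{N}_1, \mathcal{N}_2) = \lVert \mu_1 - \mu_2 \rVert_2^2 + \operatorname{tr}\!\left(\Sigma_1 + \Sigma_2 - 2\left(\Sigma_1^{1/2}\, \Sigma_2\, \Sigma_1^{1/2}\right)^{1/2}\right).$$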
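The closed-form 2-Wasserstein distance between two Gaussians can be evaluated directly for 3×3 covariances, as in the NumPy sketch below; the matrix square roots use symmetric eigendecompositions, and the function names are hypothetical.

```python
import numpy as np

def spd_sqrt(A):
    """Symmetric PSD square root; tiny negative eigenvalues from round-off are clipped."""
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def w2_squared(mu1, S1, mu2, S2):
    """||mu1 - mu2||^2 + tr(S1 + S2 - 2 (S1^1/2 S2 S1^1/2)^1/2) for N(mu1, S1), N(mu2, S2)."""
    R1 = spd_sqrt(S1)
    cross = spd_sqrt(R1 @ S2 @ R1)
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(S1 + S2 - 2.0 * cross))

# Diagonal (commuting) covariances: 3^2 from the means plus (2 - 1)^2 from the x-axis spread.
d2 = w2_squared(np.array([3.0, 0.0, 0.0]), np.diag([4.0, 1.0, 1.0]), np.zeros(3), np.eye(3))
```

For commuting covariances, such as two diagonal ones, the trace term reduces to the sum of squared differences of the per-axis standard deviations, so `d2` is approximately 10.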

In this model, each covariance is decomposed as $\Sigma = R S S^\top R^\top$ (rotation and scale, as in Section 1), and the trace term is computed efficiently to avoid redundancy. Regularization losses include:

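  • State-Observation Alignment:

$$\mathcal{L}_{\mathrm{align}} = \sum_i W_2^2\!\left(\hat{\mathcal{N}}_t^{(i)},\, \mathcal{N}_t^{\mathrm{Ob}(i)}\right)$$

  • Temporal Smoothness:

$$\mathcal{L}_{\mathrm{smooth}} = \sum_i W_2^2\!\left(\hat{\mathcal{N}}_t^{(i)},\, \hat{\mathcal{N}}_{t-1}^{(i)}\right)$$

The total loss is formulated as

$$\mathcal{L} = \mathcal{L}_{\mathrm{rgb}} + \lambda_{\mathrm{align}}\,\mathcal{L}_{\mathrm{align}} + \lambda_{\mathrm{smooth}}\,\mathcal{L}_{\mathrm{smooth}},$$

where $\mathcal{L}_{\mathrm{rgb}}$ is the standard photometric error and $\lambda_{\mathrm{align}}$, $\lambda_{\mathrm{smooth}}$ weight the two regularizers.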
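A minimal sketch of assembling the total loss for one Gaussian over a short sequence follows. It assumes the alignment term sums $W_2^2$ between each filtered Gaussian and its observation and the smoothness term sums $W_2^2$ between consecutive filtered Gaussians; the weights `lam_align` and `lam_smooth` and the photometric value are hypothetical placeholders.

```python
import numpy as np

def spd_sqrt(A):
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def w2_squared(mu1, S1, mu2, S2):
    """Closed-form squared 2-Wasserstein distance between two Gaussians."""
    R1 = spd_sqrt(S1)
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(S1 + S2 - 2.0 * spd_sqrt(R1 @ S2 @ R1)))

def total_loss(filtered, observed, l_rgb, lam_align=0.1, lam_smooth=0.1):
    """filtered / observed: lists of (mu, Sigma) for one Gaussian, one entry per time step."""
    l_align = sum(w2_squared(*f, *o) for f, o in zip(filtered, observed))
    l_smooth = sum(w2_squared(*a, *b) for a, b in zip(filtered[:-1], filtered[1:]))
    return l_rgb + lam_align * l_align + lam_smooth * l_smooth

# A Gaussian translating by 0.1 per step, with one flickering observation at t = 1.
filt = [(np.array([0.1 * t, 0.0, 0.0]), 0.01 * np.eye(3)) for t in range(3)]
obs = [(m, S.copy()) for m, S in filt]
obs[1] = (obs[1][0] + np.array([0.0, 0.5, 0.0]), obs[1][1])
loss = total_loss(filt, obs, l_rgb=0.02)
```

Only the flickering frame contributes to the alignment term, while the steady translation adds a small, constant smoothness cost per step.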

Logarithmic and exponential maps in the space of symmetric positive definite (SPD) matrices ensure that covariance updates remain SPD and follow geodesic paths, thereby preserving physical plausibility during rapid or deformable motions (Deng et al., 2024).
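To illustrate why log/exp updates keep covariances SPD, the sketch below applies a step in the matrix-logarithm domain and maps back with the matrix exponential. It uses the log-Euclidean form of the SPD maps, computed by symmetric eigendecomposition; this is a simpler stand-in and not necessarily the exact Wasserstein log/exp maps used in the paper. The function names are hypothetical.

```python
import numpy as np

def sym_fn(A, fn):
    """Apply a scalar function to a symmetric matrix through its eigenvalues."""
    w, V = np.linalg.eigh(A)
    return (V * fn(w)) @ V.T

def spd_log(S):
    """Matrix logarithm of an SPD matrix (the result is symmetric, possibly indefinite)."""
    return sym_fn(S, np.log)

def sym_exp(X):
    """Matrix exponential of a symmetric matrix (the result is always SPD)."""
    return sym_fn(X, np.exp)

def log_euclidean_geodesic(S0, S1, s):
    """Point at parameter s on the log-Euclidean geodesic from S0 (s = 0) to S1 (s = 1)."""
    return sym_exp((1.0 - s) * spd_log(S0) + s * spd_log(S1))

S = np.diag([0.04, 0.01, 0.0025])
update = np.diag([-1.5, 0.0, 0.0])     # a large symmetric step that shrinks the x-axis

flat = S + update                      # Euclidean step: leaves the SPD cone
curved = sym_exp(spd_log(S) + update)  # log-domain step: SPD for any symmetric step
```

The same symmetric step that produces a negative variance when added directly only rescales the x-variance by $e^{-1.5}$ when applied in the log domain.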

5. End-to-end Algorithmic Workflow

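Per Gaussian and time step, the update routine chains the components described above: the deformation network $f_\theta$ produces the observation $\mathcal{N}_t^{\mathrm{Ob}(i)}$, the predictor propagates the previous state to a prior, the Kalman-like filter fuses prior and observation into $\hat{\mathcal{N}}_t^{(i)}$, the state-observation alignment and temporal smoothness losses are evaluated, and the filtered Gaussians are splatted and compared against the ground-truth frame through the photometric loss. This joint optimization ensures end-to-end differentiability across geometric, temporal, and rendering domains, permitting seamless integration of neural modeling and physical regularization (Deng et al., 2024).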
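To isolate the flicker-suppression effect of the predict-and-fuse loop, the toy sketch below tracks a single Gaussian mean over 60 frames. The deformation network is replaced by noisy samples of a known helix (a hypothetical stand-in for $f_\theta$), the predictor is an assumed constant-velocity model, the gain comes from a scalar Kalman-style variance recursion, and covariances, rendering, and backpropagation are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 60
t = np.arange(T)
# Ground-truth path of one Gaussian mean (a slow helix).
true_mu = np.stack([np.cos(0.1 * t), np.sin(0.1 * t), 0.05 * t], axis=1)
# Hypothetical stand-in for f_theta: the true path plus per-frame jitter ("flicker").
obs_mu = true_mu + rng.normal(scale=0.05, size=true_mu.shape)

q, r = 0.5, 1.0                 # assumed process / observation noise (relative units)
P = 1.0                         # scalar uncertainty of the filtered state
filt_mu = np.empty_like(obs_mu)
filt_mu[:2] = obs_mu[:2]        # no motion history yet: trust the observations
for k in range(2, T):
    prior = 2 * filt_mu[k - 1] - filt_mu[k - 2]  # constant-velocity prediction
    P_prior = P + q
    K = P_prior / (P_prior + r)                  # Kalman-style gain
    filt_mu[k] = prior + K * (obs_mu[k] - prior) # fuse prediction and observation
    P = (1 - K) * P_prior

def jitter(x):
    """Mean squared second difference: large when positions flicker frame to frame."""
    return float(np.mean(np.sum(np.diff(x, n=2, axis=0) ** 2, axis=1)))

print(jitter(obs_mu), jitter(filt_mu))
```

With these noise settings the gain settles at 0.5; a smaller gain smooths more aggressively but lags behind genuinely fast motion, which is the trade-off the filter has to balance.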

6. Empirical Evaluation and Performance

Experiments use both synthetic (D-NeRF: moving digits, animated characters) and real-world (Plenoptic Video: people, objects) dynamic scene datasets. The method was implemented on an NVIDIA A800 GPU with the Adam optimizer. Training begins with a static initialization phase, after which the state consistency filter and the Wasserstein regularization are each enabled from a fixed iteration onward.

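The evaluation metrics include PSNR (peak signal-to-noise ratio), SSIM (structural similarity index), LPIPS (learned perceptual image patch similarity), rendering speed in frames per second (FPS), and total training time.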

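| Dataset | Method | PSNR (dB) ↑ | SSIM ↑ | LPIPS ↓ | FPS ↑ |
|---|---|---|---|---|---|
| D-NeRF | 4D-GS baseline | 31.8 | 0.958 | 0.032 | 87 |
| D-NeRF | Ours | 34.45 | 0.970 | 0.026 | 45.5 |
| Plenoptic | 4D-GS baseline | 29.91 | 0.928 | 0.168 | 76 |
| Plenoptic | Ours | 31.62 | 0.940 | 0.140 | 37 |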
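As a rough reading of the table, the sketch below converts the PSNR gains into equivalent MSE reductions and compares frame rates. Treating each reported average PSNR as if it came from a single mean squared error is a simplifying assumption, since PSNR is usually averaged per image.

```python
# (PSNR in dB, FPS) copied from the table: baseline first, then ours.
results = {
    "D-NeRF": ((31.8, 87.0), (34.45, 45.5)),
    "Plenoptic": ((29.91, 76.0), (31.62, 37.0)),
}

def summarize(psnr_base, fps_base, psnr_ours, fps_ours):
    """PSNR = 10 log10(MAX^2 / MSE), so a gain of d dB divides the MSE by 10 ** (d / 10)."""
    gain_db = psnr_ours - psnr_base
    return gain_db, 10 ** (gain_db / 10), fps_ours / fps_base

summary = {name: summarize(*base, *ours) for name, (base, ours) in results.items()}
for name, (gain_db, mse_ratio, fps_ratio) in summary.items():
    print(f"{name}: +{gain_db:.2f} dB, MSE divided by {mse_ratio:.2f}, {fps_ratio:.2f}x baseline FPS")
```

By this reading, the quality gains come at roughly half the baseline frame rate on both datasets.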

Ablation results detail the contribution of individual innovations:

  • Adding the state consistency filter yields a +0.80 dB PSNR gain and reduces the optical-flow average end-point error (AEPE) from 1.45 to 1.02.
  • Wasserstein regularization, compared to linear regularization, provides +1.0 dB PSNR and a 57% reduction in training time.
  • Full Wasserstein geometry (log/exp maps on SPD) further increases PSNR by +0.50 dB. Combined, these components provide +2.0 dB PSNR over observation-only baselines and outperform previous dynamic scene rendering methods in image quality (PSNR, SSIM, LPIPS) on all reported datasets (Deng et al., 2024).

7. Analysis: Motion Smoothness, Plausibility, and Implications

The fusion filter (Kalman-style) reduces network-induced noise, yielding improved optical-flow consistency. Wasserstein regularization jointly constrains both mean and covariance, damping flicker while accommodating plausible object deformation. Manifold-structured log/exp updates guarantee covariances remain SPD and follow geodesics, beneficial for rapid or complex shape variation.

This methodology embeds 3D Gaussian trajectory estimation into a state-space model guided by optimal-transport geometry. Empirically, this enables smoother transitions, fewer artifacts, and better temporal coherence than naive neural deformation or simple regularization. A plausible implication is that Wasserstein-constrained 4DGS yields physically plausible, high-fidelity dynamic scene renderings suited to dynamic view synthesis, neural graphics, and temporally adaptive reconstruction pipelines, although the reported rendering speed is roughly half that of the 4D-GS baseline (Deng et al., 2024).

References (1)
