Wasserstein-Constrained 4D Gaussian Splatting

Updated 17 June 2026

The paper presents a dynamic state-space modeling framework for neural 4DGS that uses Wasserstein regularization to keep Gaussian trajectories smooth and coherent.
It integrates a Kalman-inspired fusion filter that combines neural predictions with dynamical priors to suppress flicker and improve optical-flow consistency.
Empirical evaluations indicate significant PSNR improvements and reduced training time, validating its effectiveness for photorealistic dynamic scene rendering.

Wasserstein-Constrained 4DGS (Four-Dimensional Gaussian Splatting) is a methodology for dynamic scene rendering that jointly models the smooth translation and deformation of 3D Gaussian primitives over time. By embedding state-space modeling within a neural 4DGS pipeline and regularizing trajectories via the 2-Wasserstein distance, this approach enforces temporal coherence, physical plausibility, and efficient optimization for photorealistic, temporally consistent multi-frame rendering of dynamic scenes (Deng et al., 2024).

1. Dynamic 4DGS Pipeline and Problem Formulation

The core pipeline ingests a sparse point cloud generated by Structure-from-Motion (SfM), from which it parameterizes a set of canonical 3D Gaussians. Each Gaussian primitive $i$ is described by a mean $\mu^{c(i)} \in \mathbb{R}^3$ , rotation $R^{c(i)} \in \mathrm{SO}(3)$ , and scale $S^{c(i)} \in \mathbb{R}^{3 \times 3}$ . The covariance is $\Sigma^{c(i)} = R^{c(i)} S^{c(i)} S^{c(i)\,T} R^{c(i)\,T}$ . For each time step $t$ , a deformation network $f_\theta$ —parameterized as an MLP—predicts a time-dependent “observation” Gaussian

$\mathcal{N}_t^{\mathrm{Ob}(i)} = f_\theta\left(\mathcal{N}^{c(i)},\,t\right) = \left(\mu_t^{\mathrm{Ob}(i)},\,\Sigma_t^{\mathrm{Ob}(i)}\right).$
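The covariance construction $\Sigma = RSS^{T}R^{T}$ can be made concrete with a short NumPy sketch. It assumes, as in standard 3D Gaussian Splatting, that the rotation is stored as a unit quaternion $(w, x, y, z)$ and the scale as a per-axis vector. The function names `quat_to_rotmat` and `covariance_from_rs` are illustrative and are not taken from the paper's code.

```python
import numpy as np


def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = np.asarray(q, dtype=float) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def covariance_from_rs(q, scales):
    """Build Sigma = R S S^T R^T from a rotation quaternion and per-axis scales."""
    R = quat_to_rotmat(q)
    S = np.diag(np.asarray(scales, dtype=float))  # diagonal scale matrix
    return R @ S @ S.T @ R.T


# Example: a Gaussian elongated along x, rotated by 90 degrees about z,
# ends up elongated along y.
q_z90 = [np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)]
Sigma = covariance_from_rs(q_z90, [2.0, 0.5, 1.0])
```

Because $\Sigma$ is assembled from a rotation and squared scales, it is symmetric positive semidefinite for any parameter values (positive definite whenever all scales are nonzero). This is why the factorization is convenient for unconstrained gradient-based optimization.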

A state-space model—including predictor and filter—merges network predictions with a dynamical prior to promote smooth and physically plausible temporal trajectories. The resulting filtered Gaussians $\hat{\mathcal{N}}_t^{(i)}$ are rendered by a differentiable splatting scheme to generate each RGB frame at time $t$ .

Key methodological challenges identified include:

Suppression of abrupt jumps or flicker in the means $\mu_t^{(i)}$ and covariances $\Sigma_t^{(i)}$ across frames.
Unified modeling of both translation and shape deformation with a geometrically faithful metric.
Real-time or near-real-time temporal optimization within high-dimensional 4DGS systems (Deng et al., 2024).

2. State-Space Modeling for Temporal Coherence

Each dynamic Gaussian’s state at time $t$ is $\mathbf{x}_t^{(i)} = \left(\mu_t^{(i)}, \Sigma_t^{(i)}\right)$, represented as a 9-dimensional vector because only the six independent entries of the symmetric covariance $\Sigma_t^{(i)}$ are stored alongside the three mean coordinates. The transition (predictor) step can follow either a simple Euclidean rule or optimal transport geometry:
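A minimal sketch of this 9-dimensional state layout follows. It assumes the mean is concatenated with the upper-triangular entries of $\Sigma$ in row-major order; that ordering and the helper names `pack_state` and `unpack_state` are hypothetical.

```python
import numpy as np

# Row/column indices of the upper triangle of a 3x3 matrix (6 entries,
# row-major order: (0,0), (0,1), (0,2), (1,1), (1,2), (2,2)).
_IU = np.triu_indices(3)


def pack_state(mu, Sigma):
    """Stack the mean (3 values) and the 6 unique covariance entries into a 9-vector."""
    return np.concatenate([np.asarray(mu, dtype=float), np.asarray(Sigma, dtype=float)[_IU]])


def unpack_state(x):
    """Inverse of pack_state: rebuild the mean and the full symmetric covariance."""
    x = np.asarray(x, dtype=float)
    mu = x[:3].copy()
    Sigma = np.zeros((3, 3))
    Sigma[_IU] = x[3:]
    # Mirror the upper triangle into the lower one, counting the diagonal once.
    Sigma = Sigma + Sigma.T - np.diag(np.diag(Sigma))
    return mu, Sigma
```

Storing six entries instead of nine keeps the state free of redundant, possibly inconsistent off-diagonal values.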

Euclidean baseline:

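$\mu_t^{\mathrm{Pr}(i)} = \hat{\mu}_{t-1}^{(i)} + \left(\hat{\mu}_{t-1}^{(i)} - \hat{\mu}_{t-2}^{(i)}\right), \qquad \Sigma_t^{\mathrm{Pr}(i)} = \hat{\Sigma}_{t-1}^{(i)} + \left(\hat{\Sigma}_{t-1}^{(i)} - \hat{\Sigma}_{t-2}^{(i)}\right)$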

Wasserstein-geometry version:

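$\mathcal{N}_t^{\mathrm{Pr}(i)} = \mathrm{Exp}_{\hat{\mathcal{N}}_{t-1}^{(i)}}\left(-\,\mathrm{Log}_{\hat{\mathcal{N}}_{t-1}^{(i)}}\left(\hat{\mathcal{N}}_{t-2}^{(i)}\right)\right),$

where $\mathrm{Log}$ and $\mathrm{Exp}$ denote the logarithmic and exponential maps of the Wasserstein geometry, so that the prediction continues along the geodesic through the two previous filtered states.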
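The two predictor flavours can be contrasted in a short NumPy sketch. It assumes a constant-velocity extrapolation from the two previous filtered states. For the geometry-aware branch it swaps in the log-Euclidean matrix log/exp on SPD matrices as a simple stand-in, so the paper's exact Wasserstein maps may differ. All function names are hypothetical.

```python
import numpy as np


def _spd_fn(A, fn):
    """Apply a scalar function to a symmetric matrix through its eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * fn(w)) @ V.T


def predict_euclidean(mu_prev2, mu_prev1, Sigma_prev2, Sigma_prev1):
    """Constant-velocity extrapolation of mean and covariance in flat space."""
    mu_pred = mu_prev1 + (mu_prev1 - mu_prev2)
    Sigma_pred = Sigma_prev1 + (Sigma_prev1 - Sigma_prev2)  # may leave the SPD cone
    return mu_pred, Sigma_pred


def predict_log_euclidean(mu_prev2, mu_prev1, Sigma_prev2, Sigma_prev1):
    """Same extrapolation, but the covariance step is taken in matrix-log space."""
    mu_pred = mu_prev1 + (mu_prev1 - mu_prev2)
    L = 2.0 * _spd_fn(Sigma_prev1, np.log) - _spd_fn(Sigma_prev2, np.log)
    Sigma_pred = _spd_fn(L, np.exp)  # exp of a symmetric matrix is always SPD
    return mu_pred, Sigma_pred


# A covariance shrinking quickly along x: S_old = diag(4, 1, 1), S_new = I.
S_old, S_new = np.diag([4.0, 1.0, 1.0]), np.eye(3)
mu = np.zeros(3)
_, S_euc = predict_euclidean(mu, mu, S_old, S_new)      # diag(-2, 1, 1): invalid
_, S_log = predict_log_euclidean(mu, mu, S_old, S_new)  # diag(0.25, 1, 1): valid
```

The flat update overshoots into a negative variance, while the log-space update keeps shrinking the same axis geometrically and stays a valid covariance.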

Observations at each time point are provided by the deformation network as noisy measurements of the ground-truth state.

This state-space abstraction enables the filtering and regularization of temporal Gaussian trajectories in a mathematically structured manner, supporting the integration of both neural observations and a physically inspired dynamical prior (Deng et al., 2024).

3. State Consistency Filtering via Kalman-like Fusion

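Temporal consistency is enforced through a Kalman filter-inspired state update that combines the prior (predictive) state $\mathbf{x}_t^{\mathrm{Pr}(i)}$ with the network observation $\mathbf{x}_t^{\mathrm{Ob}(i)}$, where $\mathbf{x}$ stacks the mean and covariance entries of the corresponding Gaussian:

$K_t = P_t^{\mathrm{Pr}}\left(P_t^{\mathrm{Pr}} + P_t^{\mathrm{Ob}}\right)^{-1},$

$\hat{\mathbf{x}}_t^{(i)} = \mathbf{x}_t^{\mathrm{Pr}(i)} + K_t\left(\mathbf{x}_t^{\mathrm{Ob}(i)} - \mathbf{x}_t^{\mathrm{Pr}(i)}\right),$

where $P_t^{\mathrm{Pr}}$ and $P_t^{\mathrm{Ob}}$ are the uncertainties (covariances) of the prior and of the observation, and $K_t$ is the resulting gain.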

This filter update suppresses erratic MLP-driven changes (“flicker”) by optimally weighting neural predictions against the dynamical prior. A plausible implication is improved stability in the resulting Gaussian trajectories and reduced image-space optical-flow artifacts.
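For a concrete picture of the fusion step, here is a minimal sketch of a textbook Kalman update with an identity observation model. It assumes the prior and the network observation describe the same state vector and that their uncertainties are known matrices. How the paper obtains these uncertainties is not covered here, and `kalman_fuse` and its argument names are hypothetical.

```python
import numpy as np


def kalman_fuse(x_prior, prior_cov, z_obs, obs_cov):
    """One Kalman update with an identity observation model (H = I).

    x_prior:   prior state (e.g. a packed 9-vector describing one Gaussian)
    prior_cov: covariance (uncertainty) of the prior
    z_obs:     network observation of the same state
    obs_cov:   covariance of the observation noise
    """
    innovation_cov = prior_cov + obs_cov
    K = prior_cov @ np.linalg.inv(innovation_cov)  # Kalman gain
    x_post = x_prior + K @ (z_obs - x_prior)       # pull the prior toward the observation
    post_cov = (np.eye(len(x_prior)) - K) @ prior_cov
    return x_post, post_cov, K
```

With equal prior and observation uncertainty the gain is $0.5\,I$ and the fused state lands halfway between the two. Inflating the observation covariance shrinks the gain, so the filter leans on the dynamical prior whenever the network output is unreliable.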

4. Wasserstein Geometry and Regularization

The framework leverages Wasserstein geometry, specifically the 2-Wasserstein distance, to regularize both state estimation and trajectory evolution. For Gaussians $\mathcal{N}_1 = \mathcal{N}(\mu_1, \Sigma_1)$ and $\mathcal{N}_2 = \mathcal{N}(\mu_2, \Sigma_2)$, the squared 2-Wasserstein distance is:

$W_2^2(\mathcal{N}_1, \mathcal{N}_2) = \left\|\mu_1 - \mu_2\right\|_2^2 + \mathrm{Tr}\left(\Sigma_1 + \Sigma_2 - 2\left(\Sigma_2^{1/2}\,\Sigma_1\,\Sigma_2^{1/2}\right)^{1/2}\right).$

In this model, each covariance is decomposed as $\Sigma = R S S^{T} R^{T}$, and the trace term is computed efficiently to avoid redundant computation. Regularization losses include:
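The closed-form distance can be checked numerically with the sketch below, which computes matrix square roots by eigendecomposition. It also illustrates one way the factorization $\Sigma = RSS^{T}R^{T}$ can save work: for positive diagonal $S$, $RSR^{T}$ is already the principal square root of $\Sigma$. Whether the paper exploits exactly this shortcut is an assumption, and the function names are hypothetical.

```python
import numpy as np


def spd_sqrt(A):
    """Principal square root of a symmetric positive semidefinite matrix."""
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T


def w2_squared(mu1, Sigma1, mu2, Sigma2):
    """Squared 2-Wasserstein distance between N(mu1, Sigma1) and N(mu2, Sigma2)."""
    root2 = spd_sqrt(Sigma2)
    cross = spd_sqrt(root2 @ Sigma1 @ root2)
    mean_term = float(np.sum((np.asarray(mu1) - np.asarray(mu2)) ** 2))
    cov_term = float(np.trace(Sigma1 + Sigma2 - 2.0 * cross))
    return mean_term + max(cov_term, 0.0)  # clip tiny negative round-off


def sqrt_from_rs(R, scales):
    """If Sigma = R S S^T R^T with positive diagonal S, then Sigma^(1/2) = R S R^T."""
    return R @ np.diag(scales) @ R.T
```

For commuting (for example, diagonal) covariances the trace term reduces to $\|\Sigma_1^{1/2} - \Sigma_2^{1/2}\|_F^2$, which gives a quick sanity check: $W_2^2$ between $\mathrm{diag}(4,1,1)$ and $I$ with equal means is $(2-1)^2 = 1$.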

State-Observation Alignment:

$\mathcal{L}_{\mathrm{align}} = \sum_i W_2^2\left(\hat{\mathcal{N}}_t^{(i)}, \mathcal{N}_t^{\mathrm{Ob}(i)}\right)$

Temporal Smoothness:

$\mathcal{L}_{\mathrm{smooth}} = \sum_i W_2^2\left(\hat{\mathcal{N}}_t^{(i)}, \hat{\mathcal{N}}_{t-1}^{(i)}\right)$

The total loss is formulated as

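$\mathcal{L} = \mathcal{L}_{\mathrm{photo}} + \lambda_{\mathrm{align}}\,\mathcal{L}_{\mathrm{align}} + \lambda_{\mathrm{smooth}}\,\mathcal{L}_{\mathrm{smooth}},$

where $\mathcal{L}_{\mathrm{photo}}$ is the standard photometric error and $\lambda_{\mathrm{align}}$ and $\lambda_{\mathrm{smooth}}$ are weighting coefficients.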
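A minimal sketch of how the two regularizers and the total loss could be assembled follows. It assumes squared $W_2$ terms summed over Gaussians and caller-supplied weights. The photometric term is passed in as a precomputed number because rendering is out of scope here, and names such as `lam_align` and `lam_smooth` are hypothetical.

```python
import numpy as np


def _sqrtm_psd(A):
    """Principal square root of a symmetric PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T


def w2_sq(mu1, S1, mu2, S2):
    """Squared 2-Wasserstein distance between two Gaussians."""
    r2 = _sqrtm_psd(S2)
    cross = _sqrtm_psd(r2 @ S1 @ r2)
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(S1 + S2 - 2.0 * cross))


def total_loss(photo, filtered, observed, previous, lam_align, lam_smooth):
    """Combine a precomputed photometric loss with two W2 regularizers.

    filtered, observed, previous: tuples (mu, Sigma) with shapes (N, 3) and
    (N, 3, 3) holding the filtered Gaussians at t, the network observations
    at t, and the filtered Gaussians at t - 1.
    """
    (mf, Sf), (mo, So), (mp, Sp) = filtered, observed, previous
    align = sum(w2_sq(*args) for args in zip(mf, Sf, mo, So))    # state vs observation
    smooth = sum(w2_sq(*args) for args in zip(mf, Sf, mp, Sp))   # t vs t - 1
    return photo + lam_align * align + lam_smooth * smooth
```

Because both terms use the same metric, a Gaussian that drifts in position and one that abruptly changes shape are penalized on a common scale.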

Logarithmic and exponential maps in the space of symmetric positive definite (SPD) matrices ensure that covariance updates remain SPD and follow geodesic paths, thereby preserving physical plausibility during rapid or deformable motions (Deng et al., 2024).
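To see why geodesic updates keep covariances valid, the sketch below evaluates the closed-form 2-Wasserstein (Bures-Wasserstein) geodesic between two Gaussians, which equals $\mathrm{Exp}_{\Sigma_0}\left(s\,\mathrm{Log}_{\Sigma_0}(\Sigma_1)\right)$ in that geometry. It is a standard construction offered for illustration and assumes SPD inputs. Whether the paper uses these exact log/exp maps or another SPD geometry is not confirmed here, and the function names are hypothetical.

```python
import numpy as np


def _psd_pow(A, p):
    """Symmetric matrix power A^p via eigendecomposition (A assumed SPD)."""
    A = 0.5 * (A + A.T)  # remove tiny asymmetries from round-off
    w, V = np.linalg.eigh(A)
    return (V * w ** p) @ V.T


def bw_geodesic(mu0, S0, mu1, S1, s):
    """Point at fraction s in [0, 1] along the 2-Wasserstein geodesic between
    N(mu0, S0) and N(mu1, S1) (McCann interpolation)."""
    r = _psd_pow(S0, 0.5)
    r_inv = _psd_pow(S0, -0.5)
    T = r_inv @ _psd_pow(r @ S1 @ r, 0.5) @ r_inv   # optimal transport map S0 -> S1
    M = (1.0 - s) * np.eye(len(mu0)) + s * T          # SPD for every s in [0, 1]
    mu_s = (1.0 - s) * np.asarray(mu0) + s * np.asarray(mu1)
    return mu_s, M @ S0 @ M
```

Since $T$ is SPD, every intermediate $M$ is SPD as well, so each interpolated covariance $M\Sigma_0 M$ remains a valid covariance. By contrast, extrapolating linearly in matrix space can produce negative variances.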

5. End-to-end Algorithmic Workflow

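The update routine for each Gaussian and time step proceeds as follows:

Predict the prior state from the filtered states at earlier time steps, using either the Euclidean or the Wasserstein-geometry transition.
Query the deformation network $f_\theta$ for the observation $\mathcal{N}_t^{\mathrm{Ob}(i)}$.
Fuse prior and observation with the state consistency filter to obtain the filtered Gaussian $\hat{\mathcal{N}}_t^{(i)}$.
Render the filtered Gaussians by differentiable splatting and evaluate the photometric loss together with the Wasserstein regularizers.
Backpropagate the total loss to update the canonical Gaussians and the network parameters $\theta$.

This joint optimization ensures end-to-end differentiability across geometric, temporal, and rendering domains, permitting seamless integration of neural modeling and physical regularization (Deng et al., 2024).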
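The predict-observe-fuse core of this routine can be exercised on toy data with the sketch below. It is deliberately reduced. It tracks means only and uses a constant-velocity predictor with a fixed scalar gain in place of a covariance-derived Kalman gain. It also replaces the deformation network with noisy synthetic observations and omits rendering and backpropagation. All names and values are hypothetical.

```python
import numpy as np


def filter_trajectory(observations, gain=0.3):
    """Constant-velocity prediction plus fixed-gain fusion over observed means
    of shape (T, 3); returns the filtered means."""
    obs = np.asarray(observations, dtype=float)
    out = np.empty_like(obs)
    out[:2] = obs[:2]  # not enough history to predict yet: trust the observations
    for t in range(2, len(obs)):
        prior = out[t - 1] + (out[t - 1] - out[t - 2])  # predict
        out[t] = prior + gain * (obs[t] - prior)        # fuse with the observation
    return out


def jitter(traj):
    """Mean squared second difference of a trajectory: a simple flicker measure."""
    d2 = traj[2:] - 2.0 * traj[1:-1] + traj[:-2]
    return float(np.mean(np.sum(d2 ** 2, axis=1)))


# Hypothetical toy data: uniform motion along x seen through noisy observations.
rng = np.random.default_rng(0)
n_steps = 200
truth = np.zeros((n_steps, 3))
truth[:, 0] = 0.1 * np.arange(n_steps)
noisy = truth + rng.normal(scale=0.05, size=truth.shape)
smoothed = filter_trajectory(noisy)
```

On this toy trajectory the fused means show a much smaller second-difference energy than the raw observations. A noiseless linear trajectory passes through unchanged because the constant-velocity prior predicts it exactly.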

6. Empirical Evaluation and Performance

Experiments use both synthetic (D-NeRF: moving digits, animated characters) and real-world (Plenoptic Video: people, objects) dynamic scene datasets. The method was implemented on an NVIDIA A800 GPU and trained with the Adam optimizer using a staged schedule: an initial static-initialization phase, after which the state consistency filter and the Wasserstein regularization are each switched on from their own iteration thresholds.
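The staged schedule can be expressed as a small configuration object. The iteration thresholds are left to the caller, and the class and field names are hypothetical. Treating the first stage as a pure static phase is an assumption based on the phrase "static initialization".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StagedSchedule:
    """Iteration thresholds for a staged 4DGS training run (values supplied by the caller)."""
    static_until: int      # end of the static initialization phase
    filter_from: int       # first iteration with the state consistency filter
    wasserstein_from: int  # first iteration with Wasserstein regularization

    def active(self, it: int) -> dict:
        """Report which training components are switched on at iteration `it`."""
        return {
            "static_phase": it < self.static_until,
            "filter": it >= self.filter_from,
            "wasserstein": it >= self.wasserstein_from,
        }
```

Keeping the thresholds in one frozen object makes the staging explicit and easy to log alongside each run.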

The evaluation metrics include PSNR (peak signal-to-noise ratio), SSIM, LPIPS, runtime FPS, and total training time.

Dataset	Method	PSNR (dB)	SSIM	LPIPS	FPS
D-NeRF	4D-GS baseline	31.8	0.958	0.032	87
	Ours	34.45	0.970	0.026	45.5
Plenoptic	4D-GS baseline	29.91	0.928	0.168	76
	Ours	31.62	0.940	0.140	37
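The table is easier to read once PSNR gains are translated into error and speed ratios. Since $\mathrm{PSNR} = 10\log_{10}(\mathrm{MAX}^2/\mathrm{MSE})$, a gain of $g$ dB multiplies the mean squared error by $10^{-g/10}$. The helper below applies this to the reported numbers and introduces no data beyond the table; the function name is illustrative.

```python
# (PSNR in dB, FPS) pairs copied from the table above.
results = {
    "D-NeRF": {"baseline": (31.8, 87.0), "ours": (34.45, 45.5)},
    "Plenoptic": {"baseline": (29.91, 76.0), "ours": (31.62, 37.0)},
}


def summarize(baseline, ours):
    """Return (PSNR gain in dB, MSE ratio ours/baseline, FPS ratio ours/baseline)."""
    (p0, f0), (p1, f1) = baseline, ours
    gain_db = p1 - p0
    mse_ratio = 10 ** (-gain_db / 10)  # PSNR is 10*log10(MAX^2 / MSE)
    return gain_db, mse_ratio, f1 / f0


for name, r in results.items():
    g, m, f = summarize(r["baseline"], r["ours"])
    print(f"{name}: +{g:.2f} dB, MSE x{m:.2f}, FPS x{f:.2f}")
```

For D-NeRF, the 2.65 dB gain corresponds to roughly 46% lower mean squared error, obtained at about half the baseline frame rate. For Plenoptic Video, 1.71 dB corresponds to roughly a third lower MSE at just under half the baseline FPS.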

Ablation results detail the contribution of individual innovations:

Adding the state consistency filter yields a 0.80 dB PSNR gain and reduces the optical-flow average endpoint error (AEPE) from 1.45 to 1.02.
Wasserstein regularization, compared with linear regularization, provides a 1.0 dB PSNR gain and a 57% reduction in training time.
Full Wasserstein geometry (log/exp maps on SPD matrices) further increases PSNR by 0.50 dB. Combined, these components provide a 2.0 dB PSNR gain over observation-only baselines, and the full method outperforms previous dynamic scene rendering methods on all reported datasets (Deng et al., 2024).
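For reference, the average endpoint error (AEPE) used in the ablation is the mean Euclidean length of the per-pixel difference between two optical-flow fields. The sketch below implements that standard definition. Which flow fields the paper compares (for example, flow estimated from rendered versus ground-truth frames) is not specified in this summary, and the function name is hypothetical.

```python
import numpy as np


def aepe(flow_pred, flow_ref):
    """Average endpoint error between two optical-flow fields of shape (H, W, 2):
    the mean Euclidean length of the per-pixel flow difference."""
    diff = np.asarray(flow_pred, dtype=float) - np.asarray(flow_ref, dtype=float)
    return float(np.mean(np.linalg.norm(diff, axis=-1)))
```

Lower is better, and the value is expressed in pixels, so the reported drop from 1.45 to 1.02 means the compared flow vectors disagree by about 0.43 pixels less on average.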

7. Analysis: Motion Smoothness, Plausibility, and Implications

The fusion filter (Kalman-style) reduces network-induced noise, yielding improved optical-flow consistency. Wasserstein regularization jointly constrains both mean and covariance, damping flicker while accommodating plausible object deformation. Manifold-structured log/exp updates guarantee that covariances remain SPD and follow geodesics, which is beneficial for rapid or complex shape variation.

This methodology embeds 3D Gaussian trajectory estimation into a state-space model guided by optimal-transport geometry. Empirically, this enables smoother transitions, reduced artifacts, and enhanced temporal coherence compared to naive neural deformation or simple regularization. A plausible implication is that Wasserstein-constrained 4DGS yields physically plausible, high-fidelity dynamic scene renderings, supporting scalable real-world deployment in dynamic view synthesis, neural graphics, and temporally adaptive reconstruction pipelines (Deng et al., 2024).


References

Deng et al. (2024). Gaussians on their Way: Wasserstein-Constrained 4D Gaussian Splatting with State-Space Modeling.
