FlowGaussian-VR: Dynamic 3D Gaussian Splatting
- The paper introduces a novel velocity field modeling scheme that integrates explicit optical-flow supervision and adaptive Gaussian densification to enhance dynamic 3D reconstruction.
- It employs a dual rendering pipeline that simultaneously produces color images and dense 2D velocity maps, ensuring accurate motion supervision through combined loss functions.
- Empirical evaluations demonstrate significant PSNR gains and reduced motion blur, resulting in smoother, physically plausible per-Gaussian trajectories.
FlowGaussian-VR is a flow-empowered velocity field modeling scheme for high-fidelity 3D Gaussian splatting-based video reconstruction in dynamic scenes. Developed to address the limitations of deformation-based dynamic Gaussian splatting methods in scenes containing complex motion and significant scale variation, FlowGaussian-VR integrates explicit velocity field modeling, optical-flow-based supervision, and adaptive Gaussian densification. This framework delivers state-of-the-art performance for multi-view dynamic reconstruction and novel view synthesis, yielding sharp dynamic textures, enhanced photometric fidelity, and temporally consistent per-Gaussian trajectories (Li et al., 31 Jul 2025).
1. Velocity Field Modeling in Dynamic Gaussian Splatting
FlowGaussian-VR builds on deformation-based 4D Gaussian Splatting (4DGS) by augmenting each 3D Gaussian with a learnable 2D projected velocity vector in addition to its usual attributes: canonical center, covariance, color, and opacity. The time-dependent position of each Gaussian is obtained from its canonical center through a compact neural deformation field. The velocity vector gives the projected per-Gaussian 2D displacement between frames and is optimized directly throughout training.
In contrast to static or basic deformation-only methods, this velocity-centric approach enables direct motion supervision for each Gaussian, allowing more precise control and regularization of the underlying spatial-temporal evolution of dynamic scene elements.
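The per-Gaussian state described above can be sketched as a small data structure. This is a minimal illustration, not the paper's implementation: the class and function names are hypothetical, the covariance is omitted for brevity, the additive form of the deformation (canonical center plus a predicted offset) is an assumption, and `toy_deform` is a stand-in for the neural deformation field.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Vec2 = Tuple[float, float]
Vec3 = Tuple[float, float, float]


@dataclass
class DynamicGaussian:
    """Hypothetical container for one Gaussian (covariance omitted)."""
    center: Vec3       # canonical center
    color: Vec3
    opacity: float
    velocity_2d: Vec2  # learnable projected displacement between frames


def position_at(g: DynamicGaussian, t: float,
                deform: Callable[[Vec3, float], Vec3]) -> Vec3:
    # ASSUMPTION: the deformation field predicts an additive offset
    # to the canonical center; the source does not preserve the formula.
    offset = deform(g.center, t)
    return tuple(c + o for c, o in zip(g.center, offset))


def toy_deform(center: Vec3, t: float) -> Vec3:
    # Hypothetical stand-in for the neural field: linear drift along x.
    return (0.5 * t, 0.0, 0.0)


g = DynamicGaussian(center=(1.0, 2.0, 3.0), color=(1.0, 0.0, 0.0),
                    opacity=0.8, velocity_2d=(0.5, -0.25))
print(position_at(g, 2.0, toy_deform))  # (2.0, 2.0, 3.0)
```

In a trainable version, `velocity_2d` would be an optimized parameter alongside the other attributes, which is what allows flow supervision to reach each Gaussian directly.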
2. Velocity-Field Rendering Pipeline and Loss Formulation
FlowGaussian-VR extends the standard 3DGS rasterization process to render, for each frame, a dense 2D velocity (flow) map alongside the color image. Each Gaussian $i$ contributes to the color and velocity accumulated at image pixel $p$ through its alpha-weighted visibility $w_i(p)$, so the projected velocity field is the visibility-weighted sum of the per-Gaussian velocities $v_i$:

$$\hat{V}(p) = \sum_i w_i(p)\, v_i$$
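The accumulation for a single pixel can be sketched as follows. The sketch assumes that the alpha-weighted visibility is the standard front-to-back 3DGS compositing weight (a Gaussian's opacity at the pixel times the transmittance left by the Gaussians in front of it); the function name and the list-based inputs are illustrative, not taken from the paper.

```python
def composite_pixel(alphas, colors, velocities):
    """Front-to-back compositing of color and 2D velocity for one pixel.

    Inputs are ordered nearest-first. ASSUMPTION: the visibility weight
    is alpha times accumulated transmittance, as in standard 3DGS.
    """
    transmittance = 1.0
    color = [0.0, 0.0, 0.0]
    velocity = [0.0, 0.0]
    for alpha, c, v in zip(alphas, colors, velocities):
        weight = alpha * transmittance
        for k in range(3):
            color[k] += weight * c[k]
        for k in range(2):
            velocity[k] += weight * v[k]
        transmittance *= 1.0 - alpha
    return color, velocity


color, velocity = composite_pixel(
    alphas=[0.5, 0.5],
    colors=[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    velocities=[(2.0, 0.0), (0.0, 4.0)],
)
print(color, velocity)  # [0.5, 0.25, 0.0] [1.0, 1.0]
```

Because color and velocity share the same weights, the rendered flow map is consistent with what is actually visible in the rendered image.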
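Supervision is provided by optical-flow maps obtained from a state-of-the-art 2D estimator (RAFT). The loss scheme combines:
- Windowed Velocity Error $\mathcal{L}_{\text{win}}$: Multi-frame flow regression that enforces consistency between predicted and reference optical flows over a window of consecutive frames.
- Flow-Warping Error $\mathcal{L}_{\text{warp}}$: Photometric consistency, measured by warping rendered color images with the predicted flow and comparing them to the reference.
- Dynamic Region Photometric Loss $\mathcal{L}_{\text{dyn}}$: Emphasizes photometric matching within moving-object regions detected via binary masks (from SAM-v2).
- Static Scene Photometric Loss $\mathcal{L}_{\text{static}}$: Global photometric loss.
- Regularization $\mathcal{L}_{\text{reg}}$: Penalizes large velocities and deformation weights.
The total loss is a weighted sum with scalar weights $\lambda$:

$$\mathcal{L} = \lambda_{\text{win}}\mathcal{L}_{\text{win}} + \lambda_{\text{warp}}\mathcal{L}_{\text{warp}} + \lambda_{\text{dyn}}\mathcal{L}_{\text{dyn}} + \lambda_{\text{static}}\mathcal{L}_{\text{static}} + \lambda_{\text{reg}}\mathcal{L}_{\text{reg}}$$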
Optimization uses Adam for 80,000–120,000 iterations, jointly updating all Gaussian and deformation-network parameters.
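The windowed flow term and the weighted combination can be sketched in a few lines. This is an illustration under stated assumptions: the absolute-difference (L1) error, the term names, and the weight values are all hypothetical, since the source does not give the norm or the weights.

```python
def windowed_velocity_error(pred_flows, ref_flows):
    """Mean absolute flow error over a window of frames.

    Each flow is a list of (u, v) pairs, one per pixel.
    ASSUMPTION: an L1 error; the paper's exact norm is not given here.
    """
    total, count = 0.0, 0
    for pred, ref in zip(pred_flows, ref_flows):
        for (pu, pv), (ru, rv) in zip(pred, ref):
            total += abs(pu - ru) + abs(pv - rv)
            count += 1
    return total / count


def total_loss(terms, weights):
    """Weighted sum of named loss terms."""
    return sum(weights[name] * value for name, value in terms.items())


pred = [[(1.0, 0.0), (0.0, 0.0)], [(2.0, 2.0), (0.0, 0.0)]]
ref = [[(0.0, 0.0), (0.0, 0.0)], [(2.0, 0.0), (0.0, 0.0)]]
l_win = windowed_velocity_error(pred, ref)
print(l_win)  # 0.75

# Hypothetical weights and a placeholder value for the warping term.
print(total_loss({"win": l_win, "warp": 0.5}, {"win": 2.0, "warp": 1.0}))  # 2.0
```

Keeping the terms in a dictionary makes it easy to switch individual losses on or off at different points in training.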
3. Flow-Assisted Adaptive Densification (FAD)
Conventional static-scene densification algorithms often undersample regions with large, abrupt motion, leading to insufficient Gaussian coverage in dynamic content. FlowGaussian-VR introduces Flow-Assisted Adaptive Densification (FAD) to address underfitting in such regions. At fixed intervals during training, a flow loss map and its gradient are computed, and pixels with large error and substantial local variation are selected. These are lifted to 3D, mapped to the canonical frame, and subsampled. For each candidate, nearest Gaussian neighbors provide attribute interpolation for generating new Gaussians, which are inserted into the model and allowed to acquire full trajectory and velocity attributes through continued optimization.
This strategy targets persistently high-error dynamic regions, producing substantial improvements in reproduction fidelity over static-densification schemes.
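The selection, lifting, and interpolation steps of FAD can be sketched as below. This is a simplified illustration, not the paper's procedure: the thresholds, the forward-difference gradient, the pinhole unprojection with known depth and intrinsics, and the plain k-nearest-neighbor average are all assumptions, and the mapping back to the canonical frame and the subsampling step are omitted.

```python
def select_candidates(err, err_thresh, grad_thresh):
    """Pick pixels whose flow error and local error variation are both large."""
    h, w = len(err), len(err[0])
    picked = []
    for y in range(h):
        for x in range(w):
            # Forward differences, clamped at the image border.
            gx = err[y][min(x + 1, w - 1)] - err[y][x]
            gy = err[min(y + 1, h - 1)][x] - err[y][x]
            grad = (gx * gx + gy * gy) ** 0.5
            if err[y][x] > err_thresh and grad > grad_thresh:
                picked.append((x, y))
    return picked


def unproject(x, y, depth, fx, fy, cx, cy):
    """Lift a pixel to a 3D camera-space point (pinhole model, assumed)."""
    return ((x - cx) * depth / fx, (y - cy) * depth / fy, depth)


def interpolate_opacity(point, gaussians, k=2):
    """Average the opacity of the k nearest Gaussians; each is (center, opacity)."""
    def sq_dist(g):
        return sum((a - b) ** 2 for a, b in zip(g[0], point))
    nearest = sorted(gaussians, key=sq_dist)[:k]
    return sum(opacity for _, opacity in nearest) / len(nearest)


flow_error = [[0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0]]
pixels = select_candidates(flow_error, err_thresh=0.5, grad_thresh=0.5)
print(pixels)  # [(1, 1)]

point = unproject(1, 1, depth=2.0, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
existing = [((2.0, 2.0, 2.0), 0.25), ((2.0, 2.0, 3.0), 0.75), ((10.0, 10.0, 10.0), 1.0)]
print(point, interpolate_opacity(point, existing))  # (2.0, 2.0, 2.0) 0.5
```

The same neighbor lookup could interpolate the other attributes (color, covariance, velocity) before the new Gaussians are handed back to the optimizer.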
4. Training Framework and Schedule
The end-to-end training procedure unfolds in four stages, followed by an optional refinement step:
- Initialization: COLMAP on the first video frame provides a sparse point cloud, from which Gaussians are initialized. RAFT computes reference optical flows; SAM-v2 produces dynamic masks.
- Warmup Phase (0–20k iterations): Only color channels and deformation field are optimized using photometric loss; velocity vectors are held fixed under a small regularization penalty.
- Full Optimization (20–80k iterations): All loss terms are activated; velocity vectors, deformation, and Gaussian attributes are jointly optimized.
- Adaptive Densification (every 20k iterations): FAD injects new Gaussians into high-flow-loss regions.
- (Optional) Temporal Velocity Refinement (TVR): An Extended Kalman Filter (EKF) is applied post hoc to regularize per-Gaussian trajectories, using the model’s own velocity field as observation, thereby smoothing temporal paths without modifying appearance parameters.
Each stage ensures both stable convergence and maximally informative spatial-temporal supervision for all scene components.
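The schedule above can be expressed as a small lookup that a training loop would consult at each iteration. The iteration counts come from the stage list; the function name, the returned flags, and the boundary conventions (iteration 20,000 counts as full optimization, and densification fires on exact multiples of 20,000) are assumptions.

```python
WARMUP_END = 20_000     # end of the warmup phase
DENSIFY_EVERY = 20_000  # FAD interval


def schedule(iteration):
    """Return which components are active at a given iteration.

    ASSUMPTION: boundaries are half-open, so iteration 20,000 is the
    first full-optimization step, and FAD runs on exact multiples.
    """
    warmup = iteration < WARMUP_END
    return {
        "phase": "warmup" if warmup else "full",
        "optimize_velocity": not warmup,
        "use_flow_losses": not warmup,
        "run_fad": iteration > 0 and iteration % DENSIFY_EVERY == 0,
    }


for it in (5_000, 20_000, 30_000, 40_000):
    print(it, schedule(it))
```

Separating the schedule from the optimization step keeps the phase logic easy to inspect and change.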
5. Empirical Performance and Comparison
FlowGaussian-VR achieves superior performance on the challenging Nvidia-long (seven scenes, 90–210 frames, 12 cameras) and Neu3D (six scenes, 300 frames, ~15 cameras) datasets. Quantitative metrics are summarized below:
| Method | Nvidia-long PSNR (DPSNR) | Neu3D PSNR (DPSNR) |
|---|---|---|
| 4DGS | 22.73 (20.00) | 24.85 (23.33) |
| 4D-GS | 23.33 (20.03) | 26.70 (25.96) |
| SC-GS | 13.99 (10.39) | 11.42 (10.01) |
| MotionGS | 19.85 (17.34) | 23.65 (22.42) |
| FlowGaussian-VR | 25.23 (22.40) | 27.30 (26.47) |
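Relative to the 4DGS baseline, average gains are approximately 2.5 dB in overall PSNR (2.50 dB on Nvidia-long, 2.45 dB on Neu3D) and up to 3.1 dB in dynamic-region PSNR (DPSNR; 3.14 dB on Neu3D). Relative to the strongest prior method in the table, 4D-GS, the PSNR gains are 1.90 dB on Nvidia-long and 0.60 dB on Neu3D. FlowGaussian-VR also notably reduces motion blur and produces smooth, coherent optical flows and physically plausible, trackable Gaussian trajectories (Li et al., 31 Jul 2025).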
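The gains can be checked directly against the table. The short script below copies the reported (PSNR, DPSNR) pairs and computes the differences; only the helper name `gain_over` is introduced here.

```python
# (PSNR, DPSNR) pairs in dB, copied from the table above.
RESULTS = {
    "4DGS": {"Nvidia-long": (22.73, 20.00), "Neu3D": (24.85, 23.33)},
    "4D-GS": {"Nvidia-long": (23.33, 20.03), "Neu3D": (26.70, 25.96)},
    "SC-GS": {"Nvidia-long": (13.99, 10.39), "Neu3D": (11.42, 10.01)},
    "MotionGS": {"Nvidia-long": (19.85, 17.34), "Neu3D": (23.65, 22.42)},
    "FlowGaussian-VR": {"Nvidia-long": (25.23, 22.40), "Neu3D": (27.30, 26.47)},
}


def gain_over(baseline, dataset, ours="FlowGaussian-VR"):
    """Return (PSNR gain, DPSNR gain) in dB, rounded to two decimals."""
    return tuple(
        round(o - b, 2)
        for o, b in zip(RESULTS[ours][dataset], RESULTS[baseline][dataset])
    )


for dataset in ("Nvidia-long", "Neu3D"):
    print(dataset,
          "vs 4DGS:", gain_over("4DGS", dataset),
          "vs 4D-GS:", gain_over("4D-GS", dataset))
```

Against 4DGS the gains are 2.50 and 2.45 dB in PSNR and 2.40 and 3.14 dB in DPSNR. Against 4D-GS, the strongest prior method in the table, they are 1.90 and 0.60 dB in PSNR and 2.37 and 0.51 dB in DPSNR.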
6. Per-Gaussian Trajectory Regularization and Temporal Refinement
A post-processing temporal regularization phase applies an Extended Kalman Filter (EKF) to each Gaussian trajectory. The EKF models each Gaussian’s state as position and velocity and uses the rendering-predicted flow as its observation, with occlusion handling via depth-buffer comparison and compensation for potential drift. The filter further smooths the trajectories, eliminating zig-zag artifacts and improving interpretability, while keeping PSNR nearly unchanged (a reduction of at most 0.2 dB, as only spatial attributes are updated). This regularization is particularly valuable for downstream tasks requiring robust, physically meaningful motion traces.
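The filtering idea can be illustrated with a much simpler stand-in: a linear Kalman filter on one coordinate, with state (position, velocity), a constant-velocity motion model, and the velocity as the only observation. This is not the paper's EKF. It omits occlusion handling and drift compensation, and the noise levels `q` and `r`, the initial covariance, and the function names are all hypothetical.

```python
def kalman_step(state, cov, observed_velocity, dt=1.0, q=1e-3, r=1e-1):
    """One predict/update step; state = (position, velocity), cov is 2x2."""
    p, v = state
    (a, b), (c, d) = cov
    # Predict with a constant-velocity model: P' = F P F^T + q * I.
    p_pred, v_pred = p + dt * v, v
    p00 = a + dt * (b + c) + dt * dt * d + q
    p01 = b + dt * d
    p10 = c + dt * d
    p11 = d + q
    # Update: only the velocity is observed (H = [0, 1]).
    innovation = observed_velocity - v_pred
    s = p11 + r
    k0, k1 = p01 / s, p11 / s
    new_state = (p_pred + k0 * innovation, v_pred + k1 * innovation)
    new_cov = ((p00 - k0 * p10, p01 - k0 * p11),
               (p10 - k1 * p10, p11 - k1 * p11))
    return new_state, new_cov


def smooth_track(observed_velocities, dt=1.0):
    """Filter a sequence of velocity observations for one coordinate."""
    state = (0.0, observed_velocities[0])
    cov = ((1.0, 0.0), (0.0, 1.0))
    track = []
    for z in observed_velocities:
        state, cov = kalman_step(state, cov, z, dt)
        track.append(state)
    return track


raw = [1.2, 0.8] * 15  # zig-zag velocity observations around 1.0
tail = [v for _, v in smooth_track(raw)[-10:]]
print(max(raw) - min(raw), max(tail) - min(tail))
```

With a small process noise relative to the observation noise, the filtered velocity settles close to the underlying mean, so its spread is much smaller than that of the raw observations. Appearance attributes are not involved at all, which matches the description of the refinement as touching only spatial attributes.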
7. Significance and Future Prospects
FlowGaussian-VR delivers the first dynamic-3DGS pipeline simultaneously providing (a) direct motion supervision via velocity field rendering, (b) adaptive, flow-guided densification for dynamic regions, and (c) temporally smooth per-Gaussian trajectory estimation. The resulting system achieves state-of-the-art fidelity and temporal consistency, producing crisp dynamic-scene reconstructions and robust motion estimates suitable for object tracking, AR/VR physics, and further research on dynamic scene understanding.
The explicit supervision of rendered velocity fields and targeted Gaussian augmentation suggest broader applicability to scenarios with challenging, nonrigid motion and may inform future research directions in dynamic representation learning, real-time 3D reconstruction, and motion analysis (Li et al., 31 Jul 2025).