1000+ FPS 4D Gaussian Splatting for Dynamic Scene Rendering (2503.16422v1)

Published 20 Mar 2025 in cs.CV

Abstract: 4D Gaussian Splatting (4DGS) has recently gained considerable attention as a method for reconstructing dynamic scenes. Despite achieving superior quality, 4DGS typically requires substantial storage and suffers from slow rendering speed. In this work, we delve into these issues and identify two key sources of temporal redundancy. (Q1) Short-Lifespan Gaussians: 4DGS uses a large portion of Gaussians with short temporal span to represent scene dynamics, leading to an excessive number of Gaussians. (Q2) Inactive Gaussians: When rendering, only a small subset of Gaussians contributes to each frame. Despite this, all Gaussians are processed during rasterization, resulting in redundant computation overhead. To address these redundancies, we present 4DGS-1K, which runs at over 1000 FPS on modern GPUs. For Q1, we introduce the Spatial-Temporal Variation Score, a new pruning criterion that effectively removes short-lifespan Gaussians while encouraging 4DGS to capture scene dynamics using Gaussians with longer temporal spans. For Q2, we store a mask for active Gaussians across consecutive frames, significantly reducing redundant computations in rendering. Compared to vanilla 4DGS, our method achieves a 41× reduction in storage and 9× faster rasterization speed on complex dynamic scenes, while maintaining comparable visual quality. Please see our project page at https://4DGS-1K.github.io.

Summary

  • The paper introduces a Spatial-Temporal Variation Score to prune short-lifespan Gaussians, significantly reducing storage overhead.
  • It employs key-frame based temporal filtering to render only active Gaussians, achieving over 1000 FPS on standard datasets.
  • The method maintains high visual quality comparable to vanilla 4DGS, enabling efficient real-time dynamic scene rendering on resource-constrained hardware.

The paper "1000+ FPS 4D Gaussian Splatting for Dynamic Scene Rendering" (2503.16422) introduces 4DGS-1K, a method designed to significantly improve the storage efficiency and rendering speed of 4D Gaussian Splatting (4DGS) for dynamic scenes. While vanilla 4DGS (2310.10642) achieves high visual quality for dynamic view synthesis, it suffers from large storage requirements and slow rendering, limiting its practical application.

The authors identify two main sources of temporal redundancy in vanilla 4DGS:

  1. Short-Lifespan Gaussians: A large number of Gaussians have very short temporal durations (small Σ_t values) and appear and disappear quickly, particularly around moving objects. This forces a high Gaussian count to model dynamic elements, leading to substantial storage overhead.
  2. Inactive Gaussians: At any given timestamp during rendering, only a small subset of the total Gaussians are visible and contribute to the final image. However, vanilla 4DGS processes all Gaussians during rasterization, resulting in significant redundant computation.

To address these issues and enable high-speed, low-storage dynamic scene rendering, 4DGS-1K proposes a two-step approach:

  1. Pruning Short-Lifespan Gaussians: This step reduces the total number of Gaussians by removing those that have minimal impact over time. The authors introduce a novel Spatial-Temporal Variation Score as the pruning criterion.
    • Spatial Score (S^S_i): Measures a Gaussian's contribution to the rendered pixels across training views, similar to standard 3DGS pruning criteria.
    • Temporal Score (S^T_i): Quantifies a Gaussian's lifespan and stability over time. It is computed from the magnitude of the second derivative of the Gaussian's temporal opacity function p_i(t) (a large second-derivative magnitude indicates an unstable, short-lived Gaussian) and the normalized volume of the 4D Gaussian.
    • Combined Score (S_i): The final score aggregates the spatial and temporal scores, summed over all training timestamps. Gaussians with low S_i are pruned, and the remaining Gaussians are fine-tuned.
  2. Fast Rendering with Temporal Filtering: This step tackles the inactive Gaussian problem during runtime rendering. Based on the observation that active Gaussians tend to overlap significantly between adjacent frames, a key-frame based temporal filtering mechanism is employed.
    • Key-Frame Selection: Sparse key-frames are selected at regular intervals (e.g., every 20 frames).
    • Visibility Masks: For each key-frame, a mask is computed indicating which Gaussians are active/visible across all training views at that specific timestamp.
    • Filtered Rendering: When rendering a view at a query timestamp t, the system identifies the two nearest key-frames, t_l and t_r. Rendering is then performed using only the union of the active Gaussian sets from t_l and t_r, which significantly reduces the number of Gaussians the rasterizer processes per frame. Fine-tuning with this filter active compensates for minor details lost by using masks from sparse key-frames.
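As a concrete sketch, the pruning step can be illustrated as follows. This is a simplified NumPy illustration, not the authors' implementation: the function names, the multiplicative aggregation of the spatial and temporal scores, and the omission of the 4D-volume term are all assumptions made here for brevity.

```python
import numpy as np

def temporal_opacity(mu_t, sigma_t, t):
    """Gaussian temporal opacity p_i(t), centered at mu_t with scale sigma_t."""
    return np.exp(-0.5 * ((t - mu_t) / sigma_t) ** 2)

def temporal_opacity_second_derivative(mu_t, sigma_t, t):
    """Analytic second derivative of p_i(t); a large magnitude flags
    unstable, short-lived Gaussians."""
    z = (t - mu_t) / sigma_t
    return (z ** 2 - 1.0) / sigma_t ** 2 * temporal_opacity(mu_t, sigma_t, t)

def prune_mask(spatial_score, mu_t, sigma_t, timestamps, keep_ratio=0.2):
    """Combine spatial and temporal scores over all training timestamps and
    keep only the highest-scoring fraction of Gaussians (e.g. keep 20%
    for an 80% pruning ratio)."""
    # Temporal score: penalize large |p''_i(t)| summed over timestamps.
    curvature = np.abs(
        temporal_opacity_second_derivative(
            mu_t[:, None], sigma_t[:, None], timestamps[None, :])
    ).sum(axis=1)
    temporal_score = 1.0 / (1.0 + curvature)   # stable -> high score
    score = spatial_score * temporal_score     # combined score S_i
    k = int(len(score) * keep_ratio)
    keep = np.argsort(score)[-k:]              # indices of Gaussians to keep
    mask = np.zeros(len(score), dtype=bool)
    mask[keep] = True
    return mask
```

With equal spatial scores, Gaussians with larger temporal scale (longer lifespan) accumulate less second-derivative magnitude and therefore survive pruning, which matches the paper's goal of favoring long-lived Gaussians.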
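The key-frame filtering steps can likewise be sketched. Again this is a simplified illustration under assumptions: the opacity threshold `eps` and the helper names are invented here, and the actual method derives masks from visibility across training views rather than from temporal opacity alone.

```python
import numpy as np

def keyframe_masks(mu_t, sigma_t, keyframes, eps=0.05):
    """For each key-frame time, mark Gaussians whose temporal opacity
    exceeds eps as active. Returns shape (num_gaussians, num_keyframes)."""
    p = np.exp(
        -0.5 * ((keyframes[None, :] - mu_t[:, None]) / sigma_t[:, None]) ** 2)
    return p > eps

def active_set(masks, keyframes, t):
    """Union of the active Gaussians from the two key-frames nearest to t;
    only these Gaussians are passed to the rasterizer for frame t."""
    order = np.argsort(np.abs(keyframes - t))
    left, right = order[0], order[1]
    return masks[:, left] | masks[:, right]
```

Because adjacent frames share most of their active Gaussians, the union of two key-frame masks covers the query frame's visible set while excluding the bulk of inactive Gaussians.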

Practical Implementation and Results:

The 4DGS-1K method is evaluated on standard dynamic scene datasets like Neural 3D Video (N3V) and D-NeRF. The results demonstrate significant improvements:

  • Rendering Speed: 4DGS-1K achieves Raster FPS (frames per second considering only the rasterization step) exceeding 1000 FPS on N3V and 2400 FPS on D-NeRF on an RTX 3090, far surpassing the rendering speeds of vanilla 4DGS (118 Raster FPS on N3V, 1232 Raster FPS on D-NeRF). This makes real-time high-resolution dynamic scene rendering feasible.
  • Storage Reduction: The pruning strategy drastically reduces the number of Gaussians, shrinking storage from 2085MB to 418MB on N3V and from 278MB to 42MB on D-NeRF. The "Ours-PP" variant adds post-processing techniques such as vector quantization and mask compression, bringing storage down further to roughly 50MB on N3V and 7MB on D-NeRF; these post-processed sizes correspond to the headline 41x (N3V) and 40x (D-NeRF) reductions over vanilla 4DGS, while maintaining high quality.
  • Visual Quality: 4DGS-1K maintains visual quality comparable to or slightly better than vanilla 4DGS (e.g., PSNR, SSIM, LPIPS). The pruning strategy effectively removes redundant Gaussians without sacrificing necessary details. The temporal filter, combined with fine-tuning, efficiently leverages temporal coherence. On synthetic datasets like D-NeRF, 4DGS-1K even helps mitigate artifacts like "floaters" sometimes present in vanilla 4DGS due to limited training data.
  • Resource Consumption: The fine-tuning process after pruning and filtering is relatively fast (approx. 30 mins). More importantly, rendering with 4DGS-1K requires significantly less GPU memory during inference (e.g., 1.62GB on N3V compared to higher amounts for vanilla 4DGS), enabling deployment on lower-performance hardware like a TITAN X while still achieving over 200 FPS.

Implementation Considerations:

  • The pruning ratio (e.g., 80% for N3V, 85% for D-NeRF) and the key-frame interval (Δt) are hyperparameters that need to be tuned. Ablation studies show the trade-offs between compression/speed and rendering quality based on these settings. An improper pruning ratio or a large key-frame interval without adequate fine-tuning can degrade quality.
  • The spatial-temporal variation score requires computing sums over training views and time, adding computation during the pruning phase.
  • The temporal filter requires pre-computing and storing visibility masks for key-frames. While the mask storage is shown to be minimal (approx. 1MB per scene), this is an additional artifact alongside the compressed Gaussian parameters.
  • The Raster FPS metric highlights the acceleration specifically from reducing the number of Gaussians processed by the rasterizer. The overall FPS also includes preliminary preparation steps, which become a larger proportion of the total time with the faster rasterization, suggesting potential future work on optimizing these other parts of the rendering pipeline.
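A rough back-of-envelope check on the mask-storage point, under assumed scene sizes: the Gaussian count, sequence length, and key-frame interval below are illustrative choices, not figures taken from the paper.

```python
# One bit per (Gaussian, key-frame) pair in a raw visibility bitmask.
num_gaussians = 600_000            # assumed post-pruning count
num_frames = 300                   # assumed sequence length
keyframe_interval = 20             # key-frame every 20 frames (as in the paper)
num_keyframes = num_frames // keyframe_interval   # -> 15 key-frames
bits = num_gaussians * num_keyframes
megabytes = bits / 8 / 1e6
print(f"{megabytes:.3f} MB of raw bitmask storage")
```

Under these assumptions the raw bitmasks come to about 1.1MB, on the same order as the ~1MB per scene cited above, and simple compression would shrink them further.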

In summary, 4DGS-1K provides a practical and highly efficient solution for rendering dynamic scenes using Gaussian Splatting, addressing the primary bottlenecks of storage and speed in prior work through intelligent pruning and temporal filtering strategies. This advancement makes dynamic scene representation with Gaussian Splatting significantly more viable for real-time applications and resource-constrained environments.