LocalDyGS: Dynamic Scene Reconstruction
- LocalDyGS is a dynamic 3D reconstruction framework that partitions complex scenes into localized spaces to capture both fine-scale and large-scale motions.
- It decouples static and dynamic features within each local space and fuses them adaptively into temporal Gaussian parameterizations to improve efficiency and accuracy.
- Adaptive seed growing and multi-view SfM integration enable robust 3D modeling for applications in AR/VR, gaming, and dynamic scene analysis.
LocalDyGS is a framework for dynamic scene reconstruction from multi-view video, designed to accurately and efficiently model both fine-scale and large-scale motions in highly dynamic real-world scenes. The central contribution is an adaptive decomposition of the global scene into multiple local spaces, each represented by decoupled static and dynamic features, which are fused to generate time-varying Temporal Gaussians as rendering primitives. This approach makes it possible to reconstruct complex, temporally evolving motion for arbitrary viewpoints, overcoming key limitations of prior neural radiance field (NeRF) and 3D Gaussian splatting (3DGS) methods, especially in cases involving large-scale dynamic scenes (2507.02363).
1. Local Space Decomposition
LocalDyGS partitions the entire dynamic scene into a collection of local spaces, each defined by a seed point. Seeds are initialized by fusing Structure-from-Motion (SfM) point clouds collected from multiple frames, so that seeds are also distributed in regions where dynamic objects appear. Each seed marks the center of a local space covering a spatial neighborhood whose extent is controlled by a learned scale parameter. This decomposition reduces the global dynamic modeling problem to multiple localized subproblems, in which motion boundaries and fine details can be better captured within each local context. The number and placement of seeds directly control the framework's capacity to represent complex, multi-scale dynamics across the scene.
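A minimal sketch of the seed-initialization step described above, assuming per-frame SfM point clouds are available as NumPy arrays and using a simple voxel-grid merge; the function name and voxel size are illustrative, not taken from the paper:

```python
import numpy as np

def initialize_seeds(sfm_point_clouds, voxel_size=0.05):
    """Fuse per-frame SfM point clouds into a single set of seed positions.

    sfm_point_clouds: list of (N_i, 3) arrays, one per sampled frame.
    voxel_size: edge length of the merging voxel grid (illustrative value).
    Returns an (M, 3) array of seed centers, one per occupied voxel.
    """
    points = np.concatenate(sfm_point_clouds, axis=0)            # stack all frames
    voxel_idx = np.floor(points / voxel_size).astype(np.int64)   # quantize to voxels
    # Keep one representative point per voxel so seeds cover both static
    # and dynamic regions without dense duplication.
    _, unique_rows = np.unique(voxel_idx, axis=0, return_index=True)
    return points[np.sort(unique_rows)]

# Each seed defines a local space; its spatial extent is governed by a learned,
# per-seed scale parameter (initialized here to a constant for illustration).
# seeds = initialize_seeds([cloud_t0, cloud_t1, cloud_t2])
# scales = np.full((seeds.shape[0],), 0.1, dtype=np.float32)
```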
2. Decoupling Static and Dynamic Features
A distinctive aspect of the LocalDyGS framework is the explicit separation (decoupling) of static and dynamic representations within each local space. For any seed:
- A static feature is learned and shared across all time steps, encoding the time-invariant geometric and radiance properties of the scene near the seed.
- A dynamic residual feature is provided by a global four-dimensional hash-encoded residual field (space-time), which captures time-specific changes at that seed.
The two feature streams are fused at each sampling time $t$ via an adaptively weighted sum, where the weights $w_s$ and $w_d$ are predicted by a shallow MLP conditioned on the seed position and query time:

$f(t) = w_s \, f_{\text{static}} + w_d \, f_{\text{dyn}}(t)$

This fusion allows the dynamic component to focus selectively on changes, greatly reducing redundancy when large portions of the scene remain static over time.
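A compact sketch of this decoupled-feature fusion in PyTorch; the layer widths, the stand-in for the 4D hash-encoded residual field, and the softmax weight MLP are placeholder choices rather than the paper's exact architecture:

```python
import torch
import torch.nn as nn

class LocalFeatureFusion(nn.Module):
    """Fuse a per-seed static feature with a time-dependent dynamic residual."""

    def __init__(self, num_seeds, feat_dim=32):
        super().__init__()
        # Time-invariant feature per seed (shared across all time steps).
        self.static_feat = nn.Parameter(torch.randn(num_seeds, feat_dim) * 0.01)
        # Stand-in for the 4D (space-time) hash-encoded residual field.
        self.hash_residual = nn.Sequential(
            nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, feat_dim)
        )
        # Shallow MLP predicting the two fusion weights from (position, time).
        self.weight_mlp = nn.Sequential(
            nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2), nn.Softmax(dim=-1)
        )

    def forward(self, seed_xyz, t):
        # seed_xyz: (num_seeds, 3) seed positions; t: float time in [0, 1].
        t_col = torch.full((seed_xyz.shape[0], 1), t, device=seed_xyz.device)
        xyzt = torch.cat([seed_xyz, t_col], dim=-1)
        f_dyn = self.hash_residual(xyzt)      # time-specific residual per seed
        w = self.weight_mlp(xyzt)             # (num_seeds, 2) adaptive weights
        return w[:, :1] * self.static_feat + w[:, 1:] * f_dyn
```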
3. Temporal Gaussian Parameterization
Within each local space, motion is modeled using a set of Temporal Gaussians whose parameters are predicted from the fused feature $f(t)$. For each Gaussian, essential rendering parameters are produced via shallow MLPs:
- Mean: $\mu = x_s + \mathrm{MLP}_{\mu}(f(t))$, where $x_s$ is the seed position and $\mathrm{MLP}_{\mu}$ is a small MLP predicting an offset
- Opacity: $\alpha = \mathrm{MLP}_{\alpha}(f(t), \mathbf{d})$, with $\mathbf{d}$ indicating the viewing direction
- Scale, rotation, and color: produced analogously via dedicated predictors
Temporal Gaussians are activated only during time intervals corresponding to local motion. If the predicted opacity falls below an opacity threshold, the corresponding Gaussian is pruned automatically, improving computational efficiency while maintaining fidelity. Unlike methods that seek to optimize continuous 4D trajectories, LocalDyGS does not attempt to reconstruct long-term trajectories for every point; instead, it dynamically generates local Gaussians for each time step, which better handles complicated, rapidly varying motion.
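A sketch of how per-Gaussian rendering parameters could be decoded from the fused feature, following the parameterization above; the choice of `k` Gaussians per seed, the head widths, and the activation functions are assumptions for illustration:

```python
import torch
import torch.nn as nn

class TemporalGaussianDecoder(nn.Module):
    """Decode k Temporal Gaussians per local space from the fused feature f(t)."""

    def __init__(self, feat_dim=32, k=4):
        super().__init__()
        self.k = k
        # One shallow head per attribute group, as described in the text.
        self.offset_head  = nn.Linear(feat_dim, k * 3)   # mean = seed + offset
        self.opacity_head = nn.Linear(feat_dim + 3, k)   # conditioned on view direction
        self.scale_head   = nn.Linear(feat_dim, k * 3)
        self.rot_head     = nn.Linear(feat_dim, k * 4)   # quaternion per Gaussian
        self.color_head   = nn.Linear(feat_dim, k * 3)

    def forward(self, fused_feat, seed_xyz, view_dir, opacity_min=0.01):
        n = fused_feat.shape[0]
        mean    = seed_xyz.unsqueeze(1) + self.offset_head(fused_feat).view(n, self.k, 3)
        opacity = torch.sigmoid(
            self.opacity_head(torch.cat([fused_feat, view_dir], dim=-1)))   # (n, k)
        scale   = torch.exp(self.scale_head(fused_feat)).view(n, self.k, 3)
        rot     = torch.nn.functional.normalize(
            self.rot_head(fused_feat).view(n, self.k, 4), dim=-1)
        color   = torch.sigmoid(self.color_head(fused_feat)).view(n, self.k, 3)
        # Gaussians whose opacity falls below the threshold stay inactive for
        # this time step and are skipped during rasterization.
        active = opacity > opacity_min                                       # (n, k) mask
        return mean, opacity, scale, rot, color, active
```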
4. Adaptive Seed Growing
To ensure spatial completeness, LocalDyGS incorporates an Adaptive Seed Growing (ASG) mechanism. During optimization, additional seeds are injected in regions where the 2D projection gradients of the current reconstruction (i.e., reprojection error) exceed a preset threshold. Newly added seeds supplement the initial SfM-based cloud, improving coverage in under-represented or occluded regions and enabling the system to refine itself adaptively during training.
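A simplified sketch of this growing rule, assuming accumulated screen-space gradient magnitudes per seed are available from the optimizer; the gradient-accumulation details, threshold value, and new-seed placement rule are illustrative:

```python
import torch

def grow_seeds(seeds, scales, grad_accum, grad_threshold=0.0002):
    """Add new seeds where accumulated 2D projection gradients are large.

    seeds:      (N, 3) current seed positions.
    scales:     (N,)   learned per-seed scale parameters.
    grad_accum: (N,)   accumulated screen-space gradient magnitude per seed.
    Returns the augmented seed and scale tensors.
    """
    needs_growth = grad_accum > grad_threshold
    if not needs_growth.any():
        return seeds, scales
    # Place a new seed near each under-reconstructed region by perturbing the
    # existing seed within its learned local extent (illustrative placement).
    parents = seeds[needs_growth]
    parent_scales = scales[needs_growth]
    new_seeds = parents + torch.randn_like(parents) * parent_scales.unsqueeze(-1)
    seeds = torch.cat([seeds, new_seeds], dim=0)
    scales = torch.cat([scales, parent_scales.clone()], dim=0)
    return seeds, scales
```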
5. Implementation and Training Pipeline
The LocalDyGS system processes synchronized multi-view video frames as follows:
- Seed Initialization: Aggregate SfM point clouds from multiple frames to place initial seeds.
- Local Feature Learning: For each seed, learn a static feature vector and initialize a spatially localized 4D hash field for dynamic residuals.
- Gaussian Parameter Prediction: For each local space and each time, compute and decode the parameters for Temporal Gaussians.
- Rendering: Project all active Gaussians to each camera view at every time step and aggregate their contributions for image synthesis.
- Optimization: Jointly train all learnable modules (static features, hash fields, weight field MLP, Temporal Gaussian decoders) with photometric loss between rendered and observed images, along with regularization terms for sparsity (opacity thresholding).
- Seed Growing: Monitor the 2D projection error and add new seeds where needed during training.
This modular pipeline enables scalable, efficient, and adaptive modeling. Training and inference can be accelerated due to the natural local-to-global parallelization structure and pruning of inactive Gaussians.
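A high-level sketch of one training iteration tying the pieces above together; the `render_fn` rasterizer, the loss weighting, and the sparsity term are assumptions reusing the illustrative components from the earlier snippets, not the paper's exact implementation:

```python
import torch
import torch.nn.functional as F

def training_step(fusion, decoder, seeds, view_dirs, render_fn, camera,
                  image_gt, t, optimizer, lambda_sparsity=0.01):
    """One optimization step: fuse features, decode Gaussians, render, backprop.

    render_fn: differentiable 3DGS-style rasterizer taking
               (means, opacities, scales, rotations, colors, camera) -> image.
    """
    optimizer.zero_grad()

    fused = fusion(seeds, t)                            # per-seed features at time t
    mean, opacity, scale, rot, color, active = decoder(fused, seeds, view_dirs)

    # Render only the Gaussians that are active at this time step.
    image_pred = render_fn(mean[active], opacity[active], scale[active],
                           rot[active], color[active], camera)

    photometric = F.l1_loss(image_pred, image_gt)       # rendered vs. observed frame
    sparsity = opacity.mean()                           # illustrative sparsity regularizer
    loss = photometric + lambda_sparsity * sparsity
    loss.backward()
    optimizer.step()
    return loss.item()
```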
6. Empirical Performance and Comparison
LocalDyGS demonstrates state-of-the-art reconstruction quality on both fine-scale motion benchmarks (N3DV, MeetRoom) and large-scale dynamic scenes (e.g., a basketball court dataset, VRU). Metrics such as PSNR, perceptual distance (LPIPS/DSSIM), frames-per-second (FPS), and storage usage show that LocalDyGS achieves:
- Sharper and more temporally consistent novel view synthesis compared to prior NeRF or 3D Gaussian-based methods.
- Higher efficiency, with a total model size (e.g., ~100 MB) lower than that of many competing approaches.
- Reduced training time and faster convergence.
In large-scale scenes with highly nonrigid motion, previous methods that rely on global trajectory optimization or dense radiance fields often fail or require extreme computational resources, while LocalDyGS remains efficient due to its local space decomposition and temporal Gaussian mechanism. On static or slowly varying backgrounds, the method allocates resources efficiently by relying primarily on the static feature stream and pruning inactive Gaussians.
7. Applications and Future Directions
LocalDyGS is well-suited for applications requiring accurate, temporally resolved, and efficient 3D scene reconstruction:
- Free-viewpoint video for immersive events or AR/VR, particularly where dynamic actors or large-scale motion present challenges for conventional radiance field or splatting approaches.
- Real-time or near-real-time dynamic scene capture for gaming, visual effects, or robotics, where parallel inference and efficient representation are required.
- Scenarios where only multi-view synchronized video is available and no dense 3D measurements can be obtained.
Possible future work includes developing new geometric priors for initializing seeds and local spaces in more challenging settings (e.g., from monocular video or with imperfect SfM results), extending the approach to real-time online optimization, and further compressing the Temporal Gaussian representation for storage- and latency-sensitive deployments.
In summary, LocalDyGS advances the modeling of dynamic 3D scenes by combining local decomposition, feature decoupling, and time-dependent adaptive rendering primitives, resulting in both high accuracy and scalable computational performance for a wide range of dynamic reconstruction tasks (2507.02363).