
D$^2$GS: Depth-and-Density Guided Gaussian Splatting for Stable and Accurate Sparse-View Reconstruction (2510.08566v1)

Published 9 Oct 2025 in cs.CV

Abstract: Recent advances in 3D Gaussian Splatting (3DGS) enable real-time, high-fidelity novel view synthesis (NVS) with explicit 3D representations. However, performance degradation and instability remain significant under sparse-view conditions. In this work, we identify two key failure modes under sparse-view conditions: overfitting in regions with excessive Gaussian density near the camera, and underfitting in distant areas with insufficient Gaussian coverage. To address these challenges, we propose a unified framework D$^2$GS, comprising two key components: a Depth-and-Density Guided Dropout strategy that suppresses overfitting by adaptively masking redundant Gaussians based on density and depth, and a Distance-Aware Fidelity Enhancement module that improves reconstruction quality in under-fitted far-field areas through targeted supervision. Moreover, we introduce a new evaluation metric to quantify the stability of learned Gaussian distributions, providing insights into the robustness of sparse-view 3DGS. Extensive experiments on multiple datasets demonstrate that our method significantly improves both visual quality and robustness under sparse-view conditions. The project page can be found at: https://insta360-research-team.github.io/DDGS-website/.

Summary

  • The paper introduces D²GS, a unified framework that pairs depth-and-density guided dropout, which reduces near-field overfitting, with targeted far-field supervision, which reduces far-field underfitting.
  • It employs a Distance-Aware Fidelity Enhancement module using monocular depth estimation to enforce dense Gaussian coverage in distant image regions.
  • Experimental results demonstrate that D²GS surpasses state-of-the-art methods on LLFF and MipNeRF360, achieving superior PSNR, SSIM, and stability with the new IMR metric.

Depth-and-Density Guided Gaussian Splatting for Stable and Accurate Sparse-View Reconstruction

Introduction and Motivation

The paper introduces D²GS, a unified framework for robust and accurate 3D Gaussian Splatting (3DGS) under sparse-view conditions. While 3DGS has enabled real-time, high-fidelity novel view synthesis (NVS) with explicit 3D representations, its performance degrades significantly when only a few input views are available. The authors systematically analyze the failure modes of 3DGS in sparse-view settings, identifying two key issues: overfitting in near-field regions (excessive Gaussian density near the camera) and underfitting in far-field regions (insufficient Gaussian coverage in distant areas).

Figure 1: Comparison of Gaussian primitives and rendered images between dense views (55 views) and sparse views (3 views), highlighting overfitting in the near field and underfitting in the far field.

D²GS Framework

The D²GS framework consists of two principal modules:

  1. Depth-and-Density Guided Dropout (DD-Drop): This module adaptively removes Gaussian primitives based on both local density and camera distance, using a dual local-global mechanism. The dropout score for each Gaussian is computed as a weighted sum of normalized depth and density, and the dropout probability is further modulated by depth-based global layering. This soft, probabilistic dropout avoids the pitfalls of hard selection strategies, which can lead to persistent suppression of specific regions and loss of important details.
  2. Distance-Aware Fidelity Enhancement (DAFE): To address underfitting in far-field regions, DAFE leverages monocular depth estimation to generate binary masks that isolate distant regions in the input images. A dedicated loss term is applied to these regions, amplifying the supervision signal and encouraging the generation of denser Gaussian primitives in the far field.

    Figure 2: The D²GS framework, showing the DD-Drop module for adaptive dropout and the DAFE module for enhanced far-field supervision.
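The DD-Drop description above can be illustrated with a minimal sketch. It is not the authors' implementation: the min-max normalization, the use of "nearness" (one minus normalized depth) as the depth score, the quantile-based layering, and the names `dd_drop_keep_mask`, `base_rate`, and `layer_scales` are all assumptions made for illustration. Only the overall shape (a weighted sum of depth and density scores, modulated by depth layers, sampled probabilistically) follows the summary.

```python
import numpy as np

def min_max(x):
    """Scale values to [0, 1]; a constant array maps to zeros."""
    span = x.max() - x.min()
    return np.zeros_like(x, dtype=float) if span == 0 else (x - x.min()) / span

def dd_drop_keep_mask(depth, density, w_depth=0.5, w_density=0.5,
                      base_rate=0.3, layer_scales=(1.0, 0.6, 0.3), rng=None):
    """Return (keep_mask, drop_prob) for a set of Gaussians.

    depth:   per-Gaussian distance to the camera
    density: per-Gaussian local density estimate
    All defaults except the 0.5/0.5 weights are hypothetical.
    """
    rng = np.random.default_rng() if rng is None else rng
    depth = np.asarray(depth, dtype=float)
    density = np.asarray(density, dtype=float)

    # Local score: near and dense Gaussians score highest (assumed convention).
    nearness = 1.0 - min_max(depth)
    score = w_depth * nearness + w_density * min_max(density)

    # Global layering: split Gaussians into depth quantile bands and
    # scale the dropout probability per band (nearest band first).
    inner_quantiles = np.linspace(0.0, 1.0, len(layer_scales) + 1)[1:-1]
    edges = np.quantile(depth, inner_quantiles)
    layer = np.searchsorted(edges, depth, side="right")

    drop_prob = np.clip(base_rate * score * np.asarray(layer_scales)[layer], 0.0, 1.0)

    # Soft, probabilistic selection: every Gaussian can survive on any step.
    keep = rng.random(depth.shape) >= drop_prob
    return keep, drop_prob
```

Because the mask is resampled on each call, no region is suppressed permanently, which is the property the summary contrasts with hard selection.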

Inter-Model Robustness Metric

The paper introduces Inter-Model Robustness (IMR), a novel metric for quantifying the stability of learned 3D Gaussian distributions across independently trained models. IMR is computed using the 2-Wasserstein distance and Optimal Transport theory over Gaussian mixture distributions, providing a direct measure of 3D representation robustness beyond traditional image-space metrics like PSNR and SSIM.

Figure 3: Left: Instability in previous methods, with significant PSNR fluctuations across training rounds. Right: IMR calculation using Gaussian mixture distributions and 2-Wasserstein Distance.
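The building block behind such a metric is the 2-Wasserstein distance between two Gaussians, which has a well-known closed form: the squared distance is the squared mean difference plus a covariance (Bures) term. The sketch below computes that closed form and a pairwise cost matrix between two sets of Gaussians. How the paper aggregates these costs into an IMR score is not specified in this summary, so the aggregation step is deliberately left out; the function names are hypothetical.

```python
import numpy as np

def psd_sqrt(mat):
    """Square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # guard against tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T

def gaussian_w2_squared(mean_a, cov_a, mean_b, cov_b):
    """Squared 2-Wasserstein distance between N(mean_a, cov_a) and N(mean_b, cov_b)."""
    mean_a, mean_b = np.asarray(mean_a, float), np.asarray(mean_b, float)
    cov_a, cov_b = np.asarray(cov_a, float), np.asarray(cov_b, float)
    root_b = psd_sqrt(cov_b)
    cross = psd_sqrt(root_b @ cov_a @ root_b)
    bures = np.trace(cov_a + cov_b - 2.0 * cross)
    return float(np.sum((mean_a - mean_b) ** 2) + max(bures, 0.0))

def pairwise_w2_squared(means_a, covs_a, means_b, covs_b):
    """Cost matrix of squared W2 distances between two sets of Gaussians."""
    return np.array([
        [gaussian_w2_squared(ma, ca, mb, cb) for mb, cb in zip(means_b, covs_b)]
        for ma, ca in zip(means_a, covs_a)
    ])
```

A mixture-level distance would feed this cost matrix, together with the mixture weights, into an optimal transport solver; two runs that learn similar Gaussian layouts would then yield a small transport cost.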

Experimental Results

Quantitative Evaluation

D²GS is evaluated on the LLFF and MipNeRF360 datasets, outperforming both NeRF-based and 3DGS-based baselines in all key metrics (PSNR, SSIM, LPIPS, AVGE). For example, on LLFF (3-view, 1/8 resolution), D²GS achieves a PSNR of 21.35, SSIM of 0.746, and LPIPS of 0.179, surpassing DropGaussian and other state-of-the-art methods. On MipNeRF360, D²GS also leads with a PSNR of 20.09 and SSIM of 0.587.
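The summary does not define AVGE. Assuming it follows the "average error" convention common in sparse-view NVS benchmarks, it is the geometric mean of three error terms: MSE recovered from PSNR as 10^(-PSNR/10), sqrt(1 - SSIM), and LPIPS. The sketch below implements that assumed definition; the function name is hypothetical, and the paper's exact formula may differ.

```python
import math

def average_error(psnr, ssim, lpips):
    """Geometric mean of MSE (from PSNR), sqrt(1 - SSIM), and LPIPS.

    Assumed definition of AVGE; lower is better.
    """
    mse = 10.0 ** (-psnr / 10.0)
    dssim = math.sqrt(1.0 - ssim)
    return (mse * dssim * lpips) ** (1.0 / 3.0)

# LLFF 3-view figures quoted above, combined under the assumed definition.
llff_avge = average_error(psnr=21.35, ssim=0.746, lpips=0.179)
```

Combining the three metrics this way gives a single number that improves only when pixel error, structural error, and perceptual error all move in the right direction.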

Robustness Assessment

IMR scores demonstrate that D²GS yields more stable and consistent Gaussian reconstructions across independent runs, with the lowest IMR values in both 3-view and 6-view settings.

Qualitative Evaluation

Qualitative comparisons show that D²GS produces sharper details and fewer artifacts, especially in high-frequency regions, compared to DropGaussian and CoR-GS.

Figure 4: Qualitative comparison on the LLFF dataset, showing that D²GS avoids artifacts and maintains accurate reconstructions.

Figure 5: Additional qualitative results on LLFF dataset, highlighting improved detail preservation.

Figure 6: Qualitative comparison on MipNeRF360 dataset, demonstrating robust reconstruction in challenging scenes.

Ablation Studies

Ablation experiments validate the complementary benefits of each module. Progressive addition of density score, depth score, and depth-based layering in DD-Drop, as well as the DAFE module, leads to steady improvements in all metrics. The best performance is achieved with balanced weights for depth and density ($\omega_{depth}=0.5$, $\omega_{density}=0.5$) and a moderate dropout rate schedule.

Further ablations on the DAFE module show that enforcing depth fidelity in the top 5% farthest regions is most beneficial, and the method is robust to the choice of monocular depth estimator.
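A minimal sketch of the far-field masking idea, under stated assumptions: the monocular depth map is taken to be larger for farther pixels (some estimators output inverse depth, which would need flipping first), the loss is a plain L1 term, and `far_weight` plus all function names are hypothetical. The 5% fraction comes from the ablation above; everything else is illustrative rather than the authors' implementation.

```python
import numpy as np

def far_field_mask(mono_depth, far_fraction=0.05):
    """Binary mask selecting the farthest `far_fraction` of pixels."""
    threshold = np.quantile(mono_depth, 1.0 - far_fraction)
    return mono_depth >= threshold

def masked_l1(rendered, target, mask):
    """Mean absolute error restricted to masked pixels (0.0 if mask is empty)."""
    if not mask.any():
        return 0.0
    return float(np.abs(rendered - target)[mask].mean())

def dafe_style_loss(rendered, target, mono_depth, far_weight=1.0, far_fraction=0.05):
    """Full-image L1 plus an extra, weighted L1 term on far-field pixels."""
    base = float(np.abs(rendered - target).mean())
    far = masked_l1(rendered, target, far_field_mask(mono_depth, far_fraction))
    return base + far_weight * far
```

The extra term re-counts errors in distant pixels, so gradients from those regions are amplified relative to a uniform photometric loss.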

Implementation Details

The pipeline begins with Structure-from-Motion (SfM) for initial point cloud and camera pose estimation. Gaussian primitives are initialized from the fused point cloud, with attributes such as opacity, scale, and rotation set to default values. Training involves progressive refinement of spherical harmonics, adaptive dropout scheduling, and periodic recalculation of density information. The additional computations for depth and density increase training time modestly but remain tractable.

Discussion and Limitations

The DD-Drop module's soft, probabilistic selection mechanism avoids the drawbacks of hard selection strategies, reconciling the strengths of selective and random dropout. However, the framework relies on hand-crafted depth thresholds and fixed weight coefficients, which may not fully capture complex scene-specific priors. The IMR metric focuses on inter-model consistency but does not address perceptual stability under dynamic view synthesis. Future work could explore adaptive dropout schedules, learnable supervision masks, and temporally-aware robustness metrics.

Implications and Future Directions

D²GS advances the state of the art in sparse-view 3DGS by systematically addressing overfitting and underfitting through targeted regularization and supervision. The introduction of IMR provides a principled approach to evaluating 3D representation robustness. These contributions have practical implications for real-world NVS applications where dense multi-view data is unavailable. The framework's modular design and compatibility with various depth priors suggest potential for further integration with self-supervised or multi-modal learning approaches.

Conclusion

D²GS presents a principled solution for stable and accurate sparse-view 3D reconstruction, combining depth-and-density guided dropout with distance-aware fidelity enhancement. The framework achieves superior quantitative and qualitative results, with improved robustness across independent runs. The proposed IMR metric offers a new perspective on evaluating 3DGS stability. While limitations remain, D²GS establishes a strong foundation for future research in robust, efficient, and generalizable 3D scene reconstruction under sparse-view constraints.
