- The paper introduces a novel depth ranking constraint that uses coarse depth maps to guide neural radiance fields under few-shot conditions.
- It implements a spatial continuity constraint to maintain coherent geometry even in challenging, low-texture scenarios.
- Empirical results on the LLFF, DTU, and NVS-RGBD datasets show consistent improvements in PSNR, SSIM, and LPIPS over prior few-shot methods.
Overview of SparseNeRF: Distilling Depth Ranking for Few-shot Novel View Synthesis
SparseNeRF presents a novel approach for enhancing few-shot novel view synthesis by distilling depth priors, addressing the performance degradation of Neural Radiance Fields (NeRFs) when only a few input views are available. The framework leverages depth ranking and spatial continuity constraints informed by coarse depth observations, obtained either from pre-trained monocular depth estimation models or consumer-level depth sensors. Rather than depending on precise depth maps, which are often impractical to acquire, SparseNeRF capitalizes on coarse depth information to instill accurate geometry in NeRFs.
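The coarse depth itself can come from any off-the-shelf monocular estimator. Below is a minimal sketch using MiDaS via torch.hub; the specific model choice, file name, and preprocessing are illustrative rather than the paper's exact pipeline.

```python
# Sketch: obtaining a coarse depth map from a pre-trained monocular
# depth estimator. SparseNeRF only needs relative (coarse) depth, so
# any off-the-shelf estimator can serve; MiDaS is used here as an example.
import cv2
import torch

midas = torch.hub.load("intel-isl/MiDaS", "DPT_Large")
midas.eval()

midas_transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = midas_transforms.dpt_transform

# "view_000.png" is a hypothetical input view.
img = cv2.cvtColor(cv2.imread("view_000.png"), cv2.COLOR_BGR2RGB)

with torch.no_grad():
    prediction = midas(transform(img))  # relative inverse depth (larger = closer)
    coarse_depth = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=img.shape[:2],
        mode="bicubic",
        align_corners=False,
    ).squeeze()  # resize back to the input resolution
```

Note that MiDaS predicts relative inverse depth, so only the pixel ordering is trustworthy; for ranking supervision the sign may need flipping so that larger values mean farther.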
Key Contributions
- Depth Ranking Constraint: At the core of SparseNeRF is a local depth ranking regularization that constrains the expected depth ranking of the NeRF to be consistent with that of the coarse depth maps. This relative depth supervision sidesteps the inaccuracies of absolute depth values while still refining the model's understanding of spatial relationships within a scene (a minimal sketch of such a loss follows this list).
- Spatial Continuity Constraint: The paper introduces a continuity constraint that aligns the NeRF's depth continuity with that of the coarse depth maps, promoting stable and coherent geometry even in textureless or complex regions (also sketched after this list).
- Empirical Results: SparseNeRF notably surpasses existing few-shot NeRF methods on the LLFF and DTU datasets by harnessing simple depth ranking constraints. Its performance is robust and broadly applicable to real-world conditions as evidenced by extensive testing on the newly introduced NVS-RGBD dataset, which contains RGBD data from low-cost depth sensors.
- NVS-RGBD Dataset Contribution: The research contributes the NVS-RGBD dataset, encompassing depth maps from various consumer devices, enabling further exploration and validation of sparse view synthesis techniques.
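To make the ranking constraint concrete, here is a minimal PyTorch sketch of a pairwise, margin-based depth ranking loss. The names (`rendered_depth`, `coarse_depth`, `margin`) are illustrative, and random pixel pairs stand in for the paper's patch-based pair sampling.

```python
import torch

def depth_ranking_loss(rendered_depth, coarse_depth, num_pairs=1024, margin=1e-4):
    """Pairwise ranking loss: if the coarse depth says pixel i is closer
    than pixel j, penalize the NeRF's rendered depth for violating that
    ordering by more than a small margin.

    rendered_depth, coarse_depth: flattened per-ray depths, shape (N,).
    """
    n = rendered_depth.shape[0]
    i = torch.randint(0, n, (num_pairs,), device=rendered_depth.device)
    j = torch.randint(0, n, (num_pairs,), device=rendered_depth.device)

    # +1 where the coarse map says i is farther than j, -1 where closer.
    sign = torch.sign(coarse_depth[i] - coarse_depth[j])

    # Hinge penalty whenever the rendered depths do not preserve that order.
    violation = torch.relu(margin - sign * (rendered_depth[i] - rendered_depth[j]))
    return violation.mean()
```

Because only the sign of depth differences enters the loss, any monotonic miscalibration in the coarse depth (scale, shift, or nonlinearity) is harmless, which is exactly why rough priors suffice.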
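The continuity constraint admits a similarly small sketch. This version assumes depths are rendered as patches; the smoothness threshold `eps` and the first-difference formulation are illustrative choices, not the paper's exact formulation.

```python
import torch

def spatial_continuity_loss(rendered_depth, coarse_depth, eps=0.01):
    """Where the coarse depth map is locally smooth, encourage the NeRF's
    rendered depth to be smooth as well.

    rendered_depth, coarse_depth: depth patches of shape (H, W).
    eps: threshold below which a coarse-depth difference counts as smooth.
    """
    # Differences between horizontally and vertically adjacent pixels.
    d_dx = rendered_depth[:, 1:] - rendered_depth[:, :-1]
    d_dy = rendered_depth[1:, :] - rendered_depth[:-1, :]
    c_dx = coarse_depth[:, 1:] - coarse_depth[:, :-1]
    c_dy = coarse_depth[1:, :] - coarse_depth[:-1, :]

    # Only penalize rendered-depth jumps where the coarse map is smooth,
    # so genuine depth edges in the prior are left untouched.
    smooth_x = (c_dx.abs() < eps).float()
    smooth_y = (c_dy.abs() < eps).float()

    loss_x = (smooth_x * d_dx.abs()).sum() / smooth_x.sum().clamp(min=1.0)
    loss_y = (smooth_y * d_dy.abs()).sum() / smooth_y.sum().clamp(min=1.0)
    return loss_x + loss_y
```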
Numerical Results
SparseNeRF’s methodology leads to measurable improvements in PSNR, SSIM, and LPIPS across the tested datasets. For example, the paper reports PSNR gains on both LLFF and DTU relative to several baseline methods, reinforcing the effectiveness of depth ranking over conventional depth-scaling strategies. (These metrics are standard and can be computed as sketched below.)
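For reference, a minimal evaluation sketch for the three metrics, using scikit-image for SSIM and the `lpips` package for LPIPS. The helper names and the choice of LPIPS backbone are assumptions; papers vary in which backbone they report.

```python
import numpy as np
import torch
import lpips  # pip install lpips
from skimage.metrics import structural_similarity

def to_lpips_tensor(x):
    """HWC float array in [0, 1] -> NCHW tensor in [-1, 1], as LPIPS expects."""
    return torch.from_numpy(x).permute(2, 0, 1)[None].float() * 2.0 - 1.0

def evaluate(pred, gt, lpips_fn):
    """pred, gt: float arrays in [0, 1], shape (H, W, 3)."""
    mse = np.mean((pred - gt) ** 2)
    psnr = -10.0 * np.log10(mse)  # higher is better
    ssim = structural_similarity(pred, gt, channel_axis=-1, data_range=1.0)  # higher is better
    lp = lpips_fn(to_lpips_tensor(pred), to_lpips_tensor(gt)).item()  # lower is better
    return psnr, ssim, lp

lpips_fn = lpips.LPIPS(net="vgg")  # backbone choice varies across papers
```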
Practical and Theoretical Implications
The implications of SparseNeRF extend into both practical applications and theoretical advancements:
- Practical Applications: By easing the reliance on dense view training data and expensive, high-accuracy depth maps, SparseNeRF is well-suited for real-world applications in augmented reality and telepresence, where input data is inherently limited.
- Theoretical Extensions: The approach invites further exploration of relative depth priors, challenging the typical assumption that global, absolute depth precision is required. Developing these techniques further could influence future NeRF architectures, particularly in challenging environments with sparse data.
Future Directions
The potential of SparseNeRF suggests several directions for future research. One avenue is extending the framework to leverage additional types of prior information, such as texture or semantic context, alongside depth. Another is refining the granularity of the depth ranking supervision, for instance through denser or more informative pixel pairs, to sharpen geometric understanding in even more challenging scenarios. Additionally, investigating the synergy between SparseNeRF and other few-shot or single-view models could further enhance the robustness of image synthesis.
In conclusion, SparseNeRF is positioned as a significant contribution to the landscape of view synthesis, providing a compelling methodology for exploiting coarse depth information to achieve state-of-the-art performance in scenarios with limited visual data. The research underscores the viability of relative geometric reasoning in generating coherent scene reconstructions, enabling practical implementations in challenging real-world environments.