
SPARF: Neural Radiance Fields from Sparse and Noisy Poses (2211.11738v3)

Published 21 Nov 2022 in cs.CV

Abstract: Neural Radiance Field (NeRF) has recently emerged as a powerful representation to synthesize photorealistic novel views. While showing impressive performance, it relies on the availability of dense input views with highly accurate camera poses, thus limiting its application in real-world scenarios. In this work, we introduce Sparse Pose Adjusting Radiance Field (SPARF), to address the challenge of novel-view synthesis given only few wide-baseline input images (as low as 3) with noisy camera poses. Our approach exploits multi-view geometry constraints in order to jointly learn the NeRF and refine the camera poses. By relying on pixel matches extracted between the input views, our multi-view correspondence objective enforces the optimized scene and camera poses to converge to a global and geometrically accurate solution. Our depth consistency loss further encourages the reconstructed scene to be consistent from any viewpoint. Our approach sets a new state of the art in the sparse-view regime on multiple challenging datasets.

References (65)
  1. Backpropagation-friendly eigendecomposition. In Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d’Alché Buc, Edward A. Fox, and Roman Garnett, editors, Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pages 3156–3164, 2019.
  2. Neural RGB-D surface reconstruction. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA, June 18-24, 2022, pages 6280–6291. IEEE, 2022.
  3. Mip-nerf: A multiscale representation for anti-aliasing neural radiance fields. In 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021, pages 5835–5844. IEEE, 2021.
  4. Mip-nerf 360: Unbounded anti-aliased neural radiance fields. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA, June 18-24, 2022, pages 5460–5469. IEEE, 2022.
  5. SAMURAI: Shape And Material from Unconstrained Real-world Arbitrary Image collections. In Advances in Neural Information Processing Systems (NeurIPS), 2022.
  6. Mvsnerf: Fast generalizable radiance field reconstruction from multi-view stereo. In 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021, pages 14104–14113. IEEE, 2021.
  7. Wide-baseline relative camera pose estimation with directional learning. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 3257–3267, 2021.
  8. Stereo radiance fields (srf): Learning view synthesis for sparse views of novel scenes. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 7907–7916, 2021.
  9. GARF: gaussian activated radiance fields for high fidelity reconstruction and pose estimation. CoRR, abs/2204.05735, 2022.
  10. Improving neural implicit surfaces geometry with patch warping. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA, June 18-24, 2022, pages 6250–6259. IEEE, 2022.
  11. Depth-supervised NeRF: Fewer views and faster training for free. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2022.
  12. Superpoint: Self-supervised interest point detection and description. In 2018 IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2018, Salt Lake City, UT, USA, June 18-22, 2018, pages 224–236, 2018.
  13. D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019.
  14. Rpnet: An end-to-end network for relative camera pose estimation. In Computer Vision - ECCV 2018 Workshops - Munich, Germany, September 8-14, 2018, Proceedings, Part I, pages 738–745, 2018.
  15. End-to-end learning of keypoint detection and matching for relative pose estimation. CoRR, abs/2104.01085, 2021.
  16. Optimal relative pose with unknown correspondences. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pages 1728–1736. IEEE Computer Society, 2016.
  17. Multiple View Geometry in Computer Vision. Cambridge University Press, USA, 2 edition, 2003.
  18. The Elements of Statistical Learning. Springer Series in Statistics. Springer New York Inc., New York, NY, USA, 2001.
  19. Putting nerf on a diet: Semantically consistent few-shot view synthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 5885–5894, October 2021.
  20. Large scale multi-view stereopsis evaluation. In 2014 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2014, Columbus, OH, USA, June 23-28, 2014, pages 406–413. IEEE Computer Society, 2014.
  21. Self-calibrating neural radiance fields. In 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021, pages 5826–5834, 2021.
  22. Infonerf: Ray entropy minimization for few-shot neural volume rendering. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA, June 18-24, 2022, pages 12902–12911. IEEE, 2022.
  23. Imagenet classification with deep convolutional neural networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1, NIPS’12, page 1097–1105, Red Hook, NY, USA, 2012. Curran Associates Inc.
  24. Neroic: neural rendering of objects from online image collections. ACM Trans. Graph., 41(4):56:1–56:12, 2022.
  25. Barf: Bundle-adjusting neural radiance fields. In IEEE International Conference on Computer Vision (ICCV), 2021.
  26. Neural rays for occlusion-aware image-based rendering. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA, June 18-24, 2022, pages 7814–7823, 2022.
  27. Sparseneus: Fast generalizable neural surface reconstruction from sparse views. ECCV, 2022.
  28. David G. Lowe. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision, 60(2):91–110, Nov. 2004.
  29. Relative camera pose estimation using convolutional neural networks. In Advanced Concepts for Intelligent Vision Systems - 18th International Conference, ACIVS 2017, Antwerp, Belgium, September 18-21, 2017, Proceedings, pages 675–687, 2017.
  30. Gnerf: Gan-based neural radiance field without posed camera. In 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021, pages 6331–6341. IEEE, 2021.
  31. Nerf: representing scenes as neural radiance fields for view synthesis. Commun. ACM, 65(1):99–106, 2022.
  32. Instant neural graphics primitives with a multiresolution hash encoding. ACM Trans. Graph., 41(4):102:1–102:15, July 2022.
  33. Regnerf: Regularizing neural radiance fields for view synthesis from sparse inputs. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2022.
  34. Learning transferable visual models from natural language supervision. In Marina Meila and Tong Zhang, editors, Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event, volume 139 of Proceedings of Machine Learning Research, pages 8748–8763. PMLR, 2021.
  35. Dense depth priors for neural radiance fields from sparse input views. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2022.
  36. ORB: an efficient alternative to SIFT or SURF. In IEEE International Conference on Computer Vision, ICCV 2011, Barcelona, Spain, November 6-13, 2011, pages 2564–2571, 2011.
  37. Superglue: Learning feature matching with graph neural networks. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13-19, 2020, pages 4937–4946, 2020.
  38. Paul-Edouard Sarlin. HLOC: Github project page. https://github.com/cvg/Hierarchical-Localization, 2021.
  39. Structure-from-motion revisited. In CVPR 2016, Las Vegas, NV, USA, pages 4104–4113, 2016.
  40. Learning neural transmittance for efficient rendering of reflectance fields. In 32nd British Machine Vision Conference 2021, BMVC 2021, Online, November 22-25, 2021, page 45. BMVA Press, 2021.
  41. The replica dataset: A digital replica of indoor spaces. CoRR, abs/1906.05797, 2019.
  42. imap: Implicit mapping and positioning in real-time. In 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021, pages 6209–6218. IEEE, 2021.
  43. GRF: learning a general radiance field for 3d representation and rendering. In 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021, pages 15162–15172, 2021.
  44. Glampoints: Greedily learned accurate match points. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pages 10731–10740, 2019.
  45. Learning accurate dense correspondences and when to trust them. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021, virtual, June 19-25, 2021, pages 5714–5724. Computer Vision Foundation / IEEE, 2021.
  46. GLU-Net: Global-local universal network for dense flow and correspondences. In 2020 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2020, 2020.
  47. Pdc-net+: Enhanced probabilistic dense correspondence network. In Preprint, 2021.
  48. Shinji Umeyama. Least-squares estimation of transformation parameters between two point patterns. IEEE Trans. Pattern Anal. Mach. Intell., 13(4):376–380, 1991.
  49. Ibrnet: Learning multi-view image-based rendering. In CVPR, 2021.
  50. Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process., 13(4):600–612, 2004.
  51. NeRF−−: Neural radiance fields without known camera parameters. arXiv preprint arXiv:2102.07064, 2021.
  52. Nerfingmvs: Guided optimization of neural radiance fields for indoor multi-view stereo. In 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021, pages 5590–5599. IEEE, 2021.
  53. Sinerf: Sinusoidal neural radiance fields for joint pose estimation and scene reconstruction. CoRR, abs/2210.04553, 2022.
  54. Ps-nerf: Neural inverse rendering for multi-view photometric stereo. CoRR, abs/2207.11406, 2022.
  55. Multiview neural surface reconstruction by disentangling geometry and appearance. Advances in Neural Information Processing Systems, 33, 2020.
  56. iNeRF: Inverting neural radiance fields for pose estimation. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021.
  57. pixelNeRF: Neural radiance fields from one or few images. In CVPR, 2021.
  58. Monosdf: Exploring monocular geometric cues for neural implicit surface reconstruction. Advances in Neural Information Processing Systems (NeurIPS), 2022.
  59. Relpose: Predicting probabilistic relative rotation for single objects in the wild. In Shai Avidan, Gabriel J. Brostow, Moustapha Cissé, Giovanni Maria Farinella, and Tal Hassner, editors, Computer Vision - ECCV 2022 - 17th European Conference, Tel Aviv, Israel, October 23-27, 2022, Proceedings, Part XXXI, volume 13691 of Lecture Notes in Computer Science, pages 592–611. Springer, 2022.
  60. NeRS: Neural reflectance surfaces for sparse-view 3d reconstruction in the wild. In Conference on Neural Information Processing Systems, 2021.
  61. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
  62. A tutorial on quantitative trajectory evaluation for visual(-inertial) odometry. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2018, Madrid, Spain, October 1-5, 2018, pages 7244–7251. IEEE, 2018.
  63. On the continuity of rotation representations in neural networks. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019, pages 5745–5753. Computer Vision Foundation / IEEE, 2019.
  64. Nice-slam: Neural implicit scalable encoding for slam. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
  65. Fusing the old with the new: Learning relative camera pose with geometry-guided uncertainty. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021, virtual, June 19-25, 2021, pages 32–42, 2021.

Summary

  • The paper introduces SPARF, a method that jointly optimizes camera poses and neural radiance fields to achieve high-quality view synthesis from sparse, noisy inputs.
  • It employs innovative multi-view correspondence and depth consistency losses to enforce global geometric accuracy and coherent scene reconstructions.
  • Experimental results demonstrate that SPARF outperforms current state-of-the-art methods in pose registration and novel view synthesis, even with minimal input images.

Overview of SPARF: Neural Radiance Fields from Sparse and Noisy Poses

The paper introduces Sparse Pose Adjusting Radiance Field (SPARF), a novel method that extends the application of Neural Radiance Fields (NeRF) to scenarios where only a few input views with noisy pose information are available. The method is designed to address the limitations of NeRF in real-world applications where dense input views and highly accurate camera poses are not feasible, such as in AR/VR and autonomous driving scenarios.

Technical Summary

NeRF has showcased significant potential in synthesizing photorealistic views from dense and accurately posed camera images. However, its dependency on high-quality input poses and dense view coverage restricts its practical usability. This paper proposes SPARF to overcome these constraints by introducing a joint optimization strategy for camera pose refinement and scene representation based on sparse input data. The key components and contributions of this research are:

  • Multi-View Correspondence Loss: Unlike previous NeRF adjustments which rely heavily on individual image alignment and photometric consistency, SPARF introduces a multi-view correspondence objective. This objective leverages pixel matches between input views to ensure a globally consistent geometric solution across all views, guiding both the camera poses and the scene geometry towards accuracy.
  • Depth Consistency Loss: This objective uses rendered depth maps from initial viewpoints to enforce depth consistency in novel viewpoints. This loss encourages the reconstruction to remain coherent when viewed from unseen perspectives, thus improving rendering quality in novel views.
  • Joint Pose-NeRF Training: SPARF trains the NeRF model concurrently with camera pose adjustments. A staged training approach is adopted where pose optimization is performed jointly with coarse network training, followed by a phase where refined poses are used to train both coarse and fine networks for high-fidelity scene representation.
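The multi-view correspondence objective above can be sketched numerically: back-project a matched pixel in one view using the NeRF-rendered depth, transform it into the other view with the current pose estimates, project it, and penalize the distance to its matched pixel with a robust loss. This is a minimal NumPy illustration of that geometric residual, not the paper's implementation; the function names, the Huber penalty threshold, and the per-match confidence weights `w` are assumptions for the sketch.

```python
import numpy as np

def huber(r, delta=1.0):
    # Robust penalty on the scalar reprojection residual (assumed loss shape).
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * a**2, delta * (a - 0.5 * delta))

def reproject(u_i, depth_i, K, T_ji):
    """Back-project pixel u_i (2,) with its rendered depth into 3D in
    camera i, map into camera j with the 4x4 relative pose T_ji, project
    with intrinsics K, and return the resulting 2D pixel location."""
    ray = np.linalg.inv(K) @ np.array([u_i[0], u_i[1], 1.0])
    x_i = ray * depth_i                         # 3D point in camera-i frame
    x_j = T_ji[:3, :3] @ x_i + T_ji[:3, 3]      # same point in camera-j frame
    p = K @ x_j
    return p[:2] / p[2]                          # perspective division

def correspondence_loss(matches, depths_i, K, T_ji, w=None, delta=1.0):
    """matches: list of (u_i, u_j) pixel pairs from a matcher;
    depths_i: NeRF-rendered depth at each u_i; w: optional confidences."""
    if w is None:
        w = np.ones(len(matches))
    total = 0.0
    for (u_i, u_j), d, wk in zip(matches, depths_i, w):
        u_hat = reproject(np.asarray(u_i, float), d, K, T_ji)
        total += wk * huber(np.linalg.norm(u_hat - np.asarray(u_j, float)), delta)
    return total / max(len(matches), 1)
```

Because the residual depends on both the rendered depth (scene geometry) and the relative pose, gradients flow to both when this loss is minimized, which is what drives the joint convergence to a globally consistent solution.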

Experimental Evaluation

SPARF is evaluated on multiple challenging datasets including DTU, LLFF, and Replica, under the scenario of having as few as three input images. The results demonstrate that SPARF significantly outperforms existing state-of-the-art methods in both pose registration and view synthesis. Notably, SPARF exhibits robustness to initial noise in camera pose estimates, showcasing the critical contribution of the multi-view geometric constraints that work even under wide baselines with sparse views.
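Pose registration accuracy of the kind reported here is commonly measured, after aligning the estimated poses to ground truth, as the geodesic angle between rotation matrices. As a small illustration of that standard metric (not a detail taken from the paper's evaluation code), the rotation error in degrees can be computed as:

```python
import numpy as np

def rotation_error_deg(R_est, R_gt):
    """Geodesic angle (degrees) between two 3x3 rotation matrices,
    a standard metric for camera pose registration accuracy."""
    cos_theta = (np.trace(R_gt.T @ R_est) - 1.0) / 2.0
    # Clip to guard against numerical drift outside [-1, 1].
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))
```

Translation error is typically reported alongside this, either as a Euclidean distance after a similarity alignment or as an angle between translation directions.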

  1. Performance in Sparse Views: In conditions where models like BARF and SCNeRF underperform due to insufficient pose registration accuracy in sparse-view scenarios, SPARF achieves superior registration and synthesis quality.
  2. Comparison with Dense View Approaches: While dense-view methods rely on more extensive imagery, SPARF sets new benchmarks in the sparse regime. Even conditional models such as PixelNeRF, which generalize from pre-training on large datasets, show limited effectiveness in out-of-distribution scenes compared to SPARF's targeted per-scene optimization.

Implications and Future Directions

The implications of this research are substantial for advancing 3D scene representation methodologies. By relaxing the dense-view and precise-pose prerequisites, SPARF has the potential to democratize high-quality view synthesis in more varied and challenging deployment environments. This can notably enhance applications in robotics and immersive reality technologies.

Future work could involve integrating SPARF into more efficient voxel grid representations to accelerate convergence and experimenting with pose refinement under variable intrinsic camera parameters. Moreover, the development of more sophisticated correspondence techniques or learning-based methods for prioritizing informative matches could further streamline and bolster SPARF’s robustness across various scene complexities.

Ultimately, SPARF’s ability to maintain global geometric consistency and render high-quality novel views with minimal inputs is a critical stride toward practical and scalable 3D scene processing with neural fields.
