- The paper introduces MVSNeRF, a method that combines plane-swept cost volumes from deep multi-view stereo with physically based volume rendering to efficiently reconstruct neural radiance fields from only three input views.
- It processes the cost volume with a 3D CNN to produce a neural encoding volume, which an MLP decoder turns into volume density and radiance for differentiable volume rendering, achieving high-quality view synthesis as measured by PSNR, SSIM, and LPIPS.
- The approach drastically reduces reconstruction time to around 6 minutes compared to traditional NeRF methods requiring over 5 hours, highlighting its practical efficiency and generalizability.
Overview of MVSNeRF: Efficient Radiance Field Reconstruction from Multi-View Stereo
The paper introduces MVSNeRF, a novel approach to reconstructing neural radiance fields that enables efficient view synthesis. Unlike traditional NeRF methods, which require lengthy per-scene optimization, MVSNeRF trains a deep neural network that generalizes across scenes and reconstructs a radiance field from only three input views. It achieves this by combining plane-swept cost volumes, a technique prevalent in multi-view stereo (MVS), with physically based volume rendering.
Methodology
MVSNeRF's framework integrates deep MVS techniques with neural rendering for geometry-aware scene understanding. The core of the approach is a plane-swept cost volume, built by warping 2D image features from the nearby input views onto sweeping depth planes in the reference view's frustum, which captures both scene geometry and appearance. A 3D CNN then processes this cost volume into a neural encoding volume of per-voxel features representing local geometry and appearance.
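To make the cost-volume step concrete, below is a minimal PyTorch sketch of MVSNet-style plane-sweep warping with variance-based aggregation. The function names, tensor shapes, projection-matrix convention, and the variance cost are illustrative assumptions in the spirit of the paper, not its exact implementation.

```python
import torch
import torch.nn.functional as F

def homo_warp(src_feat, proj, depth_values):
    """Warp a source-view feature map onto fronto-parallel depth planes
    in the reference view (plane sweep).
    src_feat: (B, C, H, W), proj: (B, 3, 4) ref-pixel -> src-pixel
    projection, depth_values: (B, D). Returns (B, C, D, H, W)."""
    B, C, H, W = src_feat.shape
    D = depth_values.shape[1]
    R, t = proj[:, :, :3], proj[:, :, 3:]

    # homogeneous pixel grid of the reference view: (B, 3, H*W)
    y, x = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([x, y, torch.ones_like(x)]).float()
    pix = pix.view(1, 3, -1).expand(B, -1, -1).to(src_feat.device)

    # lift each pixel to every depth plane, project into the source view
    pts = (R @ pix).unsqueeze(2) * depth_values.view(B, 1, D, 1)  # (B,3,D,HW)
    pts = pts + t.view(B, 3, 1, 1)
    uv = pts[:, :2] / pts[:, 2:].clamp(min=1e-6)  # perspective divide

    # normalize pixel coordinates to [-1, 1] for grid_sample
    gx = uv[:, 0] / ((W - 1) / 2) - 1
    gy = uv[:, 1] / ((H - 1) / 2) - 1
    grid = torch.stack([gx, gy], dim=-1).view(B, D * H, W, 2)
    warped = F.grid_sample(src_feat, grid, padding_mode="zeros",
                           align_corners=True)
    return warped.view(B, C, D, H, W)

def build_cost_volume(feats, projs, depth_values):
    """Variance of reference + warped source features across views.
    feats: list of V feature maps (B, C, H, W); projs: V-1 matrices."""
    ref, srcs = feats[0], feats[1:]
    D = depth_values.shape[1]
    vols = [ref.unsqueeze(2).expand(-1, -1, D, -1, -1)]
    vols += [homo_warp(f, p, depth_values) for f, p in zip(srcs, projs)]
    return torch.stack(vols).var(dim=0)  # (B, C, D, H, W) cost volume
```

Variance across views is a common choice here because it is low where features from all views agree, i.e., near the true surface.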
An MLP decoder regresses volume density and view-dependent radiance from features interpolated out of the encoding volume, and these outputs are composited by differentiable volume rendering. The result is a model capable of synthesizing photo-realistic images from novel viewpoints, even in complex scenes significantly different from the training data.
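The compositing step follows the standard NeRF emission-absorption quadrature. Below is a minimal sketch of that rendering equation; the tensor shapes and the ReLU density activation are assumptions for illustration.

```python
import torch

def volume_render(sigma, rgb, z_vals):
    """Composite per-sample density and radiance into per-ray colors.
    sigma: (R, S) densities, rgb: (R, S, 3) radiance,
    z_vals: (R, S) sample depths along each ray. Returns (R, 3)."""
    deltas = z_vals[:, 1:] - z_vals[:, :-1]               # inter-sample distances
    deltas = torch.cat([deltas, 1e10 * torch.ones_like(deltas[:, :1])], dim=-1)
    alpha = 1.0 - torch.exp(-torch.relu(sigma) * deltas)  # per-sample opacity
    # transmittance: probability the ray reaches each sample unoccluded
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-10], dim=-1),
        dim=-1)[:, :-1]
    weights = alpha * trans                                # compositing weights
    return (weights.unsqueeze(-1) * rgb).sum(dim=1)        # (R, 3) pixel colors
```

Because every operation here is differentiable, a photometric loss on rendered pixels trains the feature extractor, 3D CNN, and MLP end to end.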
Performance
Experiments show that MVSNeRF outperforms concurrent generalizable radiance field methods, which rely predominantly on 2D image features, producing high-quality view synthesis from only three input images as measured by PSNR, SSIM, and LPIPS.
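For reference, these three metrics can be computed as in the generic evaluation sketch below, which assumes the scikit-image and lpips packages; this is not the paper's own evaluation script.

```python
import torch
import lpips  # pip install lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(pred, gt):
    """pred, gt: (H, W, 3) float numpy arrays in [0, 1]."""
    psnr = peak_signal_noise_ratio(gt, pred, data_range=1.0)
    ssim = structural_similarity(gt, pred, channel_axis=-1, data_range=1.0)
    # LPIPS expects (1, 3, H, W) tensors scaled to [-1, 1]
    net = lpips.LPIPS(net="vgg")
    to_t = lambda a: torch.from_numpy(a).permute(2, 0, 1)[None].float() * 2 - 1
    dist = net(to_t(pred), to_t(gt)).item()
    return psnr, ssim, dist
```

PSNR and SSIM reward pixel-level and structural fidelity (higher is better), while LPIPS measures perceptual distance in a deep feature space (lower is better).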
It also offers a significant efficiency advantage: the network reaches rendering quality comparable or superior to traditional NeRF methods in a fraction of the time, roughly 6 minutes of per-scene fine-tuning versus 5.1 hours for a complete NeRF optimization.
Implications and Future Directions
The implications of MVSNeRF are twofold: it provides a robust solution for efficient radiance field reconstruction from sparse inputs, and the reconstruction it produces serves as a strong initialization for further per-scene optimization when denser views are available. This dual capacity makes it practical across varied use cases and broadens the applicability of neural rendering.
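In that second role, per-scene refinement can be as simple as the loop sketched below; `model`, its ray-batch interface, and the hyperparameters are hypothetical placeholders rather than the paper's actual fine-tuning procedure.

```python
import torch
import torch.nn.functional as F

def finetune(model, ray_batches, lr=5e-4, steps=10_000):
    """Use the generalizable reconstruction as an initialization and
    optimize it against the scene's own images. `model(rays)` -> predicted
    RGB is a hypothetical interface; ray_batches yields (rays, target_rgb)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for step, (rays, target_rgb) in zip(range(steps), ray_batches):
        pred = model(rays)                    # render a batch of rays
        loss = F.mse_loss(pred, target_rgb)   # photometric loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```

Starting from the generalizable reconstruction rather than random weights is what lets this loop converge in minutes instead of hours.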
The paper also opens directions for further work on efficient neural scene representations. Refining the encoding volume and subdividing it for large or complex scenes could improve performance, and extending the method to more diverse and dynamic scenes could widen its practical applications.
Overall, MVSNeRF represents an important step towards more efficient, generalizable neural rendering methodologies. This paper provides a well-founded blueprint for subsequent research aimed at bridging the gap between efficiency and quality in view synthesis.