- The paper introduces a surface-based neural representation that overcomes the limitations of volumetric methods in sparse-view settings.
- It employs a neural displacement field over a canonical sphere and BRDF factorization to accurately model geometry and view-dependent appearance.
- NeRS demonstrates superior reconstruction quality on MSE, PSNR, SSIM, and FID, paving the way for practical real-world 3D applications.
Neural Reflectance Surfaces (NeRS) for Sparse-view 3D Reconstruction
The paper "NeRS: Neural Reflectance Surfaces for Sparse-view 3D Reconstruction in the Wild" introduces Neural Reflectance Surfaces (NeRS), aimed at overcoming the limitations of volumetric neural representations like Neural Radiance Fields (NeRF) for 3D reconstruction from sparse views. The authors hypothesize that volumetric methods, while flexible, offer an inappropriate level of expressivity that fails under sparse multi-view image settings, as they require numerous images and precise camera poses to function optimally. NeRS, by contrast, exploits surface-based neural representation to reconstruct 3D surfaces with enhanced efficiency under such constraints.
Core Contributions
- Surface Representation and BRDF Factorization:
- NeRS represents geometry as a neural displacement field over a canonical sphere, guaranteeing a watertight reconstructed surface, and pairs it with a neural texture field that models appearance (see the sphere-field sketch after this list).
- It incorporates a bidirectional reflectance distribution function (BRDF) to model view-dependent lighting, factoring appearance into environment illumination, diffuse color (albedo), and specular shininess. This grounds appearance modeling in real-world physics (see the shading sketch after this list).
- Real-world and Sparse-view Dataset:
- NeRS was evaluated using a novel dataset consisting of multi-view images sourced from online marketplaces. This data is realistic, capturing varying object classes under diverse lighting conditions, posing the inherent challenge of sparse views and imprecise cameras.
- Training and Optimization Framework:
- Camera parameters are optimized jointly with the neural shape, texture, and environment maps in a multi-stage, coarse-to-fine process; gradually refining each parameter group makes the reconstruction more robust (see the staged-optimization sketch after this list).
- Use of Perceptual Losses and Mask Losses:
- To align reconstructions with human perception of the input images, NeRS combines a deep perceptual loss with a silhouette-based mask loss, mitigating inaccuracies in geometry and appearance (see the loss sketch after this list).
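To make the surface representation concrete, below is a minimal PyTorch sketch of a sphere-parameterized field. The module name `SphereField`, the layer sizes, and the residual `u + f_shape(u)` parameterization are illustrative assumptions rather than the authors' exact architecture; the key idea is that both shape and texture are functions defined on the canonical sphere, so the output surface is watertight by construction.

```python
import torch
import torch.nn as nn

class SphereField(nn.Module):
    """Small MLP mapping points on the canonical unit sphere to a vector.

    Instantiated twice below: as a displacement field deforming the sphere
    into the object surface, and as a texture field predicting albedo.
    Width and depth are placeholder choices.
    """
    def __init__(self, out_dim=3, hidden=256, depth=4):
        super().__init__()
        layers = [nn.Linear(3, hidden), nn.ReLU()]
        for _ in range(depth - 1):
            layers += [nn.Linear(hidden, hidden), nn.ReLU()]
        layers.append(nn.Linear(hidden, out_dim))
        self.net = nn.Sequential(*layers)

    def forward(self, u):              # u: (N, 3) points on the unit sphere
        return self.net(u)

f_shape = SphereField()                # displacement field over the sphere
f_tex = SphereField()                  # texture (albedo) field over the sphere

u = torch.nn.functional.normalize(torch.randn(1024, 3), dim=-1)
x = u + f_shape(u)                     # displaced surface point in R^3
albedo = torch.sigmoid(f_tex(u))       # per-point diffuse color in [0, 1]
```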
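The view-dependent appearance model can be illustrated with Phong-style shading for a single light direction. NeRS illuminates the surface with an environment map rather than one light, so a faithful version would integrate a term like this over sampled illumination directions; the specular strength and shininess constants below are placeholders.

```python
import torch
import torch.nn.functional as F

def shade(albedo, normal, view_dir, light_dir, light_rgb,
          specular_strength=0.5, shininess=32.0):
    """Phong-style diffuse + specular shading for one light direction."""
    n = F.normalize(normal, dim=-1)
    l = F.normalize(light_dir, dim=-1)
    v = F.normalize(view_dir, dim=-1)

    n_dot_l = (n * l).sum(-1, keepdim=True)
    diffuse = albedo * light_rgb * n_dot_l.clamp(min=0.0)   # Lambertian term

    r = 2.0 * n_dot_l * n - l                               # light reflected about the normal
    r_dot_v = (r * v).sum(-1, keepdim=True).clamp(min=0.0)
    specular = specular_strength * light_rgb * r_dot_v ** shininess

    return diffuse + specular

# Example call with random geometry (shapes: (N, 3) everywhere).
rgb = shade(torch.rand(4, 3), torch.randn(4, 3), torch.randn(4, 3),
            torch.tensor([0.0, 1.0, 0.0]), torch.ones(3))
```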
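A hedged sketch of the training objective, using the off-the-shelf `lpips` package as a stand-in for the paper's deep perceptual loss and a simple squared error on soft silhouettes for the mask loss; the loss weights are illustrative, not the paper's values.

```python
import torch
import lpips  # pip install lpips; stand-in for the paper's perceptual loss

lpips_fn = lpips.LPIPS(net="vgg")

def render_loss(pred_rgb, gt_rgb, pred_mask, gt_mask,
                w_perceptual=1.0, w_mask=1.0):
    """Perceptual + silhouette objective.

    pred_rgb / gt_rgb:   (B, 3, H, W), scaled to [-1, 1] as LPIPS expects.
    pred_mask / gt_mask: (B, 1, H, W) soft silhouettes in [0, 1].
    """
    perceptual = lpips_fn(pred_rgb, gt_rgb).mean()
    mask = torch.mean((pred_mask - gt_mask) ** 2)
    return w_perceptual * perceptual + w_mask * mask
```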
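Finally, a schematic of the staged optimization. The parameter groupings, stage ordering, and iteration counts are assumptions chosen to illustrate coarse-to-fine refinement, and `loss_fn` is a runnable placeholder for the rendered-image objective above.

```python
import torch

# Hypothetical parameter groups; the schedule below is illustrative of
# coarse-to-fine refinement, not the paper's exact recipe.
camera_poses = torch.nn.Parameter(torch.zeros(8, 6))   # per-image 6-DoF pose
shape_net = torch.nn.Linear(3, 3)                      # stand-in for the shape field
texture_net = torch.nn.Linear(3, 3)                    # stand-in for the texture field
env_map = torch.nn.Parameter(torch.zeros(3, 16, 32))   # environment-map logits

def loss_fn():
    # Placeholder objective so the loop runs end to end; in practice this
    # would render the surface and apply the perceptual + mask losses above.
    u = torch.nn.functional.normalize(torch.randn(64, 3), dim=-1)
    x = u + shape_net(u)
    return (texture_net(x) ** 2).mean() + (env_map ** 2).mean() \
        + (camera_poses ** 2).mean()

stages = [
    ("cameras + coarse shape", [camera_poses, *shape_net.parameters()], 300),
    ("texture",                list(texture_net.parameters()),          300),
    ("joint, with lighting",   [camera_poses, env_map,
                                *shape_net.parameters(),
                                *texture_net.parameters()],             600),
]

for name, params, n_iters in stages:
    optimizer = torch.optim.Adam(params, lr=1e-3)
    for _ in range(n_iters):
        optimizer.zero_grad()
        loss_fn().backward()  # only the current stage's params get stepped
        optimizer.step()
```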
Results and Implications
NeRS outperforms previously established volumetric approaches and competing meta-learning-based methods when evaluated on in-the-wild data, achieving better mean squared error (MSE), peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and Fréchet Inception Distance (FID). This demonstrates both its ability to reconstruct realistic 3D shapes from limited data and its potential for real-world applications such as online marketplaces and autonomous systems.
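As a point of reference, PSNR is a direct function of MSE; a minimal sketch for images normalized to [0, 1]:

```python
import torch

def psnr(pred, target, max_val=1.0):
    """PSNR = 10 * log10(max_val^2 / MSE); higher means a closer match."""
    mse = torch.mean((pred - target) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)
```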
Future Directions and Challenges
The research highlights a critical open challenge: accurate estimation of camera poses in the wild remains underexplored, and future work could focus on closing this gap to further improve robustness. There is also a fundamental ambiguity between illumination and material properties, most visible under certain textures and lighting conditions; more sophisticated reflectance models or learning-based illumination corrections could help disentangle the two.
NeRS represents a significant step forward in sparse-view 3D reconstruction. By moving past the limitations of current volumetric methods, it points toward richer reconstructions of real-world geometry, texture, and lighting, with implications for virtual reality, object modeling, and computer vision.