- The paper introduces a rendering consistency framework that resolves correspondence ambiguities in unsupervised MVS under non-Lambertian and occluded conditions.
- It employs two loss functions, Depth Rendering Consistency and Reference View Synthesis, to improve geometric precision and provide view-dependent supervision.
- RC-MVSNet achieves state-of-the-art results among unsupervised methods on DTU and generalizes well to Tanks&Temples, producing robust depth predictions without ground-truth data.
An Expert Review of RC-MVSNet: Unsupervised Multi-View Stereo with Neural Rendering
The paper "RC-MVSNet: Unsupervised Multi-View Stereo with Neural Rendering" addresses the critical challenge of finding accurate correspondences in unsupervised Multi-View Stereo (MVS) by leveraging neural rendering techniques. The authors propose an innovative approach, RC-MVSNet, which introduces a novel rendering consistency framework to resolve inherent ambiguities in correspondence, particularly those exacerbated by non-Lambertian surfaces and occlusions.
Key Methodological Contributions
The method is motivated by a limitation of existing unsupervised MVS approaches, which typically rely on the photometric consistency assumption: that a 3D point appears with the same color in every view that observes it. This assumption often fails in real-world scenes, where reflectance varies with viewpoint and occlusions hide surfaces from some views. RC-MVSNet moves beyond it by incorporating two loss functions:
- Depth Rendering Consistency Loss: constrains the depth rendered by the neural-rendering branch to agree with the depth predicted by the MVS backbone, concentrating geometry near the object surface and reducing the adverse impact of occlusions.
- Reference View Synthesis Loss: uses neural volumetric rendering to re-synthesize the reference image, capturing view-dependent effects and providing consistent supervision even for non-Lambertian surfaces (both losses are sketched below).
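To make the mechanics concrete, here is a minimal PyTorch-style sketch of how volume-rendering weights could drive both losses. Tensor names, shapes, and the plain L1 penalties are illustrative assumptions for exposition, not the authors' implementation.

```python
import torch

def volume_render(sigma, rgb, t_vals):
    """Alpha-composite per-ray samples into a color and an expected depth.

    sigma:  [R, S]    densities at S samples along each of R rays (assumed shapes)
    rgb:    [R, S, 3] radiance at each sample
    t_vals: [R, S]    sample depths along each ray, sorted ascending
    """
    deltas = t_vals[:, 1:] - t_vals[:, :-1]                           # [R, S-1]
    deltas = torch.cat([deltas, 1e10 * torch.ones_like(deltas[:, :1])], dim=-1)
    alpha = 1.0 - torch.exp(-sigma * deltas)                          # per-sample opacity
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-10], dim=-1),
        dim=-1)[:, :-1]                                               # accumulated transmittance
    weights = alpha * trans                                           # [R, S]
    color = (weights.unsqueeze(-1) * rgb).sum(dim=1)                  # rendered color [R, 3]
    depth = (weights * t_vals).sum(dim=1)                             # expected depth [R]
    return color, depth

def rc_losses(sigma, rgb, t_vals, mvs_depth, ref_pixels, lambda_drc=1.0):
    """Reference view synthesis loss plus depth rendering consistency loss.

    mvs_depth:  [R]    depth predicted by the MVS backbone at each ray's pixel
    ref_pixels: [R, 3] ground-truth colors of the reference image at those pixels
    """
    rendered_rgb, rendered_depth = volume_render(sigma, rgb, t_vals)
    loss_rvs = torch.abs(rendered_rgb - ref_pixels).mean()            # view synthesis
    loss_drc = torch.abs(rendered_depth - mvs_depth).mean()           # depth consistency
    return loss_rvs + lambda_drc * loss_drc
```

Both losses share the same compositing weights, which is what ties the rendered appearance and the rendered geometry to a common surface estimate.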
These components are integrated into an end-to-end differentiable network trained without ground-truth depth, advancing the state of the art in unsupervised MVS with highly accurate depth predictions.
Results and Performance Evaluation
The efficacy of RC-MVSNet is demonstrated on the challenging DTU and Tanks&Temples benchmarks, where it outperforms existing unsupervised MVS frameworks and even several supervised techniques. Specifically, the paper reports improved accuracy, completeness, and overall scores in the DTU point-cloud evaluation, together with robustness to occlusions and non-Lambertian effects.
A Gaussian-Uniform mixture sampling strategy is another key design choice: it concentrates ray samples near the surface suggested by the MVS depth prediction while retaining uniform samples across the full depth range, making the learning of geometric features more efficient and improving depth estimation accuracy (a sketch follows).
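Below is a minimal sketch of such a sampling scheme, assuming a per-ray coarse depth from the MVS branch; the function name, mixing ratio, and standard deviation are hypothetical parameters, not values from the paper.

```python
import torch

def gaussian_uniform_samples(depth_hyp, near, far, n_samples,
                             gauss_ratio=0.5, sigma=0.05):
    """Mix Gaussian samples around a coarse depth with uniform samples over [near, far].

    depth_hyp: [R] coarse depth from the MVS branch, one hypothesis per ray (assumed)
    """
    n_gauss = int(n_samples * gauss_ratio)
    n_unif = n_samples - n_gauss
    # Dense samples near the hypothesized surface...
    gauss = depth_hyp.unsqueeze(-1) + sigma * torch.randn(
        depth_hyp.shape[0], n_gauss, device=depth_hyp.device)
    # ...plus uniform samples covering the whole depth range as a fallback.
    unif = near + (far - near) * torch.rand(
        depth_hyp.shape[0], n_unif, device=depth_hyp.device)
    samples = torch.cat([gauss, unif], dim=-1).clamp(near, far)
    return samples.sort(dim=-1).values  # sorted depths, ready for compositing
```

The uniform component keeps the renderer able to correct a wrong coarse depth, while the Gaussian component spends most of the sample budget where the surface is likely to be.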
Theoretical and Practical Implications
RC-MVSNet's approach to solving MVS challenges via neural rendering has profound implications for 3D computer vision. The introduction of rendering-based supervision can significantly reduce the reliance on annotated datasets, making the technology more accessible for diverse and large-scale real-world applications.
Theoretically, this approach suggests a shift in how MVS problems can be tackled, moving away from rigid assumptions towards more flexible, learning-driven frameworks capable of adapting to complex visual phenomena.
Future Directions
Building upon this framework, future research might explore the integration of other advanced neural representations and reinforcement learning techniques to further enhance depth prediction accuracy and robustness. Furthermore, examining the scalability of this approach in large and complex outdoor scenarios or under variable lighting conditions could extend its applicability.
In conclusion, RC-MVSNet is a significant contribution to the MVS domain, offering a compelling direction for unsupervised methods that incorporate neural rendering for accurate and flexible 3D reconstruction. It advances current capabilities and lays the groundwork for further work at the intersection of multi-view stereo and neural rendering.