PatchmatchNet: Learned Multi-View Patchmatch Stereo (2012.01411v1)

Published 2 Dec 2020 in cs.CV

Abstract: We present PatchmatchNet, a novel and learnable cascade formulation of Patchmatch for high-resolution multi-view stereo. With high computation speed and low memory requirement, PatchmatchNet can process higher resolution imagery and is more suited to run on resource limited devices than competitors that employ 3D cost volume regularization. For the first time we introduce an iterative multi-scale Patchmatch in an end-to-end trainable architecture and improve the Patchmatch core algorithm with a novel and learned adaptive propagation and evaluation scheme for each iteration. Extensive experiments show a very competitive performance and generalization for our method on DTU, Tanks & Temples and ETH3D, but at a significantly higher efficiency than all existing top-performing models: at least two and a half times faster than state-of-the-art methods with twice less memory usage.

Summary

The paper introduces PatchmatchNet, a novel end-to-end trainable multi-view stereo method that integrates a cascaded Patchmatch approach with deep learning to achieve high efficiency without expensive 3D cost volumes.
PatchmatchNet incorporates learned adaptive propagation and evaluation mechanisms using learnable offsets and feature similarity for efficient depth hypothesis sampling and improved robustness and generalization.
Experiments show PatchmatchNet achieves competitive performance on standard datasets like DTU and Tanks and Temples while being significantly more efficient, making it suitable for resource-constrained platforms.

Analysis of "PatchmatchNet: Learned Multi-View Patchmatch Stereo"

This work introduces PatchmatchNet, a learned approach to multi-view stereo (MVS). The authors place a cascade formulation of the Patchmatch algorithm inside a deep network, aiming for high computational speed and low memory consumption. The method is an end-to-end trainable architecture whose learned adaptive propagation and evaluation mechanisms are embedded in an iterative multi-scale framework. This design exploits the spatial coherence of depth maps while avoiding the costly construction and regularization of 3D cost volumes that has been a hallmark of previous top-performing MVS models.
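To give a rough feel for why avoiding a dense cost volume saves memory, the following back-of-the-envelope sketch compares storing matching costs for every depth plane with storing them for only a handful of per-pixel hypotheses. The image size, channel count, number of depth planes, and number of hypotheses are all hypothetical values chosen for illustration; they are not taken from the paper.

```python
def candidate_cost_bytes(height, width, candidates_per_pixel, channels, bytes_per_value=4):
    """Bytes needed to store one cost vector per depth candidate per pixel (float32 by default)."""
    return height * width * candidates_per_pixel * channels * bytes_per_value


# All values below are hypothetical and only illustrate how the memory scales.
HEIGHT, WIDTH = 1200, 1600
CHANNELS = 8
DENSE_DEPTH_PLANES = 192      # a dense cost volume evaluates every depth plane
PATCHMATCH_HYPOTHESES = 16    # a Patchmatch-style method keeps a few hypotheses per pixel

dense_bytes = candidate_cost_bytes(HEIGHT, WIDTH, DENSE_DEPTH_PLANES, CHANNELS)
sparse_bytes = candidate_cost_bytes(HEIGHT, WIDTH, PATCHMATCH_HYPOTHESES, CHANNELS)
ratio = dense_bytes / sparse_bytes

print(f"dense cost volume:      {dense_bytes / 1e9:.2f} GB")
print(f"per-pixel hypotheses:   {sparse_bytes / 1e9:.2f} GB")
print(f"ratio (dense / sparse): {ratio:.1f}x")
```

Under these assumptions the saving is simply the ratio of depth planes to hypotheses. Real networks differ in resolution, feature width, and intermediate buffers, so this is only an order-of-magnitude argument.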

Contribution and Methodology

The primary contribution of this work is the integration of the Patchmatch algorithm into a cascaded and trainable framework. The adaptive propagation and evaluation steps are key innovations in this approach. The proposed method samples depth hypotheses efficiently, using spatial information adaptively learned during the training process. This allows the system to process high-resolution imagery efficiently, fitting within the constraints of devices with limited computational resources.
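For readers unfamiliar with Patchmatch, the toy sketch below shows the classical, non-learned loop on a 1D "scanline": random initialization, propagation of hypotheses from neighbours, a shrinking random perturbation, and evaluation that keeps the cheapest candidate. It is not the paper's network. The cost function is a stand-in that compares against a known depth, whereas a real system would use a photometric or feature-matching cost, and every name and parameter here is hypothetical.

```python
import random


def toy_patchmatch_1d(true_depth, iterations=4, depth_range=(1.0, 10.0), seed=0):
    """Classical Patchmatch on a 1D row of pixels (toy illustration only)."""
    rng = random.Random(seed)
    low, high = depth_range
    n = len(true_depth)

    def cost(i, d):
        # Stand-in for a matching cost; a real system cannot see true_depth.
        return abs(d - true_depth[i])

    def mean_cost(est):
        return sum(cost(i, est[i]) for i in range(n)) / n

    # 1) Random initialization of one depth hypothesis per pixel.
    estimate = [rng.uniform(low, high) for _ in range(n)]
    history = [mean_cost(estimate)]
    radius = (high - low) / 2.0

    for _ in range(iterations):
        for i in range(n):
            candidates = [estimate[i]]
            # 2) Propagation: borrow hypotheses from the left and right neighbours.
            if i > 0:
                candidates.append(estimate[i - 1])
            if i < n - 1:
                candidates.append(estimate[i + 1])
            # 3) Perturbation: one random sample around the current estimate.
            perturbed = estimate[i] + rng.uniform(-radius, radius)
            candidates.append(min(high, max(low, perturbed)))
            # 4) Evaluation: keep the candidate with the lowest cost.
            estimate[i] = min(candidates, key=lambda d: cost(i, d))
        radius /= 2.0  # narrow the search as the estimate improves
        history.append(mean_cost(estimate))

    return estimate, history


# Two fronto-parallel surfaces at depths 2 and 6.
scene = [2.0] * 10 + [6.0] * 10
estimate, history = toy_patchmatch_1d(scene)
print([round(c, 3) for c in history])
```

Because the current estimate is always among the candidates, the mean cost can never increase from one iteration to the next. Good hypotheses spread along each surface through propagation, which is the spatial coherence the method relies on.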

Key technical contributions include:

End-to-End Trainability: Patchmatch is integrated into a trainable architecture, using deep learning to improve upon traditional MVS methods.
Adaptive Propagation and Evaluation: These modules utilize learnable offsets for depth hypothesis propagation and feature-based similarity computations to enhance depth estimation accuracy and efficiency.
Robustness and Generalization: The architecture is trained with a robust strategy involving randomization and view selection, which enhances the model's ability to generalize to new datasets without fine-tuning.
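The two learned modules in the list above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `gather_hypotheses` adds per-pixel offsets (which a network would predict, but which are hard-coded here) to a fixed neighbourhood and reads depth with nearest-neighbour lookup instead of interpolation, and `soft_depth` turns per-hypothesis similarity scores into a depth estimate with a softmax-weighted average. All function names, offsets, and scores are hypothetical.

```python
import math


def gather_hypotheses(depth, row, col, fixed_offsets, learned_offsets):
    """Collect depth hypotheses from neighbours at fixed + learned (row, col) offsets."""
    height, width = len(depth), len(depth[0])
    hypotheses = []
    for (dr, dc), (lr, lc) in zip(fixed_offsets, learned_offsets):
        # Nearest-neighbour lookup, clamped to the image border (a simplification).
        r = min(height - 1, max(0, round(row + dr + lr)))
        c = min(width - 1, max(0, round(col + dc + lc)))
        hypotheses.append(depth[r][c])
    return hypotheses


def soft_depth(hypotheses, similarities):
    """Softmax-weighted average of hypotheses; higher similarity means more weight."""
    peak = max(similarities)
    weights = [math.exp(s - peak) for s in similarities]  # subtract max for stability
    total = sum(weights)
    return sum(w * d for w, d in zip(weights, hypotheses)) / total


# A 3x4 depth map with a surface boundary between columns 1 and 2.
depth_map = [[2.0, 2.0, 5.0, 5.0] for _ in range(3)]
four_neighbours = [(-1, 0), (1, 0), (0, -1), (0, 1)]

# Without learned offsets, the right-hand neighbour crosses the boundary.
no_shift = [(0.0, 0.0)] * 4
plain = gather_hypotheses(depth_map, 1, 1, four_neighbours, no_shift)

# A hypothetical learned offset steers that sample back onto the same surface.
shifted = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, -2.0)]
adaptive = gather_hypotheses(depth_map, 1, 1, four_neighbours, shifted)

print(plain, adaptive)
print(soft_depth([2.0, 5.0], [4.0, -4.0]))
```

The design point is that fixed neighbourhoods sample across depth discontinuities, while offsets that adapt to image content can keep the samples on the same surface as the pixel being estimated.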

Results

The experimental evaluation covers DTU, Tanks and Temples, and ETH3D. On these benchmarks PatchmatchNet is competitive with the state of the art while being significantly more efficient: the abstract reports at least two and a half times faster runtime and half the memory usage of state-of-the-art methods.

The results on the DTU benchmark confirm competitive performance: PatchmatchNet achieves an overall error of 0.352 mm (the mean of the accuracy and completeness distances; lower is better), which is on par with or better than many advanced methods, alongside substantial reductions in memory and runtime.
The generalization capability of PatchmatchNet is demonstrated on the Tanks and Temples dataset, where it remains competitive without domain-specific fine-tuning, underscoring its adaptability across different conditions.
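The overall DTU figure is conventionally the average of two point-cloud distances: accuracy (reconstruction to ground truth) and completeness (ground truth to reconstruction). The sketch below computes these on tiny toy point sets with a brute-force nearest-neighbour search. It omits the masking, outlier handling, and downsampling of the official evaluation, and the function names and points are hypothetical.

```python
import math


def mean_nearest_distance(source_points, target_points):
    """Mean distance from each source point to its nearest target point (brute force)."""
    return sum(
        min(math.dist(p, q) for q in target_points) for p in source_points
    ) / len(source_points)


def dtu_style_scores(reconstruction, ground_truth):
    """Return (accuracy, completeness, overall) for two point clouds; lower is better."""
    accuracy = mean_nearest_distance(reconstruction, ground_truth)
    completeness = mean_nearest_distance(ground_truth, reconstruction)
    overall = (accuracy + completeness) / 2.0
    return accuracy, completeness, overall


# Toy example: the reconstruction is exact where it exists but misses one point.
reconstruction = [(0.0, 0.0, 0.0)]
ground_truth = [(0.0, 0.0, 0.0), (0.0, 0.0, 2.0)]

accuracy, completeness, overall = dtu_style_scores(reconstruction, ground_truth)
print(accuracy, completeness, overall)
```

The toy case shows why both terms matter: a sparse but precise reconstruction scores well on accuracy and poorly on completeness, and the overall figure balances the two.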

Implications

PatchmatchNet's contributions have significant implications for real-time and deployment-sensitive applications in computer vision. Specifically, the system demonstrates that competitive MVS performance can be attained with less reliance on complex resource-heavy 3D CNN architectures, thus facilitating deployment on resource-constrained platforms like mobile devices and augmented reality headsets.

Future Directions

Future work may focus on refining the adaptive mechanisms further, for example by learning at which stages and image regions adaptive propagation and evaluation are worth applying. The robustness of the method to difficult inputs, such as highly dynamic scenes or scenes with extreme scale variations, is another promising line of inquiry. Ongoing developments in hardware acceleration could also help exploit the efficiency gains more fully, paving the way for faster and more versatile MVS in real-world applications.

In summary, PatchmatchNet is a highly efficient and adaptable MVS method, demonstrating how traditional techniques can be revitalized and optimized within an intelligent deep learning framework. The combination of classical vision algorithms with contemporary deep learning approaches as shown here is likely to inspire further innovations in the computational efficiency of vision systems.