Cascade Cost Volume for High-Resolution Multi-View Stereo and Stereo Matching (1912.06378v3)

Published 13 Dec 2019 in cs.CV

Abstract: The deep multi-view stereo (MVS) and stereo matching approaches generally construct 3D cost volumes to regularize and regress the output depth or disparity. These methods are limited when high-resolution outputs are needed since the memory and time costs grow cubically as the volume resolution increases. In this paper, we propose a both memory and time efficient cost volume formulation that is complementary to existing multi-view stereo and stereo matching approaches based on 3D cost volumes. First, the proposed cost volume is built upon a standard feature pyramid encoding geometry and context at gradually finer scales. Then, we can narrow the depth (or disparity) range of each stage by the depth (or disparity) map from the previous stage. With gradually higher cost volume resolution and adaptive adjustment of depth (or disparity) intervals, the output is recovered in a coarser to fine manner. We apply the cascade cost volume to the representative MVS-Net, and obtain a 35.6% improvement on DTU benchmark (1st place), with 50.6% and 59.3% reduction in GPU memory and run-time. It is also the state-of-the-art learning-based method on Tanks and Temples benchmark. The statistics of accuracy, run-time and GPU memory on other representative stereo CNNs also validate the effectiveness of our proposed method.

Citations (654)

Summary

The paper introduces a cascade cost volume method that refines depth predictions through a coarse-to-fine, multi-stage approach.
It leverages a feature pyramid and adaptive sampling to significantly reduce GPU memory usage and runtime.
Experimental results show gains in both accuracy and efficiency on the DTU and Tanks and Temples benchmarks for MVS, and when the cascade is applied to the GwcNet stereo matching network.

Cascade Cost Volume for High-Resolution Multi-View Stereo and Stereo Matching

The paper addresses the memory and run-time limits of high-resolution multi-view stereo (MVS) and stereo matching methods that build 3D cost volumes to regress depth or disparity. In these methods, memory and time costs grow cubically with the volume resolution, which makes high-resolution outputs expensive. The proposed cascade cost volume formulation reduces both costs without compromising accuracy.
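To make the cubic growth concrete, the following minimal sketch counts the bytes in a dense cost volume. The function name, the 32-channel default, and the 4-byte float size are illustrative assumptions, not values taken from the paper.

```python
def cost_volume_bytes(width, height, num_planes, channels=32, bytes_per_value=4):
    """Bytes needed for a dense width x height x num_planes x channels volume.

    channels and bytes_per_value are assumed defaults for illustration only.
    """
    return width * height * num_planes * channels * bytes_per_value


# Doubling the width, the height, and the number of depth planes
# multiplies the memory by 2 * 2 * 2 = 8.
base = cost_volume_bytes(160, 128, 48)
doubled = cost_volume_bytes(320, 256, 96)
print(doubled / base)  # 8.0
```

The same factor applies to the work done by 3D convolutions over the volume, which is why a single full-resolution volume quickly becomes impractical.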

Methodology

The proposed method builds upon a feature pyramid architecture, leveraging a coarse-to-fine strategy across multiple stages. Each stage sequentially refines the depth hypothesis range informed by predictions from preceding stages. This hierarchical strategy saves computational resources by adjusting the cost volume resolution and depth intervals progressively, allowing fine-scale detail recovery without needing the extensive depth planes typically required by standard methods.
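A rough voxel count suggests why the coarse-to-fine schedule is cheaper. The sketch below compares one full-resolution volume against three cascade stages. The resolutions and plane counts are hypothetical settings chosen for illustration, and the count ignores feature channels and network overhead, so it is not the saving measured in the paper.

```python
def voxels(width, height, num_planes):
    """Number of cells in a width x height x num_planes cost volume."""
    return width * height * num_planes


# Hypothetical single-stage baseline: full resolution, many depth planes.
single_stage_voxels = voxels(640, 512, 192)

# Hypothetical cascade: resolution doubles per stage while the plane count shrinks.
stages = [(160, 128, 48), (320, 256, 32), (640, 512, 8)]
cascade_voxels = sum(voxels(w, h, d) for w, h, d in stages)

# With these assumed settings the cascade touches roughly a tenth of the voxels.
print(cascade_voxels / single_stage_voxels)
```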

Key features include:

Cascade Architecture: Utilizes a multi-stage process where each stage progressively narrows the hypothesis range and reduces hypothesis plane intervals.
Feature Pyramid: Builds each stage's cost volume from progressively higher-resolution feature maps, so that efficiency gains do not come at the expense of accuracy.
Adaptive Sampling: Adjusts depth intervals based on previous predictions, ensuring computational resources focus on meaningful regions.
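The narrowing of the hypothesis range described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the nearest-neighbour upsampling, and all numeric values are hypothetical.

```python
import numpy as np


def uniform_hypotheses(depth_min, depth_max, num_planes, height, width):
    """First stage: identical planes for every pixel, spanning the full range."""
    planes = np.linspace(depth_min, depth_max, num_planes)
    return np.broadcast_to(planes[:, None, None], (num_planes, height, width)).copy()


def narrowed_hypotheses(prev_depth, num_planes, interval):
    """Later stages: per-pixel planes centred on the previous stage's depth."""
    offsets = (np.arange(num_planes) - (num_planes - 1) / 2.0) * interval
    return prev_depth[None, :, :] + offsets[:, None, None]


def upsample2x(depth):
    """Nearest-neighbour 2x upsampling to match the next stage's resolution."""
    return depth.repeat(2, axis=0).repeat(2, axis=1)


# Pretend the coarse stage predicted a constant depth of 600 units.
coarse_depth = np.full((4, 5), 600.0)
hyp = narrowed_hypotheses(upsample2x(coarse_depth), num_planes=8, interval=2.5)
print(hyp.shape)  # (8, 8, 10)
```

Each later stage then only evaluates a few planes in a narrow band around the previous estimate, rather than re-sampling the whole depth range at fine intervals.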

Experimental Results

Applied to MVSNet, the cascade formulation achieved a 35.6% improvement on the DTU benchmark, with a 50.6% reduction in GPU memory and a 59.3% reduction in run-time. For stereo matching, applying the cascade to GwcNet reduced the end-point error by approximately 15.2% and lowered GPU memory consumption by 36.9%.

The method is also the state-of-the-art learning-based method on the Tanks and Temples benchmark for MVS, which supports its effectiveness across different stereo tasks and datasets.

Implications and Future Directions

The cascade cost volume makes high-resolution depth estimation substantially cheaper in both memory and run-time, which makes high-resolution MVS and stereo matching more practical on resource-constrained hardware. Future work could set the hypothesis range adaptively for distinct image regions, or integrate semantic cues to guide depth plane selection. Either direction could make the cascade approach more resource-efficient and extend it to applications such as autonomous vehicles or augmented reality systems.
