
DeepPruner: Learning Efficient Stereo Matching via Differentiable PatchMatch (1909.05845v1)

Published 12 Sep 2019 in cs.CV

Abstract: Our goal is to significantly speed up the runtime of current state-of-the-art stereo algorithms to enable real-time inference. Towards this goal, we developed a differentiable PatchMatch module that allows us to discard most disparities without requiring full cost volume evaluation. We then exploit this representation to learn which range to prune for each pixel. By progressively reducing the search space and effectively propagating such information, we are able to efficiently compute the cost volume for high likelihood hypotheses and achieve savings in both memory and computation. Finally, an image guided refinement module is exploited to further improve the performance. Since all our components are differentiable, the full network can be trained end-to-end. Our experiments show that our method achieves competitive results on KITTI and SceneFlow datasets while running in real-time at 62ms.

Citations (236)

Summary

  • The paper introduces a novel differentiable PatchMatch method that adaptively prunes the disparity search space for real-time stereo matching.
  • It utilizes a sparse cost volume and image-guided refinement module to achieve competitive accuracy with a runtime of 62ms on key benchmarks.
  • The end-to-end trainable architecture sets a new precedent for integrating classical vision techniques with deep learning in stereo applications.

DeepPruner: Learning Efficient Stereo Matching via Differentiable PatchMatch

This paper presents DeepPruner, an approach to efficient, real-time stereo matching, a capability crucial for applications such as robotics and computational photography. The objective is to improve the runtime of stereo algorithms without substantially compromising the accuracy demanded by state-of-the-art methods. The authors introduce a differentiable formulation of PatchMatch, an influential technique for computing dense correspondences across images. This formulation allows the network to dynamically prune disparity candidates, alleviating the computational burden of full cost-volume evaluation, a common bottleneck in standard stereo estimation pipelines.

Overview of Methodology

The proposed approach hinges on the following key components:

  1. Differentiable PatchMatch: Traditional PatchMatch is adapted into a differentiable framework, enabling it to seamlessly integrate with neural networks for end-to-end learning. The algorithm iteratively samples disparity candidates and propagates the best hypotheses without exhaustive cost volume evaluation.
  2. Search Space Pruning: Building upon the insights gained from PatchMatch, the model learns to prune the disparity search space adaptively for each pixel. This is informed by the coherent nature of adjacent disparities in images and the ability to propagate such information effectively.
  3. Sparse Cost Volume Construction: With the reduced search space, the model constructs and processes a sparse cost volume. This leads to significant reductions in both memory consumption and computational time, a crucial aspect for achieving real-time performance.
  4. Image-Guided Refinement Module: To handle inaccuracies that may arise from pruning, the authors incorporate a refinement module. Operating in a lightweight manner, this module leverages image features to enhance the quality of disparity maps.
  5. End-to-End Training: The complete architecture is fully differentiable, enabling end-to-end training. This joint learning mechanism optimizes the various components collaboratively, refining their roles across the stereo pipeline.
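The interplay of soft candidate selection and per-pixel range pruning (steps 1–3) can be illustrated with a minimal sketch. This is not the authors' implementation: the feature maps, candidate set, softmax temperature, and pruning radius below are illustrative assumptions, and the matching cost is a simple absolute feature difference rather than a learned one.

```python
# Illustrative sketch (not the paper's code): soft-argmin disparity
# selection over sampled candidates, followed by per-pixel range pruning.
import numpy as np

def soft_argmin_disparity(left_feat, right_feat, candidates, tau=0.05):
    """Differentiable disparity estimate: a softmax-weighted average of
    candidate disparities (soft-argmin), so gradients can flow through
    the selection instead of a hard, non-differentiable argmin."""
    costs = np.stack([np.abs(left_feat - np.roll(right_feat, d, axis=1))
                      for d in candidates])           # (K, H, W) matching costs
    w = np.exp(-costs / tau)                          # low cost -> high weight
    w /= w.sum(axis=0, keepdims=True)                 # normalize over candidates
    d_grid = np.asarray(candidates, dtype=float).reshape(-1, 1, 1)
    return (w * d_grid).sum(axis=0)                   # (H, W) expected disparity

def prune_range(d_est, radius, d_max):
    """Per-pixel search interval around the current estimate; the sparse
    cost volume is then evaluated only over [lo, hi] for each pixel."""
    lo = np.clip(d_est - radius, 0.0, d_max)
    hi = np.clip(d_est + radius, 0.0, d_max)
    return lo, hi

rng = np.random.default_rng(0)
left = rng.random((8, 16))
right = np.roll(left, -3, axis=1)        # synthetic pair with true disparity 3
d_est = soft_argmin_disparity(left, right, candidates=[0, 3, 6])
lo, hi = prune_range(d_est, radius=2, d_max=15)
```

Because the soft-argmin is just a weighted sum, the whole candidate-selection step remains differentiable, which is what lets the pruning network be trained jointly with the rest of the pipeline.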

Experimental Insights

DeepPruner demonstrated compelling performance across several benchmarks. On the KITTI and SceneFlow datasets, the method achieved competitive accuracy with a runtime of 62ms, establishing it as a viable real-time solution. On the SceneFlow dataset, DeepPruner achieved an End-Point Error (EPE) of 0.86, placing it in the upper echelon of stereo matching methods. On the KITTI 2015 test set, it achieved an outlier rate of 2.15% over all pixels, further confirming its efficacy.
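For reference, the two headline metrics can be computed as follows. This is a generic sketch of the metric definitions, not code from the paper: EPE is the mean absolute disparity error, and the KITTI 2015 convention counts a pixel as an outlier when its error exceeds both 3 px and 5% of the ground-truth disparity.

```python
# Generic metric definitions (illustrative, not the authors' evaluation code).
import numpy as np

def end_point_error(pred, gt):
    """EPE: mean absolute disparity error over all pixels."""
    return float(np.abs(pred - gt).mean())

def kitti_outlier_rate(pred, gt, abs_thresh=3.0, rel_thresh=0.05):
    """Fraction of pixels whose disparity error exceeds both 3 px and
    5% of the ground-truth value (the KITTI 2015 'D1' criterion)."""
    err = np.abs(pred - gt)
    bad = (err > abs_thresh) & (err > rel_thresh * np.abs(gt))
    return float(bad.mean())

gt = np.full((4, 4), 10.0)
pred = gt.copy()
pred[0, 0] = 20.0   # one grossly wrong pixel out of 16
```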

Implications and Future Prospects

The introduction of differentiable PatchMatch sets a precedent for incorporating classical computer vision techniques within modern deep learning frameworks. The success of this integration encourages further exploration in utilizing similar adaptations across other domains, such as optical flow and scene flow estimation.

The pruning and propagation ideas inherent to DeepPruner could be extended to contexts where real-time image processing is critical. The reduction in computational demands offers practical benefits for deployment in embedded systems and mobile platforms, which have limited processing capabilities.

As future work, there is potential to expand DeepPruner's architecture to accommodate dynamic scenes and adverse environmental conditions, which pose additional challenges in stereo vision.

In conclusion, DeepPruner represents a substantive advancement in stereo matching by marrying classical methods with deep learning, ensuring both accuracy and efficiency. Through strategic cost volume pruning and comprehensive end-to-end optimization, it sets a benchmark for real-time capable models in the field.
