Correlate-and-Excite: Real-Time Stereo Matching via Guided Cost Volume Excitation (2108.05773v1)

Published 12 Aug 2021 in cs.CV, cs.AI, and cs.LG

Abstract: Volumetric deep learning approach towards stereo matching aggregates a cost volume computed from input left and right images using 3D convolutions. Recent works showed that utilization of extracted image features and a spatially varying cost volume aggregation complements 3D convolutions. However, existing methods with spatially varying operations are complex, cost considerable computation time, and cause memory consumption to increase. In this work, we construct Guided Cost volume Excitation (GCE) and show that simple channel excitation of cost volume guided by image can improve performance considerably. Moreover, we propose a novel method of using top-k selection prior to soft-argmin disparity regression for computing the final disparity estimate. Combining our novel contributions, we present an end-to-end network that we call Correlate-and-Excite (CoEx). Extensive experiments of our model on the SceneFlow, KITTI 2012, and KITTI 2015 datasets demonstrate the effectiveness and efficiency of our model and show that our model outperforms other speed-based algorithms while also being competitive to other state-of-the-art algorithms. Codes will be made available at https://github.com/antabangun/coex.

Summary

The paper introduces Guided Cost Volume Excitation (GCE), a lightweight module that enhances cost aggregation for improved stereo matching accuracy.
It employs top-k disparity regression to mitigate soft-argmin limitations and performs robustly on benchmarks like SceneFlow and KITTI.
The CoEx network achieves real-time processing with competitive performance, making it promising for autonomous driving and robotics.

Overview of "Correlate-and-Excite: Real-Time Stereo Matching via Guided Cost Volume Excitation"

This paper introduces a novel approach to stereo matching, a crucial step in estimating depth from a pair of images. The proposed method builds a module called Guided Cost Volume Excitation (GCE) into a network named Correlate-and-Excite (CoEx). The research focuses on the trade-off between speed and accuracy, a common challenge in stereo matching, especially for real-time applications such as autonomous driving.

Key Contributions

Guided Cost Volume Excitation (GCE): The paper proposes GCE, a lightweight module that guides the cost aggregation process in stereo matching. It uses features extracted from the input image to excite the channels of the cost volume. According to the authors, this simple channel excitation improves performance considerably while avoiding the complexity, computation time, and memory consumption of existing spatially varying aggregation methods.
Top-$k$ Disparity Regression: The authors introduce a novel method for computing the final disparity estimate by selecting the top-$k$ candidates before the soft-argmin operation. Standard soft-argmin regression takes an expectation over all disparity candidates, so when the disparity distribution is not unimodal the estimate can be pulled toward a value between the modes. Restricting the regression to the top-$k$ candidates reduces this effect.
Real-Time Capabilities: The CoEx network integrates the proposed modules in an end-to-end stereo matching network that's efficient enough for real-time processing, outperforming other speed-focused methods while remaining competitive with state-of-the-art alternatives.
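
The first two contributions above can be sketched for a single pixel in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the per-pixel weight matrix `weight` (standing in for a pointwise convolution on image features), the sigmoid gate, and the convention that a higher score means a better match are all choices made here for the example.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def guided_excitation(cost, feat, weight):
    """Channel excitation of one pixel's cost volume slice.

    cost:   C x D nested list (channels x disparity candidates)
    feat:   image feature vector of length F at the same pixel
    weight: C x F matrix (hypothetical stand-in for a pointwise conv)
    Each channel is scaled by a sigmoid gate computed from the image
    feature; the same gate is shared by all disparity candidates.
    """
    gates = []
    for row in weight:
        logit = sum(w * f for w, f in zip(row, feat))
        gates.append(1.0 / (1.0 + math.exp(-logit)))
    return [[g * c for c in channel] for g, channel in zip(gates, cost)]


def soft_argmax_disparity(scores):
    """Expected disparity under a softmax over all candidates."""
    probs = softmax(scores)
    return sum(d * p for d, p in enumerate(probs))


def topk_disparity(scores, k):
    """Expected disparity under a softmax over the k best candidates."""
    order = sorted(range(len(scores)), key=lambda d: scores[d], reverse=True)
    top = order[:k]
    probs = softmax([scores[d] for d in top])
    return sum(d * p for d, p in zip(top, probs))


if __name__ == "__main__":
    # Assumed toy scores: a strong peak at disparity 2, a weaker one at 8.
    scores = [0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.0, 0.0]
    print("all candidates:", soft_argmax_disparity(scores))
    print("top-2:", topk_disparity(scores, 2))
    print("top-1:", topk_disparity(scores, 1))
```

With `k` equal to the number of candidates, `topk_disparity` reduces to the full expectation; smaller `k` discards the low-scoring tail that would otherwise bias the estimate away from the dominant peak.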

Experimental Results

The CoEx model is evaluated on the SceneFlow, KITTI 2012, and KITTI 2015 datasets. Results are reported using end-point error (EPE) and D1 error percentage, the standard metrics in stereo matching evaluation, and the authors state that CoEx outperforms other speed-oriented methods on these benchmarks. They also report a runtime that supports real-time use on modern hardware.
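
For readers unfamiliar with the two metrics, the sketch below shows how they are commonly computed on flat lists of disparities. It is an illustration under assumptions, not the benchmark code: the function names are hypothetical, pixels with a ground-truth disparity of 0 are treated as invalid, and the D1 thresholds (error above 3 px and above 5% of the true disparity) follow the usual KITTI 2015 convention.

```python
def _valid_pairs(pred, gt):
    """Keep only pixels with a ground-truth disparity (assumed: gt > 0)."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")
    return pairs


def end_point_error(pred, gt):
    """Mean absolute disparity error in pixels over valid pixels."""
    pairs = _valid_pairs(pred, gt)
    return sum(abs(p - g) for p, g in pairs) / len(pairs)


def d1_percent(pred, gt, abs_thresh=3.0, rel_thresh=0.05):
    """Percentage of valid pixels counted as outliers.

    A pixel is an outlier when its error exceeds both abs_thresh pixels
    and rel_thresh times the ground-truth disparity.
    """
    pairs = _valid_pairs(pred, gt)
    outliers = sum(
        1
        for p, g in pairs
        if abs(p - g) > abs_thresh and abs(p - g) > rel_thresh * g
    )
    return 100.0 * outliers / len(pairs)


if __name__ == "__main__":
    # Made-up toy values; the last pixel has no ground truth.
    pred = [10.0, 20.0, 50.0, 7.0]
    gt = [10.5, 24.0, 50.0, 0.0]
    print("EPE:", end_point_error(pred, gt))
    print("D1 %:", d1_percent(pred, gt))
```

EPE rewards small average errors, while D1 counts how often the estimate is grossly wrong, so the two can rank methods differently.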

Implications and Future Directions

The proposed CoEx model shows promise for various applications that require efficient stereo matching, such as robotics and autonomous vehicles, where both speed and accuracy are critical. By improving computational efficiency without sacrificing accuracy, such advancements could lead to reduced dependency on expensive depth sensors like LiDAR.

While this paper makes significant strides, future research could explore integrating CoEx with other sensory inputs or refining its components for even faster execution. Additionally, expanding the top-$k$ disparity regression method to other computer vision tasks that involve uncertainty or ambiguity in feature space holds potential.

Conclusion

In summary, the paper contributes two techniques to stereo matching: GCE and top-$k$ disparity regression. Combined in CoEx, they improve accuracy while keeping the network fast enough for real-time use, which makes the approach a practical candidate for real-world autonomous systems.
