Digging Into Normal Incorporated Stereo Matching (2402.18171v1)
Abstract: Despite the remarkable progress facilitated by learning-based stereo-matching algorithms, disparity estimation in low-texture, occluded, and border regions remains a bottleneck that limits performance. To tackle these challenges, geometric guidance such as plane information is necessary, as it provides intuitive cues about disparity consistency and affinity similarity. In this paper, we propose a normal incorporated joint learning framework consisting of two specific modules: non-local disparity propagation (NDP) and affinity-aware residual learning (ARL). The estimated normal map is first used to calculate a non-local affinity matrix and a non-local offset, which drive spatial propagation at the disparity level. To enhance geometric consistency, especially in low-texture regions, the estimated normal map is then leveraged to calculate a local affinity matrix. This matrix tells the residual learning where its corrections should refer, thus improving residual-learning efficiency. Extensive experiments on several public datasets, including Scene Flow, KITTI 2015, and Middlebury 2014, validate the effectiveness of the proposed method. At the time this work was completed, our approach ranked 1st among all published works for stereo matching on foreground pixels of the KITTI 2015 dataset and 3rd on the Scene Flow dataset.
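The normal-guided propagation idea behind NDP can be illustrated with a minimal sketch. It rests on several assumptions that are not from the paper. In the described method the affinities and offsets are derived by the network from the estimated normal map. Here both are replaced by a hypothetical hand-crafted rule: each pixel picks the `k` pixels inside a fixed window whose normals are most similar to its own, and weights them with an affinity `exp(-(1 - n_p·n_q) / sigma)`. The update is a plain affinity-weighted average, only loosely in the spirit of NLSPN-style propagation. The names `normal_affinity`, `select_neighbors`, `propagate_once`, and `propagate`, and the parameters `k`, `radius`, and `sigma`, are illustrative. The ARL module is not sketched.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Grid = List[List[float]]


def normal_affinity(n_p: Vec3, n_q: Vec3, sigma: float = 0.1) -> float:
    """Hypothetical hand-crafted affinity (not the paper's learned one):
    1.0 for identical unit normals, decaying as the normals diverge."""
    dot = sum(a * b for a, b in zip(n_p, n_q))
    return math.exp(-(1.0 - dot) / sigma)


def select_neighbors(normals, y, x, k, radius, sigma):
    """Stand-in for learned non-local offsets: return the k pixels inside a
    (2*radius+1)^2 window whose normals best match the normal at (y, x),
    as (affinity, qy, qx) tuples."""
    h, w = len(normals), len(normals[0])
    candidates = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            qy, qx = y + dy, x + dx
            if (dy, dx) == (0, 0) or not (0 <= qy < h and 0 <= qx < w):
                continue
            a = normal_affinity(normals[y][x], normals[qy][qx], sigma)
            candidates.append((a, qy, qx))
    # Highest affinity first; Python's sort stays stable with reverse=True.
    candidates.sort(key=lambda c: c[0], reverse=True)
    return candidates[:k]


def propagate_once(disp: Grid, normals, k=4, radius=2, sigma=0.1) -> Grid:
    """One propagation step: each disparity becomes a weighted average of
    itself (weight 1) and its selected neighbors (weight = affinity)."""
    h, w = len(disp), len(disp[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            neighbors = select_neighbors(normals, y, x, k, radius, sigma)
            weight_sum = 1.0 + sum(a for a, _, _ in neighbors)
            acc = disp[y][x] + sum(a * disp[qy][qx] for a, qy, qx in neighbors)
            out[y][x] = acc / weight_sum
    return out


def propagate(disp: Grid, normals, iters: int = 3, **kwargs) -> Grid:
    """Apply propagate_once repeatedly, as in iterative spatial propagation."""
    for _ in range(iters):
        disp = propagate_once(disp, normals, **kwargs)
    return disp


# Toy scene: two planes with different normals meeting at a vertical edge.
LEFT, RIGHT = (0.0, 0.0, 1.0), (1.0, 0.0, 0.0)
normals = [[LEFT, LEFT, RIGHT, RIGHT] for _ in range(4)]
disp = [[10.0, 10.0, 20.0, 20.0] for _ in range(4)]
disp[1][1] = 14.0  # a noisy estimate on the left plane
out = propagate_once(disp, normals)
# out[1][1] is pulled from 14.0 to 10.8; the right plane stays at 20.0.
```

Because every neighbor on the toy right plane shares the same normal, the noisy left-plane value does not bleed across the depth edge. This is the behavior normal guidance is meant to provide: propagation follows surfaces rather than raw image proximity. Keeping the pixel's own disparity with weight 1 makes each update a convex combination, so values stay within the range of the inputs.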