Depth-aware Volume Attention for Texture-less Stereo Matching (2402.08931v2)
Abstract: Stereo matching plays a crucial role in 3D perception and scene understanding. Despite the proliferation of promising methods, texture-less and texture-repetitive conditions remain challenging because they offer little geometric and semantic information. In this paper, we propose a lightweight volume refinement scheme to tackle texture deterioration in practical outdoor scenarios. Specifically, we introduce a depth volume supervised by the ground-truth depth map, capturing the relative hierarchy of image texture. The disparity discrepancy volume is then hierarchically filtered by depth-aware hierarchy attention and target-aware disparity attention modules. Local fine structure and context are emphasized to mitigate ambiguity and redundancy during volume aggregation. Furthermore, we propose a more rigorous evaluation metric that considers depth-wise relative error, providing comprehensive evaluations for universal stereo matching and depth estimation models. We extensively validate our proposed methods on public datasets. Results demonstrate that our model achieves state-of-the-art performance, particularly in scenarios with texture-less images. The code is available at https://github.com/ztsrxh/DVANet.
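To illustrate why a depth-wise relative error can be stricter than a plain disparity error, the sketch below converts disparities to depth with the standard pinhole stereo relation (depth = focal length × baseline / disparity) and averages the relative depth error. This is not the metric defined in the paper, whose exact formulation the abstract does not give; the function names, the validity rule (positive disparities only), and the camera parameters are assumptions made for illustration.

```python
def disparity_to_depth(disparity, focal_px, baseline_m):
    """Pinhole stereo relation: depth = focal length * baseline / disparity."""
    return focal_px * baseline_m / disparity


def mean_depth_relative_error(pred_disp, gt_disp, focal_px, baseline_m):
    """Mean of |Z_pred - Z_gt| / Z_gt over pixels with positive disparities.

    Illustrative only: the validity rule and the averaging are assumptions,
    not the paper's definition.
    """
    errors = []
    for d_pred, d_gt in zip(pred_disp, gt_disp):
        if d_gt <= 0 or d_pred <= 0:  # skip invalid or missing pixels
            continue
        z_pred = disparity_to_depth(d_pred, focal_px, baseline_m)
        z_gt = disparity_to_depth(d_gt, focal_px, baseline_m)
        errors.append(abs(z_pred - z_gt) / z_gt)
    return sum(errors) / len(errors) if errors else float("nan")


# Hypothetical camera: 1000 px focal length, 0.5 m baseline.
near = mean_depth_relative_error([49.0], [50.0], 1000.0, 0.5)
far = mean_depth_relative_error([4.0], [5.0], 1000.0, 0.5)
print(f"near: {near:.4f}, far: {far:.4f}")
```

With these assumed values, the same 1-pixel disparity error gives a relative depth error of about 0.02 for the near point (ground-truth disparity 50) and 0.25 for the far point (ground-truth disparity 5), a difference that a pixel-level disparity error alone would hide.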