
Selective-Stereo: Adaptive Frequency Information Selection for Stereo Matching (2403.00486v1)

Published 1 Mar 2024 in cs.CV

Abstract: Stereo matching methods based on iterative optimization, like RAFT-Stereo and IGEV-Stereo, have evolved into a cornerstone in the field of stereo matching. However, these methods struggle to simultaneously capture high-frequency information in edges and low-frequency information in smooth regions due to the fixed receptive field. As a result, they tend to lose details, blur edges, and produce false matches in textureless areas. In this paper, we propose Selective Recurrent Unit (SRU), a novel iterative update operator for stereo matching. The SRU module can adaptively fuse hidden disparity information at multiple frequencies for edge and smooth regions. To perform adaptive fusion, we introduce a new Contextual Spatial Attention (CSA) module to generate attention maps as fusion weights. The SRU empowers the network to aggregate hidden disparity information across multiple frequencies, mitigating the risk of vital hidden disparity information loss during iterative processes. To verify SRU's universality, we apply it to representative iterative stereo matching methods, collectively referred to as Selective-Stereo. Our Selective-Stereo ranks $1^{\text{st}}$ on KITTI 2012, KITTI 2015, ETH3D, and Middlebury leaderboards among all published methods. Code is available at https://github.com/Windsrain/Selective-Stereo.
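
The fusion idea in the abstract, attention maps used as per-pixel weights to combine hidden information from different frequencies, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `fuse_hidden_states`, the two-branch setup, the sigmoid squashing, and the convex-combination form are all assumptions made for illustration. The abstract only states that attention maps act as fusion weights.

```python
import math


def sigmoid(x):
    """Squash a raw score into a weight in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse_hidden_states(h_small, h_large, attn_logits):
    """Hypothetical per-pixel fusion of two hidden-state maps.

    h_small     : 2D list, hidden state from a small-receptive-field branch
                  (assumed here to carry high-frequency, edge-like detail)
    h_large     : 2D list, hidden state from a large-receptive-field branch
                  (assumed here to carry low-frequency, smooth-region detail)
    attn_logits : 2D list of raw attention scores, one per pixel

    Each output pixel is w * h_small + (1 - w) * h_large, with
    w = sigmoid(attn_logits). This convex combination is an assumption.
    """
    fused = []
    for row_s, row_l, row_a in zip(h_small, h_large, attn_logits):
        fused_row = []
        for s, l, a in zip(row_s, row_l, row_a):
            w = sigmoid(a)
            fused_row.append(w * s + (1.0 - w) * l)
        fused.append(fused_row)
    return fused


# Toy 2x2 maps: the small-kernel branch is all ones, the large-kernel
# branch is all zeros, so the fused value equals the attention weight.
h_small = [[1.0, 1.0], [1.0, 1.0]]
h_large = [[0.0, 0.0], [0.0, 0.0]]
attn_logits = [[10.0, -10.0], [0.0, 0.0]]

fused = fuse_hidden_states(h_small, h_large, attn_logits)
```

In this toy setup, a large positive score makes the output follow the small-kernel branch, a large negative score makes it follow the large-kernel branch, and a score of zero averages the two equally.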
