Neural Markov Random Field for Stereo Matching (2403.11193v2)
Abstract: Stereo matching is a core task for many computer vision and robotics applications. Despite their dominance in traditional stereo methods, the hand-crafted Markov Random Field (MRF) models lack sufficient modeling accuracy compared to end-to-end deep models. While deep learning representations have greatly improved the unary terms of the MRF models, the overall accuracy is still severely limited by the hand-crafted pairwise terms and message passing. To address these issues, we propose a neural MRF model, where both potential functions and message passing are designed using data-driven neural networks. Our fully data-driven model is built on the foundation of variational inference theory, to prevent convergence issues and retain stereo MRF's graph inductive bias. To make the inference tractable and scale well to high-resolution images, we also propose a Disparity Proposal Network (DPN) to adaptively prune the search space of disparity. The proposed approach ranks $1{st}$ on both KITTI 2012 and 2015 leaderboards among all published methods while running faster than 100 ms. This approach significantly outperforms prior global methods, e.g., lowering D1 metric by more than 50% on KITTI 2015. In addition, our method exhibits strong cross-domain generalization and can recover sharp edges. The codes at https://github.com/aeolusguan/NMRF
- Multiway cut for stereo and motion with slanted surfaces. In Proceedings of the seventh IEEE international conference on computer vision, pages 489–495. IEEE, 1999.
- Patchmatch stereo-stereo matching with slanted support windows. In Bmvc, pages 1–11, 2011.
- Learning a neural solver for multiple object tracking. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6247–6257, 2020.
- End-to-end object detection with transformers. In European conference on computer vision, pages 213–229. Springer, 2020.
- Stereodrnet: Dilated residual stereonet. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11786–11795, 2019.
- Pyramid stereo matching network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5410–5418, 2018.
- On the over-smoothing problem of cnn based disparity estimation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 8997–9005, 2019.
- A deep visual correspondence embedding model for stereo matching costs. In Proceedings of the IEEE International Conference on Computer Vision, pages 972–980, 2015.
- Deep stereo using adaptive thin volume representation with uncertainty awareness. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2524–2534, 2020a.
- Learning depth with convolutional spatial propagation network. IEEE transactions on pattern analysis and machine intelligence, 2019.
- Hierarchical neural architecture search for deep stereo matching. Advances in Neural Information Processing Systems, 33, 2020b.
- Discriminative embeddings of latent variable models for structured data. In International conference on machine learning, pages 2702–2711. PMLR, 2016.
- Cswin transformer: A general vision transformer backbone with cross-shaped windows. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12124–12134, 2022.
- Deeppruner: Learning efficient stereo matching via differentiable patchmatch. In Proceedings of the IEEE/CVF international conference on computer vision, pages 4384–4393, 2019.
- Are we ready for autonomous driving? the kitti vision benchmark suite. In 2012 IEEE conference on computer vision and pattern recognition, pages 3354–3361. IEEE, 2012.
- Neural message passing for quantum chemistry. In International conference on machine learning, pages 1263–1272. PMLR, 2017.
- Group-wise correlation stereo network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3273–3282, 2019.
- William L Hamilton. Graph representation learning. Morgan & Claypool Publishers, 2020.
- Matchnet: Unifying feature and metric learning for patch-based matching. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3279–3286, 2015.
- Heiko Hirschmuller. Stereo processing by semiglobal matching and mutual information. IEEE Transactions on pattern analysis and machine intelligence, 30(2):328–341, 2007.
- Uncertainty guided adaptive warping for robust and efficient stereo matching. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3318–3327, 2023.
- End-to-end learning of geometry and context for deep stereo regression. In Proceedings of the IEEE international conference on computer vision, pages 66–75, 2017.
- Stereonet: Guided hierarchical refinement for real-time edge-aware depth prediction. In Proceedings of the European Conference on Computer Vision (ECCV), pages 573–590, 2018.
- End-to-end training of hybrid cnn-crf models for stereo. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2339–2348, 2017.
- Belief propagation reloaded: Learning bp-layers for labeling problems. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7900–7909, 2020.
- A survey on deep learning techniques for stereo-based depth estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(4):1738–1764, 2020.
- Practical stereo matching via cascaded recurrent network with adaptive correlation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16263–16272, 2022.
- Revisiting stereo depth estimation from a sequence-to-sequence perspective with transformers. In Proceedings of the IEEE/CVF international conference on computer vision, pages 6197–6206, 2021.
- Learning for disparity estimation through feature constancy. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2811–2820, 2018.
- Raft-stereo: Multilevel recurrent field transforms for stereo matching. In 2021 International Conference on 3D Vision (3DV), pages 218–227. IEEE, 2021.
- Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021.
- Efficient deep learning for stereo matching. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 5695–5703, 2016.
- A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4040–4048, 2016.
- Object scene flow for autonomous vehicles. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3061–3070, 2015.
- Discrete optimization for optical flow. In Pattern Recognition: 37th German Conference, GCPR 2015, Aachen, Germany, October 7-10, 2015, Proceedings 37, pages 16–28. Springer, 2015.
- Superglue: Learning feature matching with graph neural networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4938–4947, 2020.
- A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International journal of computer vision, 47:7–42, 2002.
- High-resolution stereo datasets with subpixel-accurate ground truth. In Pattern Recognition: 36th German Conference, GCPR 2014, Münster, Germany, September 2-5, 2014, Proceedings 36, pages 31–42. Springer, 2014.
- A multi-view stereo benchmark with high-resolution images and multi-camera videos. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3260–3269, 2017.
- Patch based confidence prediction for dense disparity map. In BMVC, page 4, 2016.
- Sgm-nets: Semi-global matching with neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 231–240, 2017.
- Cfnet: Cascade and fused cost volume for robust stereo matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 13906–13915, 2021.
- Pcw-net: Pyramid combination and warping cost volume for stereo matching. In European Conference on Computer Vision, pages 280–297. Springer, 2022.
- Deep stereo matching with dense crf priors. arXiv preprint arXiv:1612.01725, 2016.
- Loftr: Detector-free local feature matching with transformers. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 8922–8931, 2021.
- A comparative study of energy minimization methods for markov random fields with smoothness-based priors. IEEE transactions on pattern analysis and machine intelligence, 30(6):1068–1080, 2008.
- Continuous 3D Label Stereo Matching using Local Expansion Moves. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 40(11):2725–2739, 2018.
- Hitnet: Hierarchical iterative tile refinement network for real-time stereo matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14362–14372, 2021.
- Large-scale outdoor scene reconstruction and correction with vision. The International Journal of Robotics Research, 41(6):637–663, 2022.
- Tappen. Comparison of graph cuts with belief propagation for stereo, using identical mrf parameters. In Proceedings Ninth IEEE International Conference on Computer Vision, pages 900–906. IEEE, 2003.
- Smd-nets: Stereo mixture density networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8942–8952, 2021.
- Practical deep stereo (pds): Toward applications-friendly deep stereo matching. Advances in neural information processing systems, 31, 2018.
- Pseudo-lidar from visual depth estimation: Bridging the gap in 3d object detection for autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8445–8453, 2019.
- Attention concatenation volume for accurate and efficient stereo matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12981–12990, 2022.
- Iterative geometry encoding volume for stereo matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21919–21928, 2023.
- Aanet: Adaptive aggregation network for efficient stereo matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1959–1968, 2020.
- Learning to compare image patches via convolutional neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4353–4361, 2015.
- Stereo matching by training a convolutional neural network to compare image patches. J. Mach. Learn. Res., 17(1):2287–2318, 2016.
- Dense stereo matching with application to augmented reality. In 2007 IEEE International Conference on Signal Processing and Communications, pages 1503–1506. IEEE, 2007.
- Meshstereo: A global stereo model with mesh alignment regularization for view interpolation. In Proceedings of the IEEE International Conference on Computer Vision, pages 2057–2065, 2015.
- Ga-net: Guided aggregation net for end-to-end stereo matching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 185–194, 2019.
- Domain-invariant stereo matching networks. In Europe Conference on Computer Vision (ECCV), 2020.
- Point transformer. In Proceedings of the IEEE/CVF international conference on computer vision, pages 16259–16268, 2021.