GoMVS: Geometrically Consistent Cost Aggregation for Multi-View Stereo (2404.07992v1)
Abstract: Matching cost aggregation plays a fundamental role in learning-based multi-view stereo networks. However, directly aggregating adjacent costs can lead to suboptimal results due to local geometric inconsistency. Related methods either seek selective aggregation or improve the aggregated depth in 2D space; neither handles geometric inconsistency in the cost volume effectively. In this paper, we propose GoMVS to aggregate geometrically consistent costs, yielding better utilization of adjacent geometries. More specifically, we correspond and propagate adjacent costs to the reference pixel by leveraging local geometric smoothness in conjunction with surface normals. We achieve this with the geometrically consistent propagation (GCP) module. It computes the correspondence from the adjacent depth hypothesis space to the reference depth space using surface normals, then uses the correspondence to propagate adjacent costs to the reference geometry, followed by a convolution for aggregation. Our method achieves new state-of-the-art performance on the DTU, Tanks and Temples, and ETH3D datasets. Notably, our method ranks 1st on the Tanks and Temples Advanced benchmark.
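To make the depth correspondence concrete, the sketch below maps a depth hypothesis at an adjacent pixel to the depth at the reference pixel, under the assumption that both pixels lie on one local plane defined by a surface normal. It assumes a pinhole camera and uses the standard plane-induced relation d_p = d_q (n · K^{-1} q) / (n · K^{-1} p). The function names (`backproject_ray`, `propagate_depth`) and the intrinsics tuple layout are hypothetical choices for illustration; this is not the authors' GCP implementation, which operates on full cost volumes.

```python
def backproject_ray(u, v, fx, fy, cx, cy):
    """Return K^{-1} [u, v, 1]^T for a pinhole camera (hypothetical helper)."""
    return ((u - cx) / fx, (v - cy) / fy, 1.0)


def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def propagate_depth(d_q, q, p, normal, intrinsics):
    """Map depth hypothesis d_q at adjacent pixel q to reference pixel p.

    Assumes q and p see the same local plane with the given normal
    (expressed in camera coordinates; it need not be unit length).
    """
    fx, fy, cx, cy = intrinsics
    ray_q = backproject_ray(q[0], q[1], fx, fy, cx, cy)
    ray_p = backproject_ray(p[0], p[1], fx, fy, cx, cy)
    denom = dot(normal, ray_p)
    if abs(denom) < 1e-8:
        # The plane contains the reference ray, so no unique depth exists.
        raise ValueError("plane is parallel to the reference ray")
    # Both 3D points satisfy n . X = const, with X = depth * ray.
    return d_q * dot(normal, ray_q) / denom


# Assumed toy intrinsics: fx = fy = 100, principal point at (50, 50).
K = (100.0, 100.0, 50.0, 50.0)

# Fronto-parallel plane: the depth carries over unchanged.
same = propagate_depth(2.0, (10, 10), (11, 10), (0.0, 0.0, 1.0), K)

# Slanted plane: the corresponding reference depth differs from d_q.
slanted = propagate_depth(2.0, (50, 50), (150, 50), (1.0, 0.0, 1.0), K)
```

For a fronto-parallel normal the mapping reduces to the identity, which corresponds to aggregating adjacent costs directly at the same depth index. For slanted surfaces the mapped depth shifts, which is the local inconsistency the abstract describes.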