GC-MVSNet: Multi-View, Multi-Scale, Geometrically-Consistent Multi-View Stereo (2310.19583v3)
Abstract: Traditional multi-view stereo (MVS) methods rely heavily on photometric and geometric consistency constraints, but newer machine-learning-based MVS methods check geometric consistency across multiple source views only as a post-processing step. In this paper, we present a novel approach that explicitly encourages geometric consistency of reference-view depth maps across multiple source views at different scales during learning (see Fig. 1). We find that adding this geometric consistency loss significantly accelerates learning by explicitly penalizing geometrically inconsistent pixels, reducing the number of training iterations required to nearly half that of other MVS methods. Our extensive experiments show that our approach achieves a new state of the art on the DTU and BlendedMVS datasets and competitive results on the Tanks and Temples benchmark. To the best of our knowledge, GC-MVSNet is the first attempt to enforce multi-view, multi-scale geometric consistency during learning.
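The abstract contrasts enforcing geometric consistency during learning with the usual post-processing check. As a rough illustration only, the standard forward-backward reprojection check used in MVS depth fusion can be sketched with NumPy as follows. The function names, the 4x4 world-to-camera extrinsic convention, nearest-neighbour depth sampling, and the thresholds (1 pixel, 1% relative depth) are all assumptions made for this sketch; it is not GC-MVSNet's actual loss.

```python
import numpy as np


def _to_homogeneous(points):
    """Append a row of ones to a 3 x N array of 3D points."""
    return np.vstack([points, np.ones((1, points.shape[1]))])


def reproject_consistency(depth_ref, K_ref, E_ref, depth_src, K_src, E_src,
                          pix_thresh=1.0, rel_depth_thresh=0.01):
    """Forward-backward reprojection check of a reference depth map against
    one source view. Returns a boolean (H, W) mask, True where consistent.

    Assumptions (hypothetical, for illustration): pinhole intrinsics K (3x3),
    4x4 world-to-camera extrinsics E, nearest-neighbour depth sampling.
    """
    H, W = depth_ref.shape
    Hs, Ws = depth_src.shape
    v, u = np.mgrid[0:H, 0:W]
    pix = np.stack([u, v, np.ones_like(u)]).reshape(3, -1).astype(np.float64)
    d_ref = depth_ref.reshape(-1).astype(np.float64)
    valid = d_ref > 0

    # Reference pixel -> 3D point (reference camera) -> world -> source camera.
    X_ref = np.linalg.inv(K_ref) @ pix * d_ref
    X_src = (E_src @ np.linalg.inv(E_ref) @ _to_homogeneous(X_ref))[:3]
    proj = K_src @ X_src
    valid &= proj[2] > 1e-9
    z = np.where(valid, proj[2], 1.0)  # avoid dividing by non-positive depth
    u_s, v_s = proj[0] / z, proj[1] / z

    # Keep projections that land inside the source image, then sample its depth.
    valid &= (u_s > -0.5) & (u_s < Ws - 0.5) & (v_s > -0.5) & (v_s < Hs - 0.5)
    ui = np.where(valid, np.round(u_s), 0).astype(int)
    vi = np.where(valid, np.round(v_s), 0).astype(int)
    d_src = np.where(valid, depth_src[vi, ui], 0.0)
    valid &= d_src > 0

    # Source pixel + sampled depth -> 3D -> projected back into the reference view.
    pix_src = np.stack([u_s, v_s, np.ones_like(u_s)])
    X_back = np.linalg.inv(K_src) @ pix_src * d_src
    X_ref_back = (E_ref @ np.linalg.inv(E_src) @ _to_homogeneous(X_back))[:3]
    proj_back = K_ref @ X_ref_back
    valid &= proj_back[2] > 1e-9
    z_back = np.where(valid, proj_back[2], 1.0)
    u_r, v_r = proj_back[0] / z_back, proj_back[1] / z_back

    # Reprojection error in pixels and relative depth error.
    pix_err = np.hypot(u_r - pix[0], v_r - pix[1])
    depth_err = np.abs(z_back - d_ref) / np.where(d_ref > 0, d_ref, 1.0)
    ok = valid & (pix_err < pix_thresh) & (depth_err < rel_depth_thresh)
    return ok.reshape(H, W)


def inconsistent_view_count(depth_ref, K_ref, E_ref, sources, **thresholds):
    """Per-pixel number of source views (depth, K, E) that fail the check."""
    count = np.zeros(depth_ref.shape, dtype=int)
    for depth_src, K_src, E_src in sources:
        mask = reproject_consistency(depth_ref, K_ref, E_ref,
                                     depth_src, K_src, E_src, **thresholds)
        count += ~mask
    return count
```

The per-pixel count marks geometrically inconsistent pixels, the kind of signal a training loss could penalize. How GC-MVSNet turns such a map into its loss is not specified in the abstract. Running the same check on each stage's depth map at that stage's resolution would be one way to make it multi-scale.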