
Adaptive Learning for Multi-view Stereo Reconstruction (2404.05181v1)

Published 8 Apr 2024 in cs.CV

Abstract: Deep learning has recently demonstrated its excellent performance on the task of multi-view stereo (MVS). However, loss functions applied to deep MVS are rarely studied. In this paper, we first analyze the properties of existing loss functions for deep depth-based MVS approaches. Regression-based loss leads to inaccurate continuous results by computing the mathematical expectation, while classification-based loss outputs discretized depth values. To this end, we then propose a novel loss function, named adaptive Wasserstein loss, which is able to narrow down the difference between the true and predicted probability distributions of depth. In addition, a simple but effective offset module is introduced to better achieve sub-pixel prediction accuracy. Extensive experiments on different benchmarks, including DTU, Tanks and Temples and BlendedMVS, show that the proposed method with the adaptive Wasserstein loss and the offset module achieves state-of-the-art performance.


Summary

  • The paper introduces an adaptive Wasserstein loss for learning depth distributions and an offset module that enables sub-pixel accuracy in depth predictions.
  • It overcomes limitations in regression and classification losses by aligning predicted and true depth distributions even with non-overlapping supports.
  • Empirical tests on DTU, Tanks and Temples, and BlendedMVS benchmarks demonstrate state-of-the-art performance and scalability.

Adaptive Learning for Multi-view Stereo Reconstruction Using Adaptive Wasserstein Loss and Offset Module

Introduction

Multi-view stereo (MVS) is crucial for generating dense 3D reconstructions from multiple images. While deep learning (DL)-based approaches have significantly advanced the field, the design of loss functions, a key component of any DL model, has received little attention in deep MVS research. The paper addresses this gap by analyzing existing loss functions and proposing an adaptive Wasserstein loss combined with an offset module. This combination achieves state-of-the-art performance on benchmarks including DTU, Tanks and Temples, and BlendedMVS.

Analysis of Existing Loss Functions

Regression-based and Classification-based Loss

Existing deep MVS methods typically employ either regression-based or classification-based loss functions. Regression-based approaches predict a continuous depth value as the mathematical expectation over the depth hypotheses, which can be inaccurate when the predicted distribution is multi-modal: the expectation may land between two peaks, at a depth the network itself considers unlikely. Classification-based approaches, on the other hand, select one of a fixed set of discrete depth values, which prevents them from reaching sub-pixel accuracy.
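The two failure modes can be illustrated with a minimal sketch on a single pixel. The depth hypotheses and the bimodal probabilities below are hypothetical values chosen for illustration (for example, a pixel near an occlusion boundary), not data from the paper; "soft-argmin" is the usual name for the expectation used by regression-based MVS networks.

```python
# Hypothetical depth hypotheses (mm) and a bimodal predicted distribution
# for one pixel; values are illustrative only.
depths = [500.0, 510.0, 520.0, 530.0, 540.0]
probs = [0.5, 0.05, 0.0, 0.05, 0.4]


def regression_depth(depths, probs):
    """Soft-argmin: expectation of depth under the predicted distribution."""
    return sum(d * p for d, p in zip(depths, probs))


def classification_depth(depths, probs):
    """Argmax: return the single most probable discrete hypothesis."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return depths[best]


print(regression_depth(depths, probs))      # roughly 518, between the two peaks
print(classification_depth(depths, probs))  # 500.0, locked to the hypothesis grid
```

The expectation lands near 520 mm, a hypothesis the network assigns zero probability, while the argmax can only ever return one of the five grid values.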

Novel Contributions

Adaptive Wasserstein Loss

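The authors introduce an adaptive Wasserstein loss, which minimizes the discrepancy between the true and predicted depth distributions even when their supports do not overlap. This makes it especially well suited to deep depth-based MVS and overcomes a limitation of the Kullback-Leibler divergence used in classification-based approaches: KL divergence compares probabilities bin by bin and ignores how far apart the bins are, so it does not reflect how far a wrong prediction is from the true depth, and it is infinite when the supports are disjoint.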
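The contrast with KL divergence can be shown with a minimal sketch of the plain 1D Wasserstein-1 (earth mover's) distance on a discrete depth grid. This is not the paper's adaptive variant, whose weighting is not described in this summary; the grid and distributions are hypothetical. In 1D, W1 equals the integral of the absolute difference between the two CDFs, which on a grid becomes a weighted sum over neighbouring gaps.

```python
import math


def wasserstein_1d(depths, p, q):
    """Plain W1 between two distributions on the same sorted 1D grid.

    Uses W1 = integral of |F_p - F_q|; on a discrete grid each CDF gap
    is weighted by the spacing to the next hypothesis.
    """
    cdf_p = cdf_q = 0.0
    total = 0.0
    for i in range(len(depths) - 1):
        cdf_p += p[i]
        cdf_q += q[i]
        total += abs(cdf_p - cdf_q) * (depths[i + 1] - depths[i])
    return total


def kl_divergence(p, q):
    """KL(p || q); infinite if p has mass where q has none."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue
        if qi == 0.0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total


# Hypothetical grid: ground truth at 520 mm, two one-hot predictions.
depths = [500.0, 510.0, 520.0, 530.0, 540.0]
target = [0.0, 0.0, 1.0, 0.0, 0.0]
pred_near = [0.0, 1.0, 0.0, 0.0, 0.0]  # off by one bin (510 mm)
pred_far = [1.0, 0.0, 0.0, 0.0, 0.0]   # off by two bins (500 mm)

print(wasserstein_1d(depths, target, pred_near))  # 10.0
print(wasserstein_1d(depths, target, pred_far))   # 20.0
print(kl_divergence(target, pred_near), kl_divergence(target, pred_far))  # inf inf
```

Both predictions have supports disjoint from the target, so KL is infinite for both and cannot rank them, while W1 grows with the depth error and still gives a useful training signal.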

Offset Module

Additionally, an offset module is introduced to push prediction accuracy to sub-pixel levels. For each fixed discrete depth hypothesis, the module predicts both a probability and an additional continuous offset, so the output is no longer restricted to the discrete hypotheses and continuous depth values can be predicted.
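A minimal sketch of how per-hypothesis offsets can lift a discrete prediction off the grid. It assumes the final depth is read out at the most probable hypothesis plus that hypothesis's predicted offset, in the same units as the depths; the summary does not specify the exact decoding rule, and `decode_with_offsets` and all values below are hypothetical.

```python
def decode_with_offsets(depths, probs, offsets):
    """Hypothetical decoding: argmax hypothesis plus its predicted offset.

    Assumes offsets are in the same units as depths (mm here).
    """
    best = max(range(len(probs)), key=lambda i: probs[i])
    return depths[best] + offsets[best]


# Hypothetical network outputs for one pixel.
depths = [500.0, 510.0, 520.0, 530.0, 540.0]
probs = [0.05, 0.1, 0.7, 0.1, 0.05]
offsets = [0.0, 1.5, -3.2, 2.0, 0.0]

print(decode_with_offsets(depths, probs, offsets))  # approximately 516.8
```

Without the offset, the same probabilities would decode to exactly 520 mm, since the 10 mm hypothesis spacing is the finest resolution a pure classifier can express.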

Empirical Evaluation

Benchmarks and Results

The proposed method was evaluated on the DTU, Tanks and Temples, and BlendedMVS datasets. It achieved strong results on DTU and similarly high performance on Tanks and Temples. In particular, it outperformed D²HC-RMVSNet, which shares a similar network architecture, highlighting that the gains come from the proposed loss function and offset module. Experiments on BlendedMVS further demonstrated the method's scalability and practicality across diverse scenes.

Discussion

Benefits of Adaptive Wasserstein Loss and Offset Module

The adaptive Wasserstein loss addresses the shortcomings of previous loss functions by directly penalizing the distance between the predicted and true depth distributions, so that the two align closely. The offset module adds sub-pixel accuracy on top of this, a significant improvement over purely discretized outputs.

Future Implications

While the proposed method advances the performance of MVS reconstruction, future work could explore the integration of these principles into other 3D vision tasks. Additionally, further refinement of the offset module could enhance its effectiveness across a broader range of scenarios.

Conclusion

The paper presents a novel adaptive Wasserstein loss function combined with an offset module for deep MVS, significantly improving depth prediction accuracy and completeness. Through extensive experimentation, the method demonstrated superior performance on multiple benchmarks, offering promising directions for future research in 3D vision and MVS reconstruction.
