
When Epipolar Constraint Meets Non-local Operators in Multi-View Stereo (2309.17218v1)

Published 29 Sep 2023 in cs.CV

Abstract: Learning-based multi-view stereo (MVS) methods rely heavily on feature matching, which requires distinctive and descriptive representations. An effective solution is to apply non-local feature aggregation, e.g., a Transformer. Although useful, these techniques introduce heavy computational overhead for MVS, because each pixel densely attends to the whole image. In contrast, we propose to constrain non-local feature augmentation to a pair of lines: each point attends only to the corresponding pair of epipolar lines. Our idea takes inspiration from classic epipolar geometry, which shows that one point with different depth hypotheses will be projected onto the epipolar line in the other view. This constraint reduces the 2D search space to the epipolar line in stereo matching. Similarly, this suggests that matching in MVS amounts to distinguishing a series of points lying on the same line. Inspired by this point-to-line search, we devise a line-to-point non-local augmentation strategy. We first devise an optimized searching algorithm to split the 2D feature maps into epipolar line pairs. Then, an Epipolar Transformer (ET) performs non-local feature augmentation among epipolar line pairs. We incorporate the ET into a learning-based MVS baseline, named ET-MVSNet. ET-MVSNet achieves state-of-the-art reconstruction performance on both the DTU and Tanks-and-Temples benchmarks with high efficiency. Code is available at https://github.com/TQTQliu/ET-MVSNet.
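
The geometric fact the abstract builds on (one reference pixel, taken at different depth hypotheses, projects onto a single epipolar line in the other view) can be checked numerically. The following is a minimal sketch under assumed conditions, not the paper's searching algorithm or the ET-MVSNet code: the function names, the intrinsics `K`, and the relative pose `R`, `t` are all hypothetical values chosen for illustration, with the convention that a point maps from the reference to the source camera frame as `X_src = R @ X_ref + t`.

```python
import numpy as np


def project_depth_hypotheses(K_ref, K_src, R, t, pixel, depths):
    """Project one reference pixel at several depth hypotheses into the source view."""
    uv1 = np.array([pixel[0], pixel[1], 1.0])
    ray = np.linalg.inv(K_ref) @ uv1                      # back-projected ray (z = 1)
    pts_ref = ray[None, :] * np.asarray(depths, dtype=float)[:, None]  # (D, 3)
    pts_src = pts_ref @ R.T + t                           # move into the source frame
    proj = pts_src @ K_src.T                              # apply source intrinsics
    return proj[:, :2] / proj[:, 2:3]                     # (D, 2) pixel coordinates


def epipolar_line(K_ref, K_src, R, t, pixel):
    """Epipolar line l = F x in the source view, with F = K_src^-T [t]_x R K_ref^-1."""
    t_cross = np.array([[0.0, -t[2], t[1]],
                        [t[2], 0.0, -t[0]],
                        [-t[1], t[0], 0.0]])
    F = np.linalg.inv(K_src).T @ t_cross @ R @ np.linalg.inv(K_ref)
    return F @ np.array([pixel[0], pixel[1], 1.0])


# Hypothetical camera setup (assumed values, not taken from the paper).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
angle = 0.1  # small rotation about the y axis, in radians
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([0.2, 0.0, 0.05])

pixel = (400.0, 300.0)
depths = np.linspace(1.0, 5.0, 8)

points = project_depth_hypotheses(K, K, R, t, pixel, depths)
line = epipolar_line(K, K, R, t, pixel)

# Point-to-line distance of every projected hypothesis from the epipolar line.
homogeneous = np.hstack([points, np.ones((len(points), 1))])
residuals = np.abs(homogeneous @ line) / np.linalg.norm(line[:2])
print(residuals.max())  # expected to be at floating-point noise level
```

All eight projected hypotheses fall on the same line up to numerical error, which is why the matching search can be restricted from the full 2D image to a 1D epipolar line.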
