
FoVA-Depth: Field-of-View Agnostic Depth Estimation for Cross-Dataset Generalization (2401.13786v1)

Published 24 Jan 2024 in cs.CV

Abstract: Wide field-of-view (FoV) cameras efficiently capture large portions of the scene, which makes them attractive in multiple domains, such as automotive and robotics. For such applications, estimating depth from multiple images is a critical task, and therefore, a large amount of ground truth (GT) data is available. Unfortunately, most of the GT data is for pinhole cameras, making it impossible to properly train depth estimation models for large-FoV cameras. We propose the first method to train a stereo depth estimation model on the widely available pinhole data, and to generalize it to data captured with larger FoVs. Our intuition is simple: We warp the training data to a canonical, large-FoV representation and augment it to allow a single network to reason about diverse types of distortions that otherwise would prevent generalization. We show strong generalization ability of our approach on both indoor and outdoor datasets, which was not possible with previous methods.
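The abstract's core idea is to warp pinhole training images into a shared large-FoV representation. The abstract does not say which canonical representation is used, so the sketch below assumes an equirectangular (longitude/latitude) grid and nearest-neighbour sampling. The function name `pinhole_to_equirect`, its FoV defaults, and the optional rotation `R` (used here as a stand-in for an orientation augmentation) are all hypothetical. It is a minimal illustration of inverse-mapping resampling, not the authors' implementation.

```python
import numpy as np

def pinhole_to_equirect(image, K, out_h, out_w, fov_lon=np.pi, fov_lat=np.pi / 2, R=None):
    """Hypothetical sketch: resample a pinhole image onto an equirectangular grid.

    Assumes pixel i covers [i, i + 1), i.e. the principal point convention cx ~ w / 2.
    Returns the warped image and a mask of output pixels that saw the source image.
    """
    h, w = image.shape[:2]
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    # Map output pixel centres to longitude / latitude angles centred on the optical axis.
    lon = (np.arange(out_w) + 0.5) / out_w * fov_lon - fov_lon / 2
    lat = (np.arange(out_h) + 0.5) / out_h * fov_lat - fov_lat / 2
    lon, lat = np.meshgrid(lon, lat)  # both (out_h, out_w)
    # Unit ray directions in camera coordinates (x right, y down, z forward).
    dirs = np.stack([np.cos(lat) * np.sin(lon), np.sin(lat), np.cos(lat) * np.cos(lon)], axis=-1)
    if R is not None:
        # Assumed augmentation: rotate every ray by the 3x3 rotation matrix R.
        dirs = dirs @ R.T
    z = dirs[..., 2]
    in_front = z > 1e-6  # rays behind the pinhole camera cannot be projected
    safe_z = np.where(in_front, z, 1.0)
    # Pinhole projection of each ray into source pixel coordinates.
    u = fx * dirs[..., 0] / safe_z + cx
    v = fy * dirs[..., 1] / safe_z + cy
    ui = np.floor(u).astype(int)
    vi = np.floor(v).astype(int)
    valid = in_front & (ui >= 0) & (ui < w) & (vi >= 0) & (vi < h)
    # Nearest-neighbour lookup; pixels outside the pinhole FoV stay zero.
    out = np.zeros((out_h, out_w) + image.shape[2:], dtype=image.dtype)
    out[valid] = image[vi[valid], ui[valid]]
    return out, valid
```

Because the canonical grid covers a wider FoV than the pinhole source, much of the output is empty, and the returned mask marks which pixels carry real data. A bilinear lookup (for example via OpenCV `remap`) would be a natural drop-in replacement for the nearest-neighbour step.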

