Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving (2403.07535v1)

Published 12 Mar 2024 in cs.CV

Abstract: Multi-view depth estimation has achieved impressive performance across various benchmarks. However, almost all current multi-view systems rely on given ideal camera poses, which are unavailable in many real-world scenarios such as autonomous driving. In this work, we propose a new robustness benchmark to evaluate depth estimation systems under various noisy-pose settings. Surprisingly, we find that current multi-view depth estimation methods, as well as single-view and multi-view fusion methods, fail when given noisy poses. To address this challenge, we propose a fused single-view and multi-view depth estimation system that adaptively integrates high-confidence multi-view and single-view results for both robust and accurate depth estimation. The adaptive fusion module performs fusion by dynamically selecting high-confidence regions between the two branches based on a warping confidence map. The system thus tends to choose the more reliable branch when facing textureless scenes, inaccurate calibration, dynamic objects, and other degraded or challenging conditions. Our method outperforms state-of-the-art multi-view and fusion methods under robustness testing. Furthermore, we achieve state-of-the-art performance on challenging benchmarks (KITTI and DDAD) when given accurate pose estimations. Project website: https://github.com/Junda24/AFNet/.
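To make the fusion rule concrete, below is a minimal sketch of confidence-gated blending between the two branches. The function name, tensor shapes, and the soft (weighted) selection are illustrative assumptions rather than the paper's exact module, which selects high-confidence regions via its warping confidence map:

```python
import torch


def adaptive_fusion(depth_sv: torch.Tensor,
                    depth_mv: torch.Tensor,
                    conf_mv: torch.Tensor) -> torch.Tensor:
    """Blend single-view and multi-view depth per pixel.

    depth_sv, depth_mv: (B, 1, H, W) depth maps from the single-view
        and multi-view branches.
    conf_mv: (B, 1, H, W) warping confidence in [0, 1]; high values
        mark pixels where multi-view matching is reliable.
    """
    # Lean on the multi-view branch where its warping confidence is
    # high; fall back to the single-view branch in textureless regions,
    # under noisy poses, or on dynamic objects where multi-view
    # matching degrades.
    return conf_mv * depth_mv + (1.0 - conf_mv) * depth_sv


# Example: fuse random predictions for a 2-image batch.
sv = torch.rand(2, 1, 96, 320) * 80.0   # single-view depth (m)
mv = torch.rand(2, 1, 96, 320) * 80.0   # multi-view depth (m)
conf = torch.rand(2, 1, 96, 320)        # warping confidence
fused = adaptive_fusion(sv, mv, conf)
print(fused.shape)  # torch.Size([2, 1, 96, 320])
```

In practice the confidence map would be derived from multi-view matching quality (for example, consistency after warping reference views), so that pose noise or moving objects push the blend toward the single-view branch.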

Authors (6)
  1. Wei Yin (57 papers)
  2. Kaixuan Wang (24 papers)
  3. Xiaozhi Chen (18 papers)
  4. Shijie Wang (62 papers)
  5. Xin Yang (314 papers)
  6. Junda Cheng (13 papers)
Citations (6)
