
The Devil is in the Edges: Monocular Depth Estimation with Edge-aware Consistency Fusion (2404.00373v1)

Published 30 Mar 2024 in cs.CV

Abstract: This paper presents a novel monocular depth estimation (MDE) method, named ECFNet, for estimating high-quality depth with clear edges and a valid overall structure from a single RGB image. We investigate the key factor that affects how MDE networks estimate depth at edges and conclude that the edge information itself plays a critical role in predicting depth details. Driven by this analysis, we propose to explicitly employ the image edges as input for ECFNet and to fuse the initial depths from different sources to produce the final depth. Specifically, ECFNet first uses a hybrid edge detection strategy to obtain an edge map and an edge-highlighted image from the input image, and then leverages a pre-trained MDE network to infer the initial depths of the three images (the input image, the edge map, and the edge-highlighted image). After that, ECFNet uses a layered fusion module (LFM) to fuse the initial depths, and the fused result is further updated by a depth consistency module (DCM) to form the final estimation. Extensive experimental results on public datasets and ablation studies indicate that our method achieves state-of-the-art performance. Project page: https://zrealli.github.io/edgedepth.


