
Digging into contrastive learning for robust depth estimation with diffusion models (2404.09831v4)

Published 15 Apr 2024 in cs.CV

Abstract: Recently, diffusion-based depth estimation methods have drawn widespread attention due to their elegant denoising patterns and promising performance. However, they are typically unreliable under adverse conditions that are prevalent in real-world scenarios, such as rain and snow. In this paper, we propose a novel robust depth estimation method called D4RD, featuring a custom contrastive learning mode tailored for diffusion models to mitigate performance degradation in complex environments. Concretely, we integrate the strength of knowledge distillation into contrastive learning, building the 'trinity' contrastive scheme. This scheme uses the sampled noise of the forward diffusion process as a natural reference, guiding the noise predicted in diverse scenes toward a more stable and precise optimum. Moreover, we extend the noise-level trinity to the more generic feature and image levels, establishing a multi-level contrast that distributes the burden of robust perception across the whole network. Before addressing complex scenarios, we enhance the stability of the baseline diffusion model with three straightforward yet effective improvements, which facilitate convergence and remove depth outliers. Extensive experiments demonstrate that D4RD surpasses existing state-of-the-art solutions on synthetic corruption datasets and in real-world weather conditions. Source code and data are available at https://github.com/wangjiyuan9/D4RD.
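
The noise-level idea in the abstract can be illustrated with a small sketch. It is not the D4RD implementation: the function names, the use of plain mean squared error, and the unweighted three-way pairing are assumptions made here for illustration only. The one standard piece is the DDPM forward step, x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps, whose sampled noise eps serves as the shared reference for the noise predicted from a clean view and from a corrupted (for example rainy) view of the same scene.

```python
import math


def forward_diffuse(depth, noise, alpha_bar):
    """Standard DDPM forward step: x_t = sqrt(a) * x_0 + sqrt(1 - a) * eps."""
    a = math.sqrt(alpha_bar)
    b = math.sqrt(1.0 - alpha_bar)
    return [a * d + b * n for d, n in zip(depth, noise)]


def mse(u, v):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(u, v)) / len(u)


def trinity_noise_terms(eps_sampled, eps_clean, eps_corrupt):
    """Hypothetical three-way comparison (an assumption, not the paper's loss).

    eps_sampled: noise drawn in the forward process (the reference)
    eps_clean:   noise predicted when conditioning on the clean image
    eps_corrupt: noise predicted when conditioning on the corrupted image
    """
    return {
        "clean_vs_sampled": mse(eps_clean, eps_sampled),
        "corrupt_vs_sampled": mse(eps_corrupt, eps_sampled),
        "corrupt_vs_clean": mse(eps_corrupt, eps_clean),
    }


if __name__ == "__main__":
    eps = [0.0, 0.0]
    terms = trinity_noise_terms(eps, eps_clean=[1.0, 1.0], eps_corrupt=[2.0, 2.0])
    print(terms)
```

In this toy setup all three terms vanish only when both predictions match the sampled noise, which is the sense in which the sampled noise acts as a common anchor. How the actual method weights these terms, or whether gradients are stopped on any branch, is not stated in the abstract.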
