Dusk Till Dawn: Self-supervised Nighttime Stereo Depth Estimation using Visual Foundation Models (2405.11158v1)

Published 18 May 2024 in cs.CV and cs.RO

Abstract: Self-supervised depth estimation algorithms rely heavily on frame-warping relationships and degrade substantially in challenging circumstances, such as low-visibility and nighttime scenarios with varying illumination. To address this challenge, we introduce an algorithm for accurate self-supervised stereo depth estimation focused on nighttime conditions. Specifically, we use pretrained visual foundation models to extract generalised features across challenging scenes and present an efficient method for matching and integrating these features from stereo frames. Moreover, to prevent pixels that violate the photometric consistency assumption from negatively affecting the depth predictions, we propose a novel masking approach that filters out such pixels. Lastly, addressing weaknesses in the evaluation of current depth estimation algorithms, we present novel evaluation metrics. Our experiments on challenging datasets, including Oxford RobotCar and Multi-Spectral Stereo, demonstrate the robust improvements realized by our approach. Code is available at: https://github.com/madhubabuv/dtd
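The masking idea in the abstract can be illustrated with a minimal sketch. This is not the paper's actual mask (see the linked repository for that); it is a generic example of filtering out pixels where warping fails to reduce photometric error, a common strategy in self-supervised depth pipelines. All function names here are illustrative, and the per-pixel error is plain L1 (real pipelines usually mix L1 with SSIM).

```python
import numpy as np

def reprojection_error(a, b):
    """Per-pixel L1 photometric error between two H x W x 3 images."""
    return np.abs(a.astype(np.float32) - b.astype(np.float32)).mean(axis=-1)

def photometric_consistency_mask(target, warped_source, source):
    """Keep only pixels where warping the source frame actually reduced
    the photometric error relative to comparing the raw frames, i.e.
    where the photometric-consistency assumption plausibly holds.
    Pixels it does not hold for (e.g. moving light sources, saturated
    headlight glare) are excluded from the loss."""
    warped_err = reprojection_error(target, warped_source)
    identity_err = reprojection_error(target, source)
    return warped_err < identity_err

def masked_photometric_loss(target, warped_source, source):
    """Mean photometric error over pixels that pass the mask."""
    mask = photometric_consistency_mask(target, warped_source, source)
    if not mask.any():
        return 0.0
    return float(reprojection_error(target, warped_source)[mask].mean())
```

In a training loop, the mask would be applied to the warped stereo (or temporal) frame before averaging the loss, so that assumption-violating pixels contribute no gradient to the depth network.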
