
Self-supervised Monocular Depth Estimation on Water Scenes via Specular Reflection Prior (2404.07176v1)

Published 10 Apr 2024 in cs.CV

Abstract: Monocular depth estimation from a single image is an ill-posed problem in computer vision because too few reliable cues are available as prior knowledge. Beyond inter-frame supervision from stereo pairs and adjacent frames, extensive prior information is available within a single frame. Reflections from specular surfaces are informative intra-frame priors that enable us to reformulate the ill-posed depth estimation task as multi-view synthesis. This paper proposes the first self-supervised approach to deep-learning depth estimation on water scenes via intra-frame priors, namely reflection supervision and geometrical constraints. In the first stage, a water segmentation network separates the reflection components from the rest of the image. Next, we construct a self-supervised framework to predict the target appearance from the reflections, which are treated as views from other perspectives. The photometric re-projection error, incorporating SmoothL1 and a novel photometric adaptive SSIM, is formulated to optimize pose and depth estimation by aligning the transformed virtual depths with the source ones. As a supplement, the water surface is determined from the real and virtual camera positions, which complements the depth of the water area. Furthermore, to avoid laborious ground-truth annotation, we introduce a large-scale water reflection scene (WRS) dataset rendered with Unreal Engine 4. Extensive experiments on the WRS dataset demonstrate the feasibility of the proposed method compared with state-of-the-art depth estimation techniques.
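
The abstract states that the water surface is determined from the real and virtual camera positions. One way to read this, offered as a minimal sketch and not as the authors' implementation, is plain mirror geometry: if the virtual camera is the reflection of the real camera in a flat water surface, that surface is the perpendicular bisector plane of the segment joining the two camera centres. The function names (`water_plane`, `reflect`, `depth_along_ray`) and the use of ray-plane intersection to fill in water depth are assumptions made for illustration.

```python
import math


def water_plane(real_cam, virtual_cam):
    """Return (n, d) for the plane n . x + d = 0 that mirrors real_cam onto virtual_cam.

    Assumes a flat, perfectly specular water surface (illustrative assumption).
    n is a unit normal pointing from the virtual camera towards the real one.
    """
    diff = [r - v for r, v in zip(real_cam, virtual_cam)]
    norm = math.sqrt(sum(c * c for c in diff))
    if norm == 0.0:
        raise ValueError("camera centres coincide; the plane is undefined")
    n = [c / norm for c in diff]
    # The plane passes through the midpoint of the two camera centres.
    mid = [(r + v) / 2.0 for r, v in zip(real_cam, virtual_cam)]
    d = -sum(ni * mi for ni, mi in zip(n, mid))
    return n, d


def reflect(point, n, d):
    """Mirror a 3D point across the plane n . x + d = 0 (n must be unit length)."""
    dist = sum(ni * pi for ni, pi in zip(n, point)) + d
    return [pi - 2.0 * dist * ni for pi, ni in zip(point, n)]


def depth_along_ray(origin, direction, n, d):
    """Distance t > 0 at which origin + t * direction hits the plane.

    Returns None when the ray is parallel to the plane or points away from it.
    t is a metric distance only if direction has unit length.
    """
    denom = sum(ni * di for ni, di in zip(n, direction))
    if abs(denom) < 1e-12:
        return None
    t = -(sum(ni * oi for ni, oi in zip(n, origin)) + d) / denom
    return t if t > 0 else None
```

Under these assumptions, a pixel classified as water could be assigned the distance returned by `depth_along_ray` for its viewing ray, while `reflect` maps any real-scene point to the position where its mirror image appears to lie.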
