Self-supervised Monocular Depth Estimation on Water Scenes via Specular Reflection Prior (2404.07176v1)
Abstract: Monocular depth estimation from a single image is an ill-posed problem in computer vision because the image provides too few reliable cues to serve as prior knowledge. Beyond inter-frame supervision from stereo pairs and adjacent video frames, abundant prior information is also available within the same frame. Reflections from specular surfaces are informative intra-frame priors that allow the ill-posed depth estimation task to be reformulated as multi-view synthesis. This paper proposes the first self-supervised framework for deep-learning depth estimation on water scenes that relies on intra-frame priors, namely reflection supervision and geometric constraints. In the first stage, a water segmentation network separates the reflection components from the rest of the image. Next, we construct a self-supervised framework that predicts the target appearance from the reflections, which are treated as views from other perspectives. A photometric re-projection error that combines SmoothL1 with a novel photometric adaptive SSIM is formulated to optimize pose and depth estimation by aligning the transformed virtual depths with the source depths. In addition, the water surface is determined from the real and virtual camera positions, and this surface supplies the depth of the water region. Furthermore, to avoid laborious ground-truth annotation, we introduce a large-scale water reflection scene (WRS) dataset rendered with Unreal Engine 4. Extensive experiments on the WRS dataset demonstrate the feasibility of the proposed method compared with state-of-the-art depth estimation techniques.
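The training signal described in the abstract is a photometric re-projection error that mixes a SmoothL1 term with an SSIM term. The paper's photometric adaptive SSIM is not specified in the abstract, so the minimal sketch below substitutes the standard SSIM index computed over a 3×3 window. The weighting α = 0.85 (a value commonly used in prior self-supervised depth work), the SSIM constants, the optional pixel mask, and the function names `photometric_loss`, `ssim_at` and `smooth_l1` are all illustrative assumptions, not the authors' implementation. Images are nested lists of grayscale values in [0, 1] so the example has no dependencies.

```python
from typing import List, Optional

Image = List[List[float]]  # grayscale image, values assumed to lie in [0, 1]
Mask = List[List[bool]]

# Standard SSIM stabilising constants for a dynamic range of 1 (assumption).
C1 = 0.01 ** 2
C2 = 0.03 ** 2


def _window(img: Image, r: int, c: int, radius: int = 1) -> List[float]:
    """Collect a (2*radius+1)^2 neighbourhood, replicating border pixels."""
    h, w = len(img), len(img[0])
    vals = []
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            rr = min(max(r + dr, 0), h - 1)
            cc = min(max(c + dc, 0), w - 1)
            vals.append(img[rr][cc])
    return vals


def ssim_at(x: Image, y: Image, r: int, c: int) -> float:
    """Standard (non-adaptive) SSIM of two images in the window centred at (r, c)."""
    wx, wy = _window(x, r, c), _window(y, r, c)
    n = len(wx)
    mx, my = sum(wx) / n, sum(wy) / n
    vx = sum((a - mx) ** 2 for a in wx) / n
    vy = sum((b - my) ** 2 for b in wy) / n
    cxy = sum((a - mx) * (b - my) for a, b in zip(wx, wy)) / n
    num = (2 * mx * my + C1) * (2 * cxy + C2)
    den = (mx * mx + my * my + C1) * (vx + vy + C2)
    return num / den


def smooth_l1(diff: float, beta: float = 1.0) -> float:
    """SmoothL1 (Huber-style) penalty: quadratic below beta, linear above."""
    a = abs(diff)
    return 0.5 * a * a / beta if a < beta else a - 0.5 * beta


def photometric_loss(target: Image, synthesized: Image,
                     mask: Optional[Mask] = None, alpha: float = 0.85) -> float:
    """Mean of alpha * (1 - SSIM) / 2 + (1 - alpha) * SmoothL1 over selected pixels.

    `synthesized` stands for the target view predicted from the reflection;
    `mask` (hypothetical) selects which pixels contribute to the average.
    """
    h, w = len(target), len(target[0])
    total, count = 0.0, 0
    for r in range(h):
        for c in range(w):
            if mask is not None and not mask[r][c]:
                continue
            s = ssim_at(target, synthesized, r, c)
            ssim_term = min(max((1.0 - s) / 2.0, 0.0), 1.0)  # clamp to [0, 1]
            l1_term = smooth_l1(target[r][c] - synthesized[r][c])
            total += alpha * ssim_term + (1.0 - alpha) * l1_term
            count += 1
    return total / count if count else 0.0
```

Identical images give a loss of (numerically) zero, and any photometric mismatch increases it. One plausible use of the `mask` argument is to restrict the loss to pixels chosen by a segmentation step, but how the paper actually weights or masks the loss is not stated in the abstract.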
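The abstract also states that the water surface is determined from the real and virtual camera positions. Under an ideal planar-mirror assumption, the virtual camera is the mirror image of the real camera, so the water plane is the perpendicular bisector of the segment joining the two camera centers, and the depth of any water pixel follows from a ray-plane intersection. The minimal sketch below illustrates only that geometry. The helper names, the camera-frame convention (y axis pointing down, rays given as K⁻¹[u, v, 1]ᵀ so their z-component is 1) and the example numbers are hypothetical and do not reproduce the paper's implementation.

```python
import math
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def reflect_point(p: Vec3, n: Vec3, d: float) -> Vec3:
    """Mirror p across the plane n.x + d = 0 (n must be unit length)."""
    dist = dot(n, p) + d  # signed distance from p to the plane
    return (p[0] - 2 * dist * n[0],
            p[1] - 2 * dist * n[1],
            p[2] - 2 * dist * n[2])


def plane_from_cameras(c_real: Vec3, c_virtual: Vec3) -> Tuple[Vec3, float]:
    """Recover the mirror plane as the perpendicular bisector of the two centers."""
    diff = (c_virtual[0] - c_real[0],
            c_virtual[1] - c_real[1],
            c_virtual[2] - c_real[2])
    length = math.sqrt(dot(diff, diff))
    if length == 0.0:
        raise ValueError("real and virtual camera centers must differ")
    n = (diff[0] / length, diff[1] / length, diff[2] / length)
    mid = ((c_real[0] + c_virtual[0]) / 2,
           (c_real[1] + c_virtual[1]) / 2,
           (c_real[2] + c_virtual[2]) / 2)
    return n, -dot(n, mid)


def water_depth(ray_dir: Vec3, n: Vec3, d: float,
                origin: Vec3 = (0.0, 0.0, 0.0)) -> Optional[float]:
    """Depth at which a camera ray meets the water plane.

    Because ray_dir is assumed to have z-component 1, the ray parameter t
    equals the z-depth of the intersection point in the camera frame.
    """
    denom = dot(n, ray_dir)
    if abs(denom) < 1e-9:
        return None  # ray parallel to the water surface
    t = -(dot(n, origin) + d) / denom
    return t if t > 0 else None  # intersection behind the camera
```

For example, with the real camera at the origin and the water plane 1.5 units below it (n = (0, 1, 0), d = -1.5), reflecting the camera gives a virtual center at (0, 3, 0). Recovering the plane from that pair and casting the ray (0, 0.5, 1) yields a water depth of 3.0, while a ray pointing upward, such as (0, -0.2, 1), never reaches the water and returns `None`.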