LightDepth: Single-View Depth Self-Supervision from Illumination Decline (2308.10525v2)
Abstract: Single-view depth estimation can be remarkably effective if there is enough ground-truth depth data for supervised training. However, there are scenarios, especially in medicine in the case of endoscopies, where such data cannot be obtained. In such cases, multi-view self-supervision and synthetic-to-real transfer serve as alternative approaches, although with a considerable performance reduction in comparison to the supervised case. Instead, we propose a single-view self-supervised method that achieves a performance similar to the supervised case. In some medical devices, such as endoscopes, the camera and light sources are co-located at a small distance from the target surfaces. Thus, we can exploit the fact that, for any given albedo and surface orientation, pixel brightness is inversely proportional to the square of the distance to the surface, providing a strong single-view self-supervisory signal. In our experiments, our self-supervised models deliver accuracies comparable to those of fully supervised ones, while being applicable without depth ground-truth data.
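The inverse-square relation stated in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a Lambertian surface and a single point light co-located with the camera, and the names `rendered_brightness`, `depth_from_brightness`, `illumination_loss`, `albedo`, `cos_theta`, and `gain` are hypothetical, introduced here only for illustration. A real endoscope model would likely need additional terms (for example light spread, vignetting, and camera response) that are omitted here.

```python
import math


def rendered_brightness(albedo, cos_theta, depth, gain=1.0):
    """Brightness of a surface point under the assumed model.

    Assumes a Lambertian surface lit by a point light co-located with the
    camera, so brightness falls off with the square of the depth.
    `gain` is a hypothetical factor lumping light power and camera gain.
    """
    if depth <= 0.0:
        raise ValueError("depth must be positive")
    # Clamp the cosine so surfaces facing away from the light render as black.
    return gain * albedo * max(cos_theta, 0.0) / depth ** 2


def depth_from_brightness(brightness, albedo, cos_theta, gain=1.0):
    """Invert the assumed model: depth = sqrt(gain * albedo * cos_theta / brightness)."""
    if brightness <= 0.0 or cos_theta <= 0.0:
        raise ValueError("brightness and cos_theta must be positive")
    return math.sqrt(gain * albedo * cos_theta / brightness)


def illumination_loss(observed, pred_depths, albedos, cos_thetas, gain=1.0):
    """Mean absolute error between observed pixels and pixels re-rendered
    from predicted depths (a stand-in for a self-supervisory signal)."""
    residuals = [
        abs(obs - rendered_brightness(a, c, d, gain))
        for obs, d, a, c in zip(observed, pred_depths, albedos, cos_thetas)
    ]
    return sum(residuals) / len(residuals)


if __name__ == "__main__":
    near = rendered_brightness(albedo=0.8, cos_theta=1.0, depth=1.0)
    far = rendered_brightness(albedo=0.8, cos_theta=1.0, depth=2.0)
    print("brightness ratio near/far:", near / far)
    print("recovered depth:", depth_from_brightness(far, 0.8, 1.0))
```

Under these assumptions, doubling the depth reduces the rendered brightness by a factor of four, and the loss is zero only when the predicted depths reproduce the observed brightness. This is the sense in which illumination decline can supervise a depth network from a single view, without ground-truth depth.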