Depth Estimation using Weighted-loss and Transfer Learning (2404.07686v1)
Abstract: Depth estimation from 2D images is a common computer vision task with applications in many fields, including autonomous vehicles, scene understanding and robotics. The accuracy of a supervised depth estimation method mainly relies on the chosen loss function, the model architecture, the quality of the data and the performance metrics. In this study, we propose a simplified and adaptable approach to improve depth estimation accuracy using transfer learning and an optimized loss function. The optimized loss function is a weighted combination of losses that enhances robustness and generalization: Mean Absolute Error (MAE), Edge Loss and Structural Similarity Index (SSIM). We use grid search and random search to find optimized weights for the losses, which leads to an improved model. We explore multiple encoder-decoder-based models, including DenseNet121, DenseNet169, DenseNet201 and EfficientNet, for supervised depth estimation on NYU Depth Dataset v2. We observe that the EfficientNet model, pre-trained on ImageNet for classification and used as an encoder with a simple upsampling decoder, gives the best results in terms of RMSE, REL and log10: 0.386, 0.113 and 0.049, respectively. We also perform a qualitative analysis which illustrates that our model produces depth maps that closely resemble the ground truth, even in cases where the ground truth is flawed. The results indicate significant improvements in accuracy and robustness, with EfficientNet being the most successful architecture.
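The weighted combination of MAE, Edge Loss and SSIM described in the abstract can be sketched as below. This is a minimal NumPy illustration, not the paper's implementation: the weight values are placeholders to be tuned by grid or random search, and the SSIM term is a simplified global (non-windowed) variant for clarity.

```python
import numpy as np

def mae_loss(y_true, y_pred):
    # Mean Absolute Error over all depth pixels
    return np.mean(np.abs(y_true - y_pred))

def edge_loss(y_true, y_pred):
    # Penalize differences in image gradients, emphasizing depth edges
    gy_t, gx_t = np.gradient(y_true)
    gy_p, gx_p = np.gradient(y_pred)
    return np.mean(np.abs(gy_t - gy_p) + np.abs(gx_t - gx_p))

def ssim_loss(y_true, y_pred, c1=0.01**2, c2=0.03**2):
    # Simplified global SSIM (no sliding window); loss = (1 - SSIM) / 2
    mu_t, mu_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()
    cov = np.mean((y_true - mu_t) * (y_pred - mu_p))
    ssim = ((2 * mu_t * mu_p + c1) * (2 * cov + c2)) / (
        (mu_t**2 + mu_p**2 + c1) * (var_t + var_p + c2))
    return (1.0 - ssim) / 2.0

def weighted_depth_loss(y_true, y_pred, w_mae=0.6, w_edge=0.2, w_ssim=0.2):
    # Weighted sum of the three terms; the weights here are illustrative
    # defaults, not the optimized values found by the search in the paper.
    return (w_mae * mae_loss(y_true, y_pred)
            + w_edge * edge_loss(y_true, y_pred)
            + w_ssim * ssim_loss(y_true, y_pred))
```

A perfect prediction drives all three terms (and hence the combined loss) to zero, while errors in either absolute depth, edge structure, or overall similarity each contribute a positive penalty.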