Deep Neighbor Layer Aggregation for Lightweight Self-Supervised Monocular Depth Estimation (2309.09272v2)
Abstract: With the frequent use of self-supervised monocular depth estimation in robotics and autonomous driving, the model's efficiency is becoming increasingly important. Most current approaches apply much larger and more complex networks to improve the precision of depth estimation. Some researchers incorporated Transformer into self-supervised monocular depth estimation to achieve better performance. However, this method leads to high parameters and high computation. We present a fully convolutional depth estimation network using contextual feature fusion. Compared to UNet++ and HRNet, we use high-resolution and low-resolution features to reserve information on small targets and fast-moving objects instead of long-range fusion. We further promote depth estimation results employing lightweight channel attention based on convolution in the decoder stage. Our method reduces the parameters without sacrificing accuracy. Experiments on the KITTI benchmark show that our method can get better results than many large models, such as Monodepth2, with only 30 parameters. The source code is available at https://github.com/boyagesmile/DNA-Depth.
- “Feature pyramid networks for object detection,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 2117–2125.
- “Path aggregation network for instance segmentation,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 8759–8768.
- “Efficientnet: Rethinking model scaling for convolutional neural networks,” in International conference on machine learning. PMLR, 2019, pp. 6105–6114.
- “Hr-depth: High resolution self-supervised monocular depth estimation,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2021, vol. 35, pp. 2294–2301.
- “Diffnet: A learning to compare deep network for product recognition,” IEEE Access, vol. 8, pp. 19336–19344, 2020.
- “Demon: Depth and motion network for learning monocular stereo,” in Proceedings of the IEEE conference on computer vision and pattern recognition, July 2017, pp. 5038–5047.
- “Stereonet: Guided hierarchical refinement for real-time edge-aware depth prediction,” in Proceedings of the European Conference on Computer Vision (ECCV), September 2018, pp. 573–590.
- “Depth map prediction from a single image using a multi-scale deep network,” in Advances in Neural Information Processing Systems, Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, and K.Q. Weinberger, Eds. 2014, vol. 27, Curran Associates, Inc.
- “Revisiting single image depth estimation: Toward higher resolution maps with accurate object boundaries,” in 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), 2019, pp. 1043–1051.
- “Unsupervised monocular depth estimation with left-right consistency,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
- “Digging into self-supervised monocular depth estimation,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2019.
- “The temporal opportunist: Self-supervised multi-frame monocular depth,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2021, pp. 1164–1174.
- “Vision meets robotics: The kitti dataset,” The International Journal of Robotics Research, vol. 32, no. 11, pp. 1231–1237, 2013.
- “Unsupervised learning of depth and ego-motion from video,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
- “Pytorch: An imperative style, high-performance deep learning library,” in Advances in Neural Information Processing Systems, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, Eds. 2019, vol. 32, Curran Associates, Inc.
- “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.