360° High-Resolution Depth Estimation via Uncertainty-aware Structural Knowledge Transfer (2304.07967v3)

Published 17 Apr 2023 in cs.CV

Abstract: To predict a high-resolution (HR) omnidirectional depth map, existing methods typically take an HR omnidirectional image (ODI) as input and train with full supervision. In practice, however, taking an HR ODI as input is often impractical on resource-constrained devices, and captured depth maps often have lower resolution than the color images. Therefore, in this paper, we explore, for the first time, estimating the HR omnidirectional depth directly from a low-resolution (LR) ODI when no HR ground-truth (GT) depth map is available. Our key idea is to transfer scene structural knowledge from the HR image modality and the corresponding LR depth maps, achieving HR depth estimation without any extra inference cost. Specifically, we introduce ODI super-resolution (SR) as an auxiliary task and train both tasks collaboratively in a weakly supervised manner to boost the performance of HR depth estimation. The ODI SR task extracts the scene structural knowledge via uncertainty estimation. Building on this, we propose a scene structural knowledge transfer (SSKT) module with two key components. First, we employ a cylindrical implicit interpolation function (CIIF) to learn cylindrical neural interpolation weights for feature up-sampling and share the parameters of the CIIFs between the two tasks. Second, we propose a feature distillation (FD) loss that provides extra structural regularization, helping the HR depth estimation task learn more scene structural knowledge. Extensive experiments demonstrate that our weakly supervised method outperforms baseline methods and even achieves performance comparable to fully supervised methods.
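
The abstract packs several mechanisms into one paragraph. The sketch below, in PyTorch, illustrates three of them under stated assumptions: an uncertainty-weighted SR loss (one common heteroscedastic formulation, not necessarily the paper's exact loss), a CIIF-style MLP that predicts neighbor interpolation weights from relative cylindrical coordinates, and an L1 feature distillation loss from the SR branch to the depth branch. All function names, layer sizes, and the choice of K = 4 neighbors are illustrative assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def uncertainty_sr_loss(sr_pred, hr_gt, log_var):
    """Heteroscedastic (Laplacian-style) SR loss, one common way to realize
    uncertainty-driven training: pixels with high predicted uncertainty are
    down-weighted, while the log-variance term keeps sigma from exploding."""
    return (torch.exp(-log_var) * (sr_pred - hr_gt).abs() + log_var).mean()


class CIIF(nn.Module):
    """Sketch of a cylindrical implicit interpolation function: an MLP that
    maps the relative cylindrical offsets between an HR query point and its
    K nearest LR feature locations to K interpolation weights. Sharing one
    CIIF between the SR and depth branches is what couples the two tasks;
    the layer sizes here are assumptions."""

    def __init__(self, neighbors: int = 4, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * neighbors, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, neighbors),
        )

    def forward(self, rel_coords: torch.Tensor) -> torch.Tensor:
        # rel_coords: (B, Q, K, 2) offsets (d_phi, d_v) on the cylinder
        b, q, k, _ = rel_coords.shape
        logits = self.net(rel_coords.reshape(b, q, 2 * k))
        return logits.softmax(dim=-1)  # (B, Q, K) weights summing to 1


def upsample_features(lr_neighbor_feats, weights):
    # lr_neighbor_feats: (B, Q, K, C) features of the K nearest LR cells;
    # weights: (B, Q, K) from the CIIF -> (B, Q, C) up-sampled HR features.
    return (weights.unsqueeze(-1) * lr_neighbor_feats).sum(dim=2)


def feature_distillation_loss(depth_feats, sr_feats):
    # Extra structural regularization: pull the depth branch's up-sampled
    # features toward the SR branch's (detached, so the SR branch acts as
    # the teacher and receives no gradient from this term).
    return F.l1_loss(depth_feats, sr_feats.detach())
```

Because the CIIF operates on relative coordinates rather than a fixed grid, the same module can up-sample features at any scale for both branches, which is consistent with the paper's claim of no extra inference cost for the depth task.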

