RDFC-GAN: RGB-Depth Fusion CycleGAN for Indoor Depth Completion (2306.03584v2)

Published 6 Jun 2023 in cs.CV and cs.AI

Abstract: Raw depth images captured in indoor scenarios frequently exhibit extensive missing values due to the inherent limitations of the sensors and environments. For example, transparent materials frequently elude detection by depth sensors, and surfaces may introduce measurement inaccuracies because of polished textures, extended distances, and oblique incidence angles relative to the sensor. Incomplete depth maps pose significant challenges for subsequent vision applications, prompting the development of numerous depth completion techniques to mitigate this problem. Many methods excel at reconstructing dense depth maps from sparse samples, but they often falter when faced with extensive contiguous regions of missing depth values, a prevalent and critical challenge in indoor environments. To overcome these challenges, we design a novel two-branch end-to-end fusion network named RDFC-GAN, which takes a pair of RGB and incomplete depth images as input and predicts a dense, completed depth map. The first branch employs an encoder-decoder structure that, adhering to the Manhattan world assumption and using normal maps derived from RGB-D information as guidance, regresses local dense depth values from the raw depth map. The other branch applies an RGB-depth fusion CycleGAN, adept at translating RGB imagery into detailed, textured depth maps while ensuring high fidelity through cycle consistency. We fuse the two branches via adaptive fusion modules named W-AdaIN and train the model with the help of pseudo depth maps. Comprehensive evaluations on the NYU-Depth V2 and SUN RGB-D datasets show that our method significantly enhances depth completion performance, particularly in realistic indoor settings.
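To make the two-branch fusion idea concrete, below is a minimal PyTorch-style sketch of how an AdaIN-based fusion module in the spirit of W-AdaIN could blend features from the depth-regression branch and the RGB-to-depth translation branch. The module name comes from the abstract, but the gating design, tensor shapes, and function names here are illustrative assumptions rather than the authors' implementation.

```python
# Hypothetical sketch of a W-AdaIN-style fusion step (shapes and the gating
# scheme are assumptions; the paper's exact formulation may differ).
# AdaIN follows Huang & Belongie (2017): re-normalize one feature map with the
# channel-wise statistics of another; a learned weight then blends the branches.
import torch
import torch.nn as nn

def adain(content: torch.Tensor, style: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Shift/scale `content` features to match the channel statistics of `style`."""
    c_mean = content.mean(dim=(2, 3), keepdim=True)
    c_std = content.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style.mean(dim=(2, 3), keepdim=True)
    s_std = style.std(dim=(2, 3), keepdim=True) + eps
    return s_std * (content - c_mean) / c_std + s_mean

class WAdaINFusion(nn.Module):
    """Blend depth-branch and RGB-branch features with a learned per-channel weight."""
    def __init__(self, channels: int):
        super().__init__()
        # Predict a gating weight in [0, 1] from the concatenated branch features.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, depth_feat: torch.Tensor, rgb_feat: torch.Tensor) -> torch.Tensor:
        stylized = adain(depth_feat, rgb_feat)        # depth features, RGB-branch statistics
        w = self.gate(torch.cat([depth_feat, rgb_feat], dim=1))
        return w * stylized + (1.0 - w) * depth_feat  # weighted residual fusion

# Usage on dummy encoder features (batch 2, 64 channels, 60x80 maps)
fusion = WAdaINFusion(channels=64)
fused = fusion(torch.randn(2, 64, 60, 80), torch.randn(2, 64, 60, 80))
```

The sketch only illustrates the feature-level fusion; the surrounding encoder-decoder branch, the CycleGAN translation branch, and the pseudo-depth supervision described in the abstract are not reproduced here.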

