
Domain-Transferred Synthetic Data Generation for Improving Monocular Depth Estimation

Published 2 May 2024 in cs.CV, cs.AI, and eess.IV (arXiv:2405.01113v1)

Abstract: A major obstacle to the development of effective monocular depth estimation algorithms is the difficulty in obtaining high-quality depth data that corresponds to collected RGB images. Collecting this data is time-consuming and costly, and even data collected by modern sensors has limited range or resolution, and is subject to inconsistencies and noise. To combat this, we propose a method of data generation in simulation using 3D synthetic environments and CycleGAN domain transfer. We compare this method of data generation to the popular NYUDepth V2 dataset by training a depth estimation model based on the DenseDepth structure using different training sets of real and simulated data. We evaluate the performance of the models on newly collected images and LiDAR depth data from a Husky robot to verify the generalizability of the approach and show that GAN-transformed data can serve as an effective alternative to real-world data, particularly in depth estimation.
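The abstract describes a pipeline in which simulated RGB-depth pairs are rendered in a 3D synthetic environment, the synthetic RGB images are passed through a CycleGAN generator to match the real-image domain, and the transformed images (still paired with their exact simulated depth maps) are used to train a DenseDepth-style estimator. The sketch below is a minimal, hypothetical illustration of that data flow, not the authors' implementation: `domain_transfer` is a placeholder for a trained CycleGAN generator, and the L1 loss is just one common term in DenseDepth-style training objectives.

```python
import numpy as np

def domain_transfer(synthetic_rgb: np.ndarray) -> np.ndarray:
    """Stand-in for a trained CycleGAN generator (sim -> real style).

    A real implementation would run the generator network on the image;
    here we return the input unchanged purely to illustrate the data flow.
    """
    return synthetic_rgb

def make_training_pair(synthetic_rgb: np.ndarray,
                       synthetic_depth: np.ndarray):
    """Pair a domain-transferred RGB image with its simulated depth map.

    The key property of this approach: because depth comes from the
    simulator, ground truth stays perfectly aligned and noise-free even
    after the RGB image is restyled by the GAN.
    """
    rgb = domain_transfer(synthetic_rgb)
    assert rgb.shape[:2] == synthetic_depth.shape, "RGB/depth misaligned"
    return rgb, synthetic_depth

def l1_depth_loss(pred_depth: np.ndarray, gt_depth: np.ndarray) -> float:
    """Mean absolute depth error, a common term in depth-training losses."""
    return float(np.mean(np.abs(pred_depth - gt_depth)))

# Example: one 4x4 synthetic sample with constant depth of 1.0
rgb = np.zeros((4, 4, 3))
depth = np.ones((4, 4))
x, y = make_training_pair(rgb, depth)
print(l1_depth_loss(np.full((4, 4), 1.5), y))  # 0.5
```

In the paper's actual setup the transformed pairs are drawn from AirSim-style synthetic scenes and evaluated against real Husky-robot LiDAR data; the sketch only shows why GAN-restyled synthetic data keeps exact depth supervision where real sensors introduce range limits and noise.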

