Do More With What You Have: Transferring Depth-Scale from Labeled to Unlabeled Domains (2303.07662v3)

Published 14 Mar 2023 in cs.CV and eess.IV

Abstract: Transferring the absolute depth prediction capabilities of an estimator to a new domain is a task with significant real-world applications. The task is especially challenging when images from the new domain are collected without ground-truth depth measurements, and possibly with cameras of different intrinsics. To overcome these limitations, one recent zero-shot solution was trained on an extensive dataset while encoding the various camera intrinsics; other solutions generated synthetic data with depth labels matching the intrinsics of the new target data, enabling depth-scale transfer between the domains. In this work we present an alternative solution that can utilize any existing synthetic or real dataset that has a small number of images annotated with ground-truth depth labels. Specifically, we show that self-supervised depth estimators produce up-to-scale predictions that are linearly correlated with their absolute depth values across the domain, a property that we model in this work using a single scalar. In addition, aligning the field-of-view of the two datasets prior to training results in a common linear relationship for both domains. We use this observed property to transfer the depth-scale from source datasets that have absolute depth labels to new target datasets that lack such measurements, enabling absolute depth predictions in the target domain. The suggested method was demonstrated on the KITTI, DDAD, and nuScenes datasets, using existing real or synthetic source datasets that differ in field-of-view, image style, or structural content, and achieved accuracy comparable to or better than existing methods that do not use target ground-truth depths.
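
The two mechanisms the abstract describes, aligning the field-of-view of the two datasets and fitting a single depth-scale scalar on a few labeled source images, can be summarized in a short sketch. Everything below is illustrative: the helper names are invented, and the median-of-ratios estimator merely stands in for whatever robust scalar fit the paper actually uses.

```python
# Illustrative sketch only: helper names and the median-of-ratios estimator
# are assumptions, not the paper's reference implementation.
import math
import numpy as np

def horizontal_fov(fx: float, width: int) -> float:
    """Horizontal field-of-view (radians) of a pinhole camera."""
    return 2.0 * math.atan(width / (2.0 * fx))

def center_crop_to_fov(image: np.ndarray, fx: float, target_fov: float) -> np.ndarray:
    """Center-crop an image so its horizontal FoV matches target_fov.

    Cropping can only narrow the FoV, so the wider-FoV dataset is the one cropped.
    """
    w = image.shape[1]
    new_w = int(round(2.0 * fx * math.tan(target_fov / 2.0)))
    assert new_w <= w, "cropping can only narrow the FoV"
    x0 = (w - new_w) // 2
    return image[:, x0:x0 + new_w]

def fit_depth_scale(pred_depths, gt_depths) -> float:
    """Fit one scalar s such that gt ~= s * pred over a few labeled source images."""
    ratios = []
    for pred, gt in zip(pred_depths, gt_depths):
        valid = (gt > 0) & (pred > 0)  # ground truth (e.g., LiDAR) is often sparse
        ratios.append(gt[valid] / pred[valid])
    return float(np.median(np.concatenate(ratios)))  # median is robust to outliers

# After FoV alignment and joint self-supervised training, the source-fitted
# scalar applies directly to the unlabeled target domain:
# s = fit_depth_scale(source_preds, source_gts)
# absolute_target_depth = s * up_to_scale_target_pred
```

The last two commented lines capture the property the paper relies on: once the fields-of-view are aligned, both domains share the same linear relationship between up-to-scale and absolute depth, so the scalar fitted on the labeled source domain transfers unchanged to target predictions.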

Authors (5)
  1. Alexandra Dana
  2. Nadav Carmel
  3. Amit Shomer
  4. Ofer Manela
  5. Tomer Peleg