
Depth Anything in Medical Images: A Comparative Study (2401.16600v1)

Published 29 Jan 2024 in cs.CV

Abstract: Monocular depth estimation (MDE) is a critical component of many medical tracking and mapping algorithms, particularly those that operate on endoscopic or laparoscopic video. However, because ground truth depth maps cannot be acquired from real patient data, supervised learning is not a viable approach to predicting depth maps for medical scenes. Although self-supervised learning for MDE has recently gained attention, its outputs are difficult to evaluate reliably, and each model generalizes poorly to other patients and anatomies. This work evaluates the zero-shot performance of the newly released Depth Anything Model on medical endoscopic and laparoscopic scenes. We compare the accuracy and inference speed of Depth Anything with those of other MDE models trained on general scenes, as well as in-domain models trained on endoscopic data. Our findings show that although the zero-shot capability of Depth Anything is quite impressive, it is not necessarily better than other models in either speed or accuracy. We hope that this study can spark further research into employing foundation models for MDE in medical scenes.
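To make the accuracy comparison concrete, the sketch below shows one common way to score a relative-depth prediction against reference depth: align the prediction by median scaling, then compute absolute relative error and the δ < 1.25 accuracy. This is an illustration under stated assumptions, not the paper's evaluation protocol; the abstract does not specify the metrics or alignment used, and the function names here are hypothetical.

```python
from statistics import median


def median_scale(pred, gt):
    """Rescale predictions so their median matches the reference median."""
    scale = median(gt) / median(pred)
    return [p * scale for p in pred]


def abs_rel(pred, gt):
    """Mean absolute relative error: mean(|pred - gt| / gt)."""
    return sum(abs(p - g) / g for p, g in zip(pred, gt)) / len(gt)


def delta_accuracy(pred, gt, threshold=1.25):
    """Fraction of pixels where max(pred/gt, gt/pred) is below the threshold."""
    hits = sum(1 for p, g in zip(pred, gt) if max(p / g, g / p) < threshold)
    return hits / len(gt)


def evaluate(pred, gt):
    """Score flattened depth values; pixels with non-positive depth are skipped."""
    valid = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    pred_v = [p for p, _ in valid]
    gt_v = [g for _, g in valid]
    scaled = median_scale(pred_v, gt_v)
    return {
        "abs_rel": abs_rel(scaled, gt_v),
        "delta1": delta_accuracy(scaled, gt_v),
    }


if __name__ == "__main__":
    # Toy values (assumed, not from the paper): prediction is off by a scale of 2.
    print(evaluate([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]))
```

Median scaling is used here because zero-shot models trained on general scenes typically predict depth only up to an unknown scale; a different alignment, such as a least-squares scale and shift, would change the reported numbers.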
