Potential of Multimodal Large Language Models for Data Mining of Medical Images and Free-text Reports (2407.05758v1)

Published 8 Jul 2024 in eess.IV, cs.AI, and cs.CV

Abstract: Medical images and radiology reports are crucial for diagnosing medical conditions, highlighting the importance of quantitative analysis for clinical decision-making. However, the diversity and cross-source heterogeneity of these data challenge the generalizability of current data-mining methods. Multimodal LLMs (MLLMs) have recently transformed many domains, significantly affecting the medical field. Notably, Gemini-Vision-series (Gemini) and GPT-4-series (GPT-4) models have epitomized a paradigm shift in AGI for computer vision, showcasing their potential in the biomedical domain. In this study, we exhaustively evaluated Gemini, GPT-4, and 4 other popular large models across 14 medical imaging datasets, covering 5 medical imaging categories (dermatology, radiology, dentistry, ophthalmology, and endoscopy), as well as 3 radiology report datasets. The investigated tasks encompass disease classification, lesion segmentation, anatomical localization, disease diagnosis, report generation, and lesion detection. Our experimental results demonstrated that Gemini-series models excelled in report generation and lesion detection but faced challenges in disease classification and anatomical localization. Conversely, GPT-series models exhibited proficiency in lesion segmentation and anatomical localization but encountered difficulties in disease diagnosis and lesion detection. Additionally, both the Gemini series and the GPT series contain models that demonstrated commendable generation efficiency. While both model series hold promise in reducing physician workload, alleviating pressure on limited healthcare resources, and fostering collaboration between clinical practitioners and artificial intelligence technologies, substantial enhancements and comprehensive validations remain imperative before clinical deployment.

Authors (14)
  1. Yutong Zhang (34 papers)
  2. Yi Pan (79 papers)
  3. Tianyang Zhong (19 papers)
  4. Peixin Dong (4 papers)
  5. Kangni Xie (1 paper)
  6. Yuxiao Liu (16 papers)
  7. Hanqi Jiang (27 papers)
  8. Zhengliang Liu (91 papers)
  9. Shijie Zhao (37 papers)
  10. Tuo Zhang (46 papers)
  11. Xi Jiang (53 papers)
  12. Dinggang Shen (153 papers)
  13. Tianming Liu (161 papers)
  14. Xin Zhang (904 papers)
Citations (2)