From large language models to multimodal AI: A scoping review on the potential of generative AI in medicine (2502.09242v1)

Published 13 Feb 2025 in cs.AI

Abstract: Generative AI models, such as diffusion models and OpenAI's ChatGPT, are transforming medicine by enhancing diagnostic accuracy and automating clinical workflows. The field has advanced rapidly, evolving from text-only LLMs for tasks such as clinical documentation and decision support to multimodal AI systems capable of integrating diverse data modalities, including imaging, text, and structured data, within a single model. The diverse landscape of these technologies, along with rising interest, highlights the need for a comprehensive review of their applications and potential. This scoping review explores the evolution of multimodal AI, highlighting its methods, applications, datasets, and evaluation in clinical settings. Adhering to PRISMA-ScR guidelines, we systematically queried PubMed, IEEE Xplore, and Web of Science, prioritizing recent studies published up to the end of 2024. After rigorous screening, 144 papers were included, revealing key trends and challenges in this dynamic field. Our findings underscore a shift from unimodal to multimodal approaches, driving innovations in diagnostic support, medical report generation, drug discovery, and conversational AI. However, critical challenges remain, including the integration of heterogeneous data types, improving model interpretability, addressing ethical concerns, and validating AI systems in real-world clinical settings. This review summarizes the current state of the art, identifies critical gaps, and provides insights to guide the development of scalable, trustworthy, and clinically impactful multimodal AI solutions in healthcare.


From LLMs to Multimodal AI: A Review on Generative AI in Medicine

The paper "From large language models to multimodal AI: A scoping review on the potential of generative AI in medicine" surveys the evolution and application of generative AI in medicine, drawing on 144 included studies. Such a survey is timely given how quickly the field has moved from unimodal, text-only LLMs to multimodal AI systems.

Evolution from LLMs to Multimodal AI

Initially, LLMs processed mainly textual data, showing significant capabilities in handling clinical documentation, supporting diagnostic reasoning, and aiding bioinformatics research. Adaptation techniques, including supervised fine-tuning (SFT), reinforcement learning from AI feedback (RLAIF), and retrieval-augmented generation (RAG), have refined these models for medical applications. Also noteworthy are domain-specific models such as BioBERT and prompt engineering techniques, which can improve model responses without additional training.
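To make the retrieval-augmented generation idea concrete, the following is a minimal sketch of the retrieval and prompt-assembly steps only. It is not taken from the paper: the toy knowledge base `DOCUMENTS`, the helper names (`tokenize`, `cosine`, `retrieve`, `build_prompt`), and the bag-of-words similarity are all illustrative assumptions. A real system would likely use learned embeddings and a curated clinical corpus, and would pass the resulting prompt to an LLM, a step omitted here.

```python
import math
import re
from collections import Counter

# Hypothetical toy knowledge base (an assumption for illustration only).
DOCUMENTS = [
    "Metformin is a first-line oral medication for type 2 diabetes.",
    "Chest radiographs can show consolidation in bacterial pneumonia.",
    "Warfarin dosing is monitored with the INR blood test.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, documents, k=1):
    """Assemble a prompt that grounds the question in retrieved context."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


if __name__ == "__main__":
    print(build_prompt("How is warfarin dosing monitored?", DOCUMENTS))
```

The design point is that the model's weights are untouched: domain knowledge enters through the retrieved context, which is why RAG is often grouped with adaptation techniques that avoid retraining.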

The review also highlights a shift towards integrating LLMs into multimodal systems that combine diverse data types, such as medical images, text, and structured data. This convergence supports decision-support systems that more closely mirror human clinical reasoning. Methods based on CLIP (Contrastive Language-Image Pre-training) and multimodal LLMs (MLLMs) have enabled tasks ranging from zero-shot image classification to image-text retrieval and interactive report generation.
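As a rough illustration of how CLIP-style zero-shot image classification works, the sketch below scores one image embedding against text embeddings of candidate label prompts and picks the closest one. Everything here is assumed for illustration: the three-dimensional vectors stand in for the outputs of trained image and text encoders, the label prompts are hypothetical, and the temperature of 0.07 is just a commonly used default rather than a value from the paper.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def zero_shot_classify(image_embedding, text_embeddings, temperature=0.07):
    """Return a probability per label prompt via softmax over scaled cosine similarity."""
    image = normalize(image_embedding)
    logits = {
        label: sum(a * b for a, b in zip(image, normalize(text))) / temperature
        for label, text in text_embeddings.items()
    }
    # Subtract the maximum logit for numerical stability before exponentiating.
    max_logit = max(logits.values())
    exps = {label: math.exp(v - max_logit) for label, v in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Hypothetical embeddings; real ones would come from pretrained encoders.
TEXT_EMBEDDINGS = {
    "a chest X-ray showing pneumonia": [0.9, 0.1, 0.2],
    "a chest X-ray with no findings": [0.1, 0.9, 0.1],
}
IMAGE_EMBEDDING = [0.8, 0.2, 0.3]

probabilities = zero_shot_classify(IMAGE_EMBEDDING, TEXT_EMBEDDINGS)
prediction = max(probabilities, key=probabilities.get)

if __name__ == "__main__":
    print(prediction, probabilities)
```

No classifier is trained on the target labels: changing the label set only requires new text prompts. The same similarity scores, ranked over a collection instead of passed through a softmax, give image-text retrieval.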

Practical Implications and Challenges

The practical implications of these advances for healthcare include improved diagnostic accuracy, streamlined clinical workflows, and enhanced medical research capabilities. In particular, MLLMs show promise in generating detailed radiology reports and supporting visual question answering, potentially reducing the workload of healthcare providers.

However, the paper identifies several critical challenges that hinder widespread adoption. These include the complexity of integrating heterogeneous data types, ensuring the interpretability and trustworthiness of AI models, addressing the ethical concerns surrounding data use, and validating these systems in real-world clinical settings. Training data are also scarce and limited in diversity: reliance on a small number of datasets, such as MIMIC-IV, may introduce biases that restrict how well these models generalize across varied healthcare contexts.

Theoretical Implications and Future Directions

Theoretically, the progression from unimodal to multimodal AI systems marks a significant shift in how AI can be utilized in medicine. This evolution underscores a broader trend towards universal AI models that can handle diverse medical tasks across multiple specialties and data modalities. This generalization capability, represented by models like BiomedGPT and MedVersa, emphasizes the potential of AI in facilitating integrated and holistic healthcare solutions.

Future developments in this domain should focus on improving the robustness and scalability of these models, expanding the diversity and representativeness of training data, and advancing context-specific evaluation frameworks that prioritize clinical relevance. Enhancing the understanding of these models' decision processes and developing comprehensive benchmarking standards for clinical AI systems will be crucial for their successful integration into healthcare practices.

Conclusion

Overall, the review sheds light on the rapidly growing field of generative AI in medicine, charting a path from the foundational role of LLMs to the evolving landscape of multimodal AI systems. It gives a detailed account of current capabilities, identifies persistent challenges, and offers insights to guide further research towards scalable, trustworthy, and clinically effective AI solutions in medicine. Continued interdisciplinary collaboration will be vital to overcoming existing barriers and realizing AI's potential to transform healthcare delivery and outcomes.


Authors (5)

Lukas Buess (2 papers)
Matthias Keicher (25 papers)
Nassir Navab (458 papers)
Andreas Maier (394 papers)
Soroosh Tayebi Arasteh (23 papers)