Radiology-GPT: A Large Language Model for Radiology (2306.08666v2)

Published 14 Jun 2023 in cs.CL and cs.AI

Abstract: We introduce Radiology-GPT, an LLM for radiology. Using an instruction tuning approach on an extensive dataset of radiology domain knowledge, Radiology-GPT demonstrates superior performance compared to general LLMs such as StableLM, Dolly, and LLaMA. It exhibits significant versatility in radiological diagnosis, research, and communication. This work serves as a catalyst for future developments in clinical NLP. The successful implementation of Radiology-GPT is indicative of the potential of localizing generative LLMs, specifically tailored for distinctive medical specialties, while ensuring adherence to privacy standards such as HIPAA. The prospect of developing individualized, large-scale LLMs that cater to specific needs of various hospitals presents a promising direction. The fusion of conversational competence and domain-specific knowledge in these models is set to foster future development in healthcare AI. A demo of Radiology-GPT is available at https://huggingface.co/spaces/allen-eric/radiology-gpt.

Radiology-GPT: A Domain-Specific LLM for Enhanced Radiological Practice

This paper introduces Radiology-GPT, an innovative application of LLMs within the medical domain of radiology. Leveraging the MIMIC-CXR dataset, the authors utilize an instruction tuning approach to specifically tailor the model for radiology. This development underscores the ongoing expansion of NLP capabilities within highly specialized medical fields, presenting Radiology-GPT as a model that surpasses the performance of more general models like StableLM, Dolly, and LLaMA.

Methodology and Development

Radiology-GPT is built on instruction tuning, modeled on the Alpaca framework, which itself fine-tunes Meta's LLaMA 7B model. Training draws on rich radiological data, predominantly the MIMIC-CXR dataset, an extensive corpus of free-text chest X-ray reports. Systematic preprocessing extracts the relevant sections of each report, namely "Findings" and "Impression", which are pivotal to developing the model's understanding and interpretation capabilities.
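To make the preprocessing step concrete, the sketch below shows how a free-text report might be split into its "Findings" and "Impression" sections and packed into an Alpaca-style instruction record. This is a minimal illustration, not the authors' actual pipeline: the section-header regex and the instruction wording are assumptions, and real MIMIC-CXR reports vary in format.

```python
import json
import re

# Assumed layout: reports carry "FINDINGS:" and "IMPRESSION:" headers.
# Real MIMIC-CXR reports vary, so a production pipeline needs more cases.
SECTION_RE = re.compile(
    r"FINDINGS:\s*(?P<findings>.*?)\s*IMPRESSION:\s*(?P<impression>.*)",
    re.DOTALL | re.IGNORECASE,
)

def report_to_instruction_record(report_text: str) -> dict | None:
    """Turn one raw report into an Alpaca-style instruction-tuning record."""
    match = SECTION_RE.search(report_text)
    if match is None:  # skip reports missing either section
        return None
    return {
        "instruction": "Derive the impression from findings in the radiology report.",
        "input": match.group("findings").strip(),
        "output": match.group("impression").strip(),
    }

# Example usage with a toy report.
toy_report = """FINDINGS: The lungs are clear. No pleural effusion.
IMPRESSION: No acute cardiopulmonary process."""
print(json.dumps(report_to_instruction_record(toy_report), indent=2))
```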

The model's local deployment is a strategic decision, responding to HIPAA regulations and the paramount need for patient data privacy, a need that large commercial LLMs, which require uploading data to external platforms, struggle to satisfy. This localization not only aligns with privacy protocols but also exemplifies an approach that can be generalized to other medical specialties, potentially enabling hospitals to deploy their own proprietary LLMs.
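As a concrete illustration of what on-premises inference looks like, the sketch below loads a fine-tuned checkpoint from local disk with Hugging Face transformers, so report text never leaves the hospital network. The model directory is hypothetical; the public demo is hosted on the Hugging Face Hub, but a HIPAA-conscious deployment would serve the weights locally.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical local path to the fine-tuned weights; local_files_only
# guarantees no network calls, so protected health information stays on-site.
MODEL_DIR = "/opt/models/radiology-gpt-7b"

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR, local_files_only=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_DIR, local_files_only=True)

prompt = (
    "Derive the impression from findings in the radiology report.\n\n"
    "Findings: The lungs are clear. No pleural effusion.\n\nImpression:"
)
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```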

Evaluation and Findings

Radiology-GPT's performance is evaluated across five critical metrics: understandability, coherence, relevance, conciseness, and clinical utility. The model demonstrates notable capabilities in generating concise and clinically applicable impressions, indicative of its proficiency in handling complex radiological language and tasks. It exhibits superior performance relative to several instruction-tuned models not specifically tailored for radiology, thereby validating the efficacy of domain-specific tuning.
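Because these five metrics are graded by human raters rather than computed automatically, the evaluation reduces to aggregating rubric scores per model and metric. The sketch below shows that bookkeeping; the metric names follow the paper, but the ratings are placeholder values, not the reported results.

```python
from statistics import mean

METRICS = ("understandability", "coherence", "relevance",
           "conciseness", "clinical utility")

# Placeholder 1-5 ratings; real values would come from expert raters.
ratings = {
    "Radiology-GPT": {"understandability": [4, 5], "coherence": [4, 4],
                      "relevance": [5, 4], "conciseness": [5, 5],
                      "clinical utility": [4, 4]},
    "Dolly":         {"understandability": [3, 4], "coherence": [3, 3],
                      "relevance": [3, 4], "conciseness": [2, 3],
                      "clinical utility": [3, 2]},
}

# Average each model's scores across raters, per metric.
for model_name, per_metric in ratings.items():
    summary = {m: round(mean(per_metric[m]), 2) for m in METRICS}
    print(model_name, summary)
```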

Moreover, Radiology-GPT addresses a significant gap in clinical practice. By generating impressions from findings, it mirrors the diagnostic reasoning of radiologists, providing intelligent assistance. However, its reliability and effectiveness depend critically on ongoing engagement with the medical community to ensure continued alignment with clinical needs and practices.

Implications and Future Directions

The implications of this research are manifold, impacting both the practicalities of everyday clinical work and theoretical advancements in medical AI. Practically, Radiology-GPT offers a sophisticated tool for aiding radiologists in their diagnostic processes, potentially enhancing both the accuracy and efficiency of radiological assessments. The fusion of its conversational and domain-specific capabilities could facilitate improved patient communication and streamlined decision support in clinical settings.

Theoretically, this work contributes to the ongoing discourse on domain-specific language models (DSLMs), emphasizing the critical importance of domain-specific training data and the resulting gains in model performance. It also points toward broader future directions, including the integration of multimodal data to extend Radiology-GPT's capabilities from text to image interpretation, aligning more closely with the comprehensive evaluation performed by radiologists.

Overall, Radiology-GPT exemplifies a significant stride toward specialized, privacy-preserving AI tools in healthcare, heralding a future where AI can substantially contribute to individualized patient care while adhering to ethical and privacy standards.

Authors

Zhengliang Liu, Aoxiao Zhong, Yiwei Li, Longtao Yang, Chao Ju, Zihao Wu, Chong Ma, Peng Shu, Cheng Chen, Sekeun Kim, Haixing Dai, Lin Zhao, Dajiang Zhu, Jun Liu, Wei Liu, Dinggang Shen, Xiang Li, Quanzheng Li, Tianming Liu, Lichao Sun