
Translating Radiology Reports into Plain Language using ChatGPT and GPT-4 with Prompt Learning: Promising Results, Limitations, and Potential

Published 16 Mar 2023 in cs.CL, cs.AI, and physics.med-ph | (2303.09038v3)

Abstract: The LLM called ChatGPT has drawn extensive attention because of its human-like expression and reasoning abilities. In this study, we investigate the feasibility of using ChatGPT to translate radiology reports into plain language for patients and healthcare providers so that they are better informed for improved healthcare. Radiology reports from 62 low-dose chest CT lung cancer screening scans and 76 brain MRI metastases screening scans were collected in the first half of February for this study. According to the evaluation by radiologists, ChatGPT can successfully translate radiology reports into plain language with an average score of 4.27 on a five-point scale, with 0.08 places of missing information and 0.07 places of misinformation per report. The suggestions provided by ChatGPT are generally relevant, such as keeping up with follow-up appointments and closely monitoring any symptoms, and for about 37% of the 138 cases in total ChatGPT offers specific suggestions based on findings in the report. ChatGPT also shows some randomness in its responses, occasionally over-simplifying or omitting information, which can be mitigated by using a more detailed prompt. Furthermore, ChatGPT results are compared with those of the newly released GPT-4, showing that GPT-4 can significantly improve the quality of the translated reports. Our results show that it is feasible to utilize LLMs in clinical education, and further efforts are needed to address their limitations and maximize their potential.

Citations (170)

Summary

  • The paper demonstrates that prompt learning with ChatGPT and GPT-4 effectively translates complex radiology reports with an average completeness score of 4.27/5.
  • The methodology shows significant report condensation, reducing chest CT by 26.7% and brain MRI by 21.1%, while minimizing missing information and errors.
  • The findings highlight that optimized prompts and ensemble approaches enhance translation consistency and quality, marking a promising advance in AI healthcare communication.

Translating Radiology Reports into Plain Language with ChatGPT and GPT-4

The study explored the application of advanced AI models, particularly ChatGPT and its successor GPT-4, to translating radiology reports into plain language so that patients and healthcare providers can better understand them. The authors analyzed radiology reports from 62 chest CT scans and 76 brain MRI scans, aiming to evaluate the feasibility and efficacy of ChatGPT in generating understandable, plain-language versions of these expert documents.

Performance and Evaluation

The performance evaluation of ChatGPT in translating radiology reports centers on three key aspects: completeness, correctness, and overall readability. According to evaluations by experienced radiologists, ChatGPT translations achieved a mean score of 4.27 out of 5, indicating generally effective translation into plain language. Missing information was rare, averaging 0.08 occurrences per report, while misinformation occurred even less frequently, at 0.07 places per report. Furthermore, analysis showed a reduction in report length of 26.7% for chest CT and 21.1% for brain MRI reports, suggesting that ChatGPT can condense information effectively while maintaining clarity.
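The report-level metrics above are straightforward per-report averages and a word-count reduction. A minimal sketch of how such numbers are computed, using hypothetical evaluation records (the field layout and example values are assumptions, not the authors' actual data):

```python
# Sketch of the evaluation metrics described above. The records below are
# hypothetical (score out of 5, places of missing info, places of misinfo);
# the authors' actual data format is not specified in the summary.

def mean(xs):
    return sum(xs) / len(xs)

def length_reduction(original_words, translated_words):
    """Percent reduction in word count from original report to translation."""
    return 100.0 * (original_words - translated_words) / original_words

# One tuple per report: (radiologist score, missing places, misinfo places)
evaluations = [(5, 0, 0), (4, 1, 0), (4, 0, 1), (5, 0, 0)]

avg_score = mean([score for score, _, _ in evaluations])
avg_missing = mean([missing for _, missing, _ in evaluations])
avg_misinfo = mean([misinfo for _, _, misinfo in evaluations])

print(avg_score, avg_missing, avg_misinfo)       # 4.5 0.25 0.25
print(length_reduction(300, 220))                # a 300-word report cut to 220 words
```

Applied over all 138 reports, these per-report averages are what yield figures like 4.27/5, 0.08, and 0.07, and the word-count ratio yields the 26.7% and 21.1% reductions.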

Implications of AI Utilization

The AI's ability to render complex medical jargon in comprehensible language signifies its utility in patient education: reducing anxiety, enhancing compliance, and ultimately improving health outcomes. This capability holds particular promise for communicating radiological findings, where terminology can be overwhelmingly complex for the layperson. Through prompt learning, the study showed that translations improved when more detailed prompts were used, highlighting prompt specificity as a significant factor in how well information quality is retained.

Randomness in Responses

Interestingly, ChatGPT exhibited variability when translating identical reports, indicating stochastic elements in its output. This variability was mitigated by providing detailed textual templates specifying format and content, which improved consistency. The study also tried an ensemble approach that synthesizes results across multiple translations, though its improvements were marginal compared to prompt optimization.
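The detailed-template idea can be sketched as a fixed prompt that pins down both the required content and the output format before appending the report. The wording below is an illustrative assumption; the authors' actual prompt text is not reproduced in this summary:

```python
# Minimal sketch of the detailed-prompt strategy described above.
# The template wording and the sample report are hypothetical.

PROMPT_TEMPLATE = (
    "Translate the following radiology report into plain language that a "
    "patient without medical training can understand. Keep every finding "
    "from the report, do not add findings that are not in the report, and "
    "structure the answer as: 1) a one-paragraph summary, 2) a bullet list "
    "of findings, 3) suggestions for the patient.\n\nReport:\n{report}"
)

def build_prompt(report_text: str) -> str:
    """Fill the fixed template with a report, pinning format and content."""
    return PROMPT_TEMPLATE.format(report=report_text)

prompt = build_prompt("Low-dose chest CT: 4 mm nodule in the right upper lobe.")
print(prompt)
```

Because the template fixes the expected sections and forbids additions or omissions, repeated runs on the same report are pushed toward the same structure, which is the consistency effect the study reports.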

GPT-4 Comparative Analysis

The launch of GPT-4 brought enhancements over ChatGPT, notably in translation quality. With optimized prompts, GPT-4 achieved nearly perfect translation quality in terms of completeness, showcasing the rapid pace of improvement in LLMs. GPT-4's superior performance highlights the iterative nature of AI advancement and its potential role in healthcare communication.

Future Implications and Challenges

While the study underscores the readiness of AI applications in healthcare, several challenges persist. Ensuring translation completeness without trivial omissions and maintaining interpretive consistency remain critical hurdles. Moreover, developing AI systems with built-in templates for generating translations in fixed formats could further streamline readability and understanding for patients and healthcare providers. The continual evolution of AI technologies, such as the GPT models, offers encouraging prospects for expanding applications beyond translation to report generation directly from imaging data, personalized treatment strategies, and patient-management suggestions.

In conclusion, the study successfully demonstrates the applicability and potential of LLMs like ChatGPT and GPT-4 in translating complex radiology reports into accessible language. It serves as a foundation for integrating AI tools not only for improving communication but also for enhancing overall clinical workflows in healthcare settings.
