RaDialog: A Large Vision-Language Model for Radiology Report Generation and Conversational Assistance (2311.18681v1)

Published 30 Nov 2023 in cs.CV and cs.CL

Abstract: Conversational AI tools that can generate and discuss clinically correct radiology reports for a given medical image have the potential to transform radiology. Such a human-in-the-loop radiology assistant could facilitate a collaborative diagnostic process, thus saving time and improving the quality of reports. Towards this goal, we introduce RaDialog, the first thoroughly evaluated and publicly available large vision-LLM for radiology report generation and interactive dialog. RaDialog effectively integrates visual image features and structured pathology findings with a LLM while simultaneously adapting it to a specialized domain using parameter-efficient fine-tuning. To keep the conversational abilities of the underlying LLM, we propose a comprehensive, semi-automatically labeled, image-grounded instruct dataset for chest X-ray radiology tasks. By training with this dataset, our method achieves state-of-the-art clinical correctness in report generation and shows impressive abilities in interactive tasks such as correcting reports and answering questions, serving as a foundational step toward clinical dialog systems. Our code is available on github: https://github.com/ChantalMP/RaDialog.

PDF HTML Abstract

RaDialog: A Vision-LLM for Radiology Report Generation and Conversation

The paper introduces RaDialog, a novel vision-LLM specifically designed to enhance radiology report generation and interactive dialogue with expert clinicians. It addresses the increasing demands for automated systems that not only generate clinically accurate reports based on medical images but also facilitate interactive conversations, enabling real-time corrections and queries from radiologists. The development and evaluation of RaDialog represent a significant contribution to the field, particularly given the increasing volume of chest X-rays and the demand for fast, reliable diagnostic procedures.

Methodology

RaDialog's architecture comprises three main components: an Image Feature Extraction Module, a Prompt Construction Module, and a LLM. The Image Feature Extraction Module uses a pre-trained BioViL-T model to capture visual features from X-ray images, while a CheXpert Classifier provides structured pathology findings. These image descriptors are processed to create a comprehensive prompt for the LLM, which is specialized in radiology tasks through parameter-efficient fine-tuning.

The model is trained using a diverse instruct dataset designed to prevent catastrophic forgetting and maintain the conversational abilities while focusing on radiology-specific knowledge. This dataset includes various tasks such as report generation, correction, question answering, and explanations, some of which are derived from existing datasets, while others are based on pseudo ground truths generated by general LLMs. This ensures that RaDialog remains versatile across several downstream tasks, which is crucial for its application in dynamic clinical environments.

Results

RaDialog effectively improves the state-of-the-art in clinical correctness for radiology reports. It demonstrates a 7.3% improvement in clinical efficacy on the MIMIC-CXR dataset, supporting the model's claim of providing reliable report generation. While traditional NLG metrics such as BLEU, METEOR, and ROUGE scores are lower compared to other models, RaDialog excels in BertScore, indicating that its semantic understanding and generation capability align well with diagnostic correctness over literal phrase matching. This reinforces the notion in the community that standard NLG metrics might not fully capture the value of generated radiology reports in terms of clinical relevance.

Discussion

Beyond report generation, the RaDialog model is able to conduct interactive downstream tasks, such as correcting errors in reports based on incorrect pathology labels detected during the initial generation. This capability marks a significant advancement in radiology assistance, offering real-time flexibility and adaptability that static generation models cannot provide. Furthermore, RaDialog demonstrates strong performance in emulating radiological reasoning and explanation tasks, showing potential for use as an educational or guidance tool in clinical settings.

Implications and Future Directions

The development of RaDialog holds substantial implications for real-world radiology practices by increasing the efficiency and accuracy of report generation while supporting collaborative workflows between AI tools and expert radiologists. The research emphasizes the importance of integrating specialized domain-specific knowledge into AI models to offer practical applications in clinical environments.

Future developments might focus on extending RaDialog's capabilities to handle multi-view or longitudinal studies, integrating additional patient data to offer even more nuanced diagnostic insights. Moreover, conducting clinical evaluations could verify the model's performance in practice, further bridging the gap between AI research and clinical implementation. As RaDialog is publicly available, it paves the way for ongoing research and refinement, allowing specialists and researchers to explore enhanced methodologies within medical image processing and LLM alignment.

PDF Markdown Bookmark Chat (Pro)

References (55)

Authors (5)

Chantal Pellegrini (15 papers)
Ege Özsoy (19 papers)
Benjamin Busam (82 papers)
Nassir Navab (458 papers)
Matthias Keicher (25 papers)

Citations (19)

View on Semantic Scholar

GitHub

GitHub - ChantalMP/RaDialog: Official code for the Paper "RaDialog: A Large Vision-Language Model for Radiology Report Generation and Conversational Assistance" (101 stars)