Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
38 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
41 tokens/sec
o3 Pro
7 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

RaDialog: A Large Vision-Language Model for Radiology Report Generation and Conversational Assistance (2311.18681v1)

Published 30 Nov 2023 in cs.CV and cs.CL

Abstract: Conversational AI tools that can generate and discuss clinically correct radiology reports for a given medical image have the potential to transform radiology. Such a human-in-the-loop radiology assistant could facilitate a collaborative diagnostic process, thus saving time and improving the quality of reports. Towards this goal, we introduce RaDialog, the first thoroughly evaluated and publicly available large vision-LLM for radiology report generation and interactive dialog. RaDialog effectively integrates visual image features and structured pathology findings with a LLM while simultaneously adapting it to a specialized domain using parameter-efficient fine-tuning. To keep the conversational abilities of the underlying LLM, we propose a comprehensive, semi-automatically labeled, image-grounded instruct dataset for chest X-ray radiology tasks. By training with this dataset, our method achieves state-of-the-art clinical correctness in report generation and shows impressive abilities in interactive tasks such as correcting reports and answering questions, serving as a foundational step toward clinical dialog systems. Our code is available on github: https://github.com/ChantalMP/RaDialog.

RaDialog: A Vision-LLM for Radiology Report Generation and Conversation

The paper introduces RaDialog, a novel vision-LLM specifically designed to enhance radiology report generation and interactive dialogue with expert clinicians. It addresses the increasing demands for automated systems that not only generate clinically accurate reports based on medical images but also facilitate interactive conversations, enabling real-time corrections and queries from radiologists. The development and evaluation of RaDialog represent a significant contribution to the field, particularly given the increasing volume of chest X-rays and the demand for fast, reliable diagnostic procedures.

Methodology

RaDialog's architecture comprises three main components: an Image Feature Extraction Module, a Prompt Construction Module, and a LLM. The Image Feature Extraction Module uses a pre-trained BioViL-T model to capture visual features from X-ray images, while a CheXpert Classifier provides structured pathology findings. These image descriptors are processed to create a comprehensive prompt for the LLM, which is specialized in radiology tasks through parameter-efficient fine-tuning.

The model is trained using a diverse instruct dataset designed to prevent catastrophic forgetting and maintain the conversational abilities while focusing on radiology-specific knowledge. This dataset includes various tasks such as report generation, correction, question answering, and explanations, some of which are derived from existing datasets, while others are based on pseudo ground truths generated by general LLMs. This ensures that RaDialog remains versatile across several downstream tasks, which is crucial for its application in dynamic clinical environments.

Results

RaDialog effectively improves the state-of-the-art in clinical correctness for radiology reports. It demonstrates a 7.3% improvement in clinical efficacy on the MIMIC-CXR dataset, supporting the model's claim of providing reliable report generation. While traditional NLG metrics such as BLEU, METEOR, and ROUGE scores are lower compared to other models, RaDialog excels in BertScore, indicating that its semantic understanding and generation capability align well with diagnostic correctness over literal phrase matching. This reinforces the notion in the community that standard NLG metrics might not fully capture the value of generated radiology reports in terms of clinical relevance.

Discussion

Beyond report generation, the RaDialog model is able to conduct interactive downstream tasks, such as correcting errors in reports based on incorrect pathology labels detected during the initial generation. This capability marks a significant advancement in radiology assistance, offering real-time flexibility and adaptability that static generation models cannot provide. Furthermore, RaDialog demonstrates strong performance in emulating radiological reasoning and explanation tasks, showing potential for use as an educational or guidance tool in clinical settings.

Implications and Future Directions

The development of RaDialog holds substantial implications for real-world radiology practices by increasing the efficiency and accuracy of report generation while supporting collaborative workflows between AI tools and expert radiologists. The research emphasizes the importance of integrating specialized domain-specific knowledge into AI models to offer practical applications in clinical environments.

Future developments might focus on extending RaDialog's capabilities to handle multi-view or longitudinal studies, integrating additional patient data to offer even more nuanced diagnostic insights. Moreover, conducting clinical evaluations could verify the model's performance in practice, further bridging the gap between AI research and clinical implementation. As RaDialog is publicly available, it paves the way for ongoing research and refinement, allowing specialists and researchers to explore enhanced methodologies within medical image processing and LLM alignment.

Definition Search Book Streamline Icon: https://streamlinehq.com
References (55)
  1. Flamingo: a visual language model for few-shot learning. Advances in Neural Information Processing Systems, 35:23716–23736, 2022.
  2. Palm 2 technical report. arXiv preprint arXiv:2305.10403, 2023.
  3. Meteor: An automatic metric for mt evaluation with improved correlation with human judgments. In Proceedings of the acl workshop on intrinsic and extrinsic evaluation measures for machine translation and/or summarization, pages 65–72, 2005.
  4. Learning to exploit temporal structure for biomedical vision-language processing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15016–15027, 2023.
  5. Baselines for Chest X-Ray Report Generation. In Proceedings of the Machine Learning for Health NeurIPS Workshop, pages 126–140. PMLR, 2020.
  6. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901, 2020.
  7. Generating radiology reports via memory-driven transformer. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1439–1449, 2020.
  8. Vicuna: An open-source chatbot impressing gpt-4 with 90%* chatgpt quality, 2023.
  9. Palm: Scaling language modeling with pathways. arXiv preprint arXiv:2204.02311, 2022.
  10. Instructblip: Towards general-purpose vision-language models with instruction tuning, 2023.
  11. Preparing a collection of radiology examinations for distribution and retrieval. Journal of the American Medical Informatics Association, 23(2):304–310, 2016.
  12. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
  13. William Falcon and The PyTorch Lightning team. PyTorch Lightning, 2019.
  14. Evidence-based guideline for the written radiology report: Methods, recommendations and implementation challenges. Journal of medical imaging and radiation oncology, 57(1):1–7, 2013.
  15. Lora: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685, 2021.
  16. Kiut: Knowledge-injected u-transformer for radiology report generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19809–19818, 2023.
  17. Chexpert: A large chest radiograph dataset with uncertainty labels and expert comparison. In Proceedings of the AAAI conference on artificial intelligence, pages 590–597, 2019.
  18. Multimodal image-text matching improves retrieval-based chest x-ray report generation. arXiv preprint arXiv:2303.17579, 2023.
  19. Mimic-cxr, a de-identified publicly available database of chest radiographs with free-text reports. Scientific data, 6(1):317, 2019.
  20. Methods for automatic generation of radiological reports of chest radiographs: a comprehensive survey. Multimedia Tools and Applications, 81(10):13409–13439, 2022.
  21. Explaining chest x-ray pathologies in natural language. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 701–713. Springer, 2022.
  22. Flexr: Few-shot classification with language embeddings for structured reporting of chest x-rays. In Medical Imaging with Deep Learning, 2023.
  23. Llava-med: Training a large language-and-vision assistant for biomedicine in one day. arXiv preprint arXiv:2306.00890, 2023a.
  24. Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. arXiv preprint arXiv:2301.12597, 2023b.
  25. Dynamic graph enhanced contrastive learning for chest x-ray report generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3334–3343, 2023c.
  26. Chatdoctor: A medical chat model fine-tuned on a large language model meta-ai (llama) using medical domain knowledge. Cureus, 15(6), 2023d.
  27. Chin-Yew Lin. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pages 74–81, Barcelona, Spain, 2004. Association for Computational Linguistics.
  28. Exploring and distilling posterior and prior knowledge for radiology report generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 13753–13762, 2021.
  29. Improved baselines with visual instruction tuning. arXiv preprint arXiv:2310.03744, 2023a.
  30. Visual instruction tuning. arXiv preprint arXiv:2304.08485, 2023b.
  31. An empirical study of catastrophic forgetting in large language models during continual fine-tuning. arXiv preprint arXiv:2308.08747, 2023.
  32. Improving factual completeness and consistency of image-to-text radiology report generation. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 5288–5304, Online, 2021. Association for Computational Linguistics.
  33. Med-flamingo: a multimodal medical few-shot learner. arXiv preprint arXiv:2307.15189, 2023.
  34. Progressive transformer-based generation of radiology reports. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 2824–2832, 2021.
  35. OpenAI. Gpt-4 technical report, 2023.
  36. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics, pages 311–318, 2002.
  37. Rad-restruct: A novel vqa benchmark and method for structured radiology reporting. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 409–419. Springer, 2023.
  38. Inspecting state of the art performance and nlp metrics in image-based medical report generation. arXiv preprint arXiv:2011.09257, 2020.
  39. Clinically correct report generation from chest x-rays using templates. In Machine Learning in Medical Imaging: 12th International Workshop, MLMI 2021, Held in Conjunction with MICCAI 2021, Strasbourg, France, September 27, 2021, Proceedings 12, pages 654–663. Springer, 2021.
  40. Abi Rimmer. Radiologist shortage leaves patient care at risk, warns royal college. BMJ: British Medical Journal (Online), 359, 2017.
  41. ANTHONY ROBINS. Catastrophic forgetting, rehearsal and pseudorehearsal. Connection Science, 7(2):123–146, 1995.
  42. Large language models encode clinical knowledge. Nature, 620(7972):172–180, 2023a.
  43. Towards expert-level medical question answering with large language models. arXiv preprint arXiv:2305.09617, 2023b.
  44. Chexbert: combining automatic labelers and expert annotations for accurate radiology report labeling using bert. arXiv preprint arXiv:2004.09167, 2020.
  45. Chest x-ray report generation through fine-grained label learning. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2020, pages 561–571, 2020.
  46. Interactive and explainable region-guided radiology report generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7433–7442, 2023.
  47. Clinical camel: An open-source expert-level medical language model with dialogue-based knowledge encoding. arXiv preprint arXiv:2305.12031, 2023.
  48. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023.
  49. Towards generalist biomedical ai. arXiv preprint arXiv:2307.14334, 2023.
  50. An inclusive task-aware framework for radiology report generation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 568–577. Springer, 2022.
  51. Metransformer: Radiology report generation by transformer with multiple learnable expert tokens. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11558–11567, 2023.
  52. Elixr: Towards a general purpose x-ray artificial intelligence system through alignment of large language models and radiology vision encoders. arXiv preprint arXiv:2308.01317, 2023.
  53. Weakly supervised contrastive learning for chest x-ray report generation. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 4009–4015, 2021.
  54. Evaluating progress in automatic chest x-ray radiology report generation. Patterns, 4(9), 2023.
  55. Bertscore: Evaluating text generation with bert. arXiv preprint arXiv:1904.09675, 2019.
User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (5)
  1. Chantal Pellegrini (15 papers)
  2. Ege Özsoy (19 papers)
  3. Benjamin Busam (82 papers)
  4. Nassir Navab (458 papers)
  5. Matthias Keicher (25 papers)
Citations (19)