
MAIRA-1: A specialised large multimodal model for radiology report generation (2311.13668v3)

Published 22 Nov 2023 in cs.CL, cs.AI, and cs.CV

Abstract: We present a radiology-specific multimodal model for the task of generating radiological reports from chest X-rays (CXRs). Our work builds on the idea that large language models (LLMs) can be equipped with multimodal capabilities through alignment with pre-trained vision encoders. On natural images, this has been shown to allow multimodal models to gain image understanding and description capabilities. Our proposed model (MAIRA-1) leverages a CXR-specific image encoder in conjunction with a fine-tuned LLM based on Vicuna-7B, and text-based data augmentation, to produce reports with state-of-the-art quality. In particular, MAIRA-1 significantly improves on the radiologist-aligned RadCliQ metric and across all lexical metrics considered. Manual review of model outputs demonstrates promising fluency and accuracy of generated reports while uncovering failure modes not captured by existing evaluation practices. More information and resources can be found on the project website: https://aka.ms/maira.

Authors (15)
  1. Stephanie L. Hyland (20 papers)
  2. Shruthi Bannur (15 papers)
  3. Kenza Bouzid (9 papers)
  4. Daniel C. Castro (28 papers)
  5. Mercy Ranjit (9 papers)
  6. Anton Schwaighofer (13 papers)
  7. Fernando Pérez-García (16 papers)
  8. Valentina Salvatelli (19 papers)
  9. Shaury Srivastav (5 papers)
  10. Anja Thieme (7 papers)
  11. Noel Codella (21 papers)
  12. Matthew P. Lungren (43 papers)
  13. Maria Teodora Wetscherek (6 papers)
  14. Ozan Oktay (34 papers)
  15. Javier Alvarez-Valle (19 papers)