MAIRA-1: A specialised large multimodal model for radiology report generation (2311.13668v3)

Published 22 Nov 2023 in cs.CL, cs.AI, and cs.CV

Abstract: We present a radiology-specific multimodal model for the task of generating radiological reports from chest X-rays (CXRs). Our work builds on the idea that large language models (LLMs) can be equipped with multimodal capabilities through alignment with pre-trained vision encoders. On natural images, this has been shown to allow multimodal models to gain image understanding and description capabilities. Our proposed model (MAIRA-1) leverages a CXR-specific image encoder in conjunction with a fine-tuned LLM based on Vicuna-7B, and text-based data augmentation, to produce reports with state-of-the-art quality. In particular, MAIRA-1 significantly improves on the radiologist-aligned RadCliQ metric and across all lexical metrics considered. Manual review of model outputs demonstrates promising fluency and accuracy of generated reports while uncovering failure modes not captured by existing evaluation practices. More information and resources can be found on the project website: https://aka.ms/maira.


Authors (15)

Stephanie L. Hyland
Shruthi Bannur
Kenza Bouzid
Daniel C. Castro
Mercy Ranjit
Anton Schwaighofer
Fernando Pérez-García
Valentina Salvatelli
Shaury Srivastav
Anja Thieme
Noel Codella
Matthew P. Lungren
Maria Teodora Wetscherek
Ozan Oktay
Javier Alvarez-Valle
