MAIRA-Seg: Enhancing Radiology Report Generation with Segmentation-Aware Multimodal Large Language Models (2411.11362v1)
Abstract: There is growing interest in applying AI to radiology report generation, particularly for chest X-rays (CXRs). This paper investigates whether incorporating pixel-level information through segmentation masks can improve fine-grained image interpretation of multimodal large language models (MLLMs) for radiology report generation. We introduce MAIRA-Seg, a segmentation-aware MLLM framework designed to utilize semantic segmentation masks alongside CXRs for generating radiology reports. We train expert segmentation models to obtain mask pseudolabels for radiology-specific structures in CXRs. Subsequently, building on the architecture of MAIRA, a CXR-specialised model for report generation, we integrate a trainable segmentation tokens extractor that leverages these mask pseudolabels, and employ mask-aware prompting to generate draft radiology reports. Our experiments on the publicly available MIMIC-CXR dataset show that MAIRA-Seg outperforms non-segmentation baselines. We also investigate set-of-marks prompting with MAIRA and find that MAIRA-Seg consistently demonstrates comparable or superior performance. The results confirm that using segmentation masks enhances the nuanced reasoning of MLLMs, potentially contributing to better clinical outcomes.
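The abstract describes feeding segmentation-mask information to an MLLM alongside the image via a segmentation tokens extractor and mask-aware prompting. A minimal sketch of this idea, under strong simplifying assumptions: the real MAIRA-Seg extracts learned features from expert segmentation models, whereas here the mask "tokens" are just pooled occupancy fractions, and all names (`extract_seg_tokens`, `build_mask_aware_prompt`, the `<seg:...>` marker) are hypothetical, not the paper's actual interface.

```python
# Hypothetical sketch of mask-aware prompting: pool each structure's
# binary mask into coarse "segmentation tokens" and interleave them
# with the image placeholder in the text prompt.

def extract_seg_tokens(mask, grid=2):
    """Pool a binary mask (list of rows) into grid x grid occupancy tokens.

    Each token is the fraction of positive pixels in its cell; a stand-in
    for the learned segmentation token features used in MAIRA-Seg.
    """
    h, w = len(mask), len(mask[0])
    tokens = []
    for gy in range(grid):
        for gx in range(grid):
            cell = [
                mask[y][x]
                for y in range(gy * h // grid, (gy + 1) * h // grid)
                for x in range(gx * w // grid, (gx + 1) * w // grid)
            ]
            tokens.append(sum(cell) / len(cell))  # occupancy fraction
    return tokens

def build_mask_aware_prompt(structures):
    """Build a draft-report prompt interleaving per-structure mask tokens."""
    parts = ["<image>"]  # placeholder where visual tokens would be inserted
    for name, mask in structures.items():
        toks = extract_seg_tokens(mask)
        parts.append(f"<seg:{name}> " + " ".join(f"{t:.2f}" for t in toks))
    parts.append("Generate the findings section of the radiology report.")
    return "\n".join(parts)
```

For example, a 4x4 mask covering the left half of the image yields tokens `[1.0, 0.0, 1.0, 0.0]`, signalling to the model which image regions belong to that structure.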
- Harshita Sharma
- Valentina Salvatelli
- Shaury Srivastav
- Kenza Bouzid
- Shruthi Bannur
- Daniel C. Castro
- Maximilian Ilse
- Sam Bond-Taylor
- Mercy Prasanna Ranjit
- Fabian Falck
- Fernando Pérez-García
- Anton Schwaighofer
- Hannah Richardson
- Maria Teodora Wetscherek
- Stephanie L. Hyland
- Javier Alvarez-Valle