SLaVA-CXR: Small Language and Vision Assistant for Chest X-ray Report Automation (2409.13321v1)

Published 20 Sep 2024 in cs.LG, cs.AI, cs.CL, and cs.CV

Abstract: Inspired by the success of LLMs, there is growing research interest in developing LLMs in the medical domain to assist clinicians. However, for hospitals, using closed-source commercial LLMs involves privacy issues, and developing open-source public LLMs requires large-scale computational resources, which are usually limited, especially in resource-limited regions and low-income countries. We propose an open-source Small Language and Vision Assistant (SLaVA-CXR) that can be used for Chest X-Ray report automation. To efficiently train a small assistant, we first propose the Re$^3$Training method, which simulates the cognitive development of radiologists and optimizes the model in the Recognition, Reasoning, and Reporting training manner. Then, we introduce a data synthesis method, RADEX, which can generate a high-quality and diverse training corpus with privacy regulation compliance. The extensive experiments show that our SLaVA-CXR built on a 2.7B backbone not only outperforms previous state-of-the-art larger models but also achieves 6 times faster inference.


Overview of SLaVA-CXR: Small Language and Vision Assistant for Chest X-ray Report Automation

The paper "SLaVA-CXR: Small Language and Vision Assistant for Chest X-ray Report Automation" presents an open-source small vision-language assistant designed to automate Chest X-ray (CXR) report generation. It targets two obstacles to hospital adoption: the privacy concerns of closed-source commercial LLMs and the substantial computational resources required to develop open-source ones.

Key Contributions

The primary contributions of the paper are as follows:

SLaVA-CXR Development: A small, efficient vision and language assistant, SLaVA-CXR, is introduced. It outperforms larger models while running inference about six times faster.
Re$^3$Training Method: A novel training method that simulates the cognitive development of radiologists through three stages: Recognition, Reasoning, and Reporting.
RADEX Data Synthesis: A data synthesis method capable of generating a high-quality, diverse training corpus compliant with privacy regulations.
Empirical Validation: Extensive experiments and human evaluations conducted to validate the performance and efficacy of SLaVA-CXR compared to state-of-the-art models.

Methodology

Re$^3$Training Method

The Re$^3$Training pipeline comprises three sequential stages:

Recognition: Focuses on aligning clinical concepts between visual features and text. The training objective here is to generate accurate captions that allow the model to learn the descriptive language of radiological patterns.
Reasoning: Enhances model capability to interpret visual patterns within a diagnostic context. This stage involves learning diagnostic reasoning through instruction tuning.
Reporting: Polishes the model's ability to generate professional-grade radiology reports by synthesizing learned knowledge into coherent, clinical documentation. This stage leverages a diverse, high-quality training corpus, RADEX, sourced from publicly available, privacy-compliant clinical reports.
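The three-stage ordering above can be sketched as a simple sequential curriculum. This is a minimal illustration, not the authors' implementation: the `Stage` class, the `run_curriculum` helper, the dataset labels, and the placeholder `dummy_train` function are all hypothetical names introduced here, and the real optimization step is deliberately left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Stage:
    name: str         # stage name taken from the summary above
    objective: str    # what the stage optimizes, per the summary
    data_source: str  # hypothetical dataset label, not from the paper


# Stage order follows the Recognition -> Reasoning -> Reporting description.
RE3_STAGES = [
    Stage("recognition", "caption generation", "image_caption_pairs"),
    Stage("reasoning", "instruction tuning", "diagnostic_instructions"),
    Stage("reporting", "report generation", "radex_reports"),
]


def run_curriculum(
    stages: List[Stage],
    train_fn: Callable[[Stage, list], list],
    state: list,
) -> Tuple[list, List[str]]:
    """Run stages in order, feeding each stage the state left by the previous one."""
    completed = []
    for stage in stages:
        state = train_fn(stage, state)
        completed.append(stage.name)
    return state, completed


def dummy_train(stage: Stage, state: list) -> list:
    # Placeholder for a real training loop: it only records which stage ran.
    return state + [f"trained:{stage.name}"]


final_state, order = run_curriculum(RE3_STAGES, dummy_train, [])
```

The point of the sketch is only that each stage starts from the weights (here, the `state` list) produced by the stage before it, so the order of `RE3_STAGES` matters.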

RADEX Data Synthesis

RADEX, or RADiology EXpertise corpus, is constructed to address limitations in traditional clinical datasets like MIMIC-CXR. RADEX provides a more diverse and comprehensive dataset derived from peer-reviewed, open-access radiology case studies. The corpus includes:

Synthetic clinical notes generated using GPT-4 for a consistent format, integrating case description, presentation, and discussion.
Diverse conversational data to enhance the model's instruction-following capabilities, creating a more versatile and adaptive learning environment.
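As a rough illustration of the note-synthesis step, the sketch below assembles a prompt from the three fields the summary mentions (case description, presentation, discussion). The prompt wording, the `CaseStudy` class, and `build_note_prompt` are assumptions made for this example; the paper's actual prompts are not given in this summary, and no model call is made here.

```python
from dataclasses import dataclass


@dataclass
class CaseStudy:
    description: str
    presentation: str
    discussion: str


# Hypothetical prompt template; the real RADEX prompts are not shown in the summary.
PROMPT_TEMPLATE = (
    "Rewrite the following radiology case study as a clinical note "
    "in a consistent format.\n\n"
    "Case description: {description}\n"
    "Presentation: {presentation}\n"
    "Discussion: {discussion}\n"
)


def build_note_prompt(case: CaseStudy) -> str:
    """Fill the template with whitespace-trimmed fields from one case study."""
    return PROMPT_TEMPLATE.format(
        description=case.description.strip(),
        presentation=case.presentation.strip(),
        discussion=case.discussion.strip(),
    )


# Illustrative, made-up case text used only to exercise the function.
example = CaseStudy(
    description="  Adult patient with right lower lobe opacity. ",
    presentation="Cough and fever for three days.",
    discussion="Findings are consistent with lobar pneumonia.",
)
prompt = build_note_prompt(example)
```

The resulting string would then be sent to whichever language model performs the rewriting; that call is omitted because its interface is not described in the summary.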

Experimental Results

Performance Metrics

The performance was assessed using standard benchmarks such as MIMIC-CXR and IU-Xray across tasks including CXR report generation and summarization. The metrics utilized include ROUGE, BLEU, METEOR, BERTScore, CheXbert, RadGraph, and RadCliQ.
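To make one of these metrics concrete, here is a minimal sketch of ROUGE-L computed as an F1 score over the longest common subsequence (LCS) of whitespace tokens. It is an illustration under simplifying assumptions (lowercasing, whitespace tokenization, equal weighting of precision and recall) and may not match the exact scoring configuration used in the paper. The function names and the example sentences are made up for this sketch.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L as the harmonic mean of LCS-based precision and recall."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


score = rouge_l_f1(
    "no acute cardiopulmonary abnormality",
    "no acute abnormality",
)
```

In this example the LCS has 3 tokens, so precision is 3/3, recall is 3/4, and the F1 score is 6/7. Lexical-overlap metrics like this one say nothing about clinical correctness, which is why the evaluation also uses clinically oriented metrics such as CheXbert and RadGraph.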

SLaVA-CXR outperformed existing state-of-the-art models, achieving the highest scores across most metrics in both report generation and summarization tasks. Detailed results include:

Generation Task: SLaVA-CXR demonstrated strong performance improvements in ROUGE-L, METEOR, and BERTScore.
Summarization Task: Significant gains were observed in metrics like BERTScore and METEOR, indicating enhanced ability to succinctly summarize findings from CXR images.
Inference Efficiency: SLaVA-CXR, built on a 2.7B backbone, ran inference about six times faster than the larger models it was compared against, while maintaining high performance.
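An inference-speed comparison of this kind reduces to measuring average time per instance for each model and taking the ratio. The sketch below shows one simple way to do that; `seconds_per_instance` and `speedup` are hypothetical helpers, and the summary does not describe the paper's actual timing protocol (hardware, batch size, or decoding settings).

```python
import time
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def seconds_per_instance(generate: Callable[[T], object], inputs: Iterable[T]) -> float:
    """Average wall-clock seconds that `generate` takes per input."""
    items = list(inputs)
    if not items:
        raise ValueError("need at least one input to time")
    start = time.perf_counter()
    for item in items:
        generate(item)
    return (time.perf_counter() - start) / len(items)


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """How many times faster the candidate is than the baseline."""
    if candidate_seconds <= 0:
        raise ValueError("candidate time must be positive")
    return baseline_seconds / candidate_seconds
```

For example, if a baseline model averaged 6.0 seconds per report and a smaller model averaged 1.0 second (illustrative numbers, not results from the paper), `speedup(6.0, 1.0)` gives 6.0. A fair comparison would run both models on the same hardware and inputs.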

Human Evaluation

Human evaluations involved medical experts assessing the correctness, completeness, and coherence of generated reports. SLaVA-CXR consistently received higher scores compared to models like LLaVA-Med, indicating better alignment with professional radiology reporting standards and more accurate clinical descriptions.

Implications and Future Directions

The development of SLaVA-CXR presents substantial implications for the integration of AI in medical imaging:

Practical Application: The enhanced efficiency and performance of SLaVA-CXR make it a viable tool for clinical deployment, potentially improving diagnostic workflows and patient care.
Scalability and Accessibility: SLaVA-CXR's architecture and training methodology offer a scalable solution, particularly for resource-limited regions, thus democratizing access to advanced AI tools in healthcare settings.

Future research may focus on further mitigating hallucination issues, incorporating multi-view support for more comprehensive CXR analysis, and expanding model capabilities to other imaging modalities and clinical areas.

In conclusion, the development of SLaVA-CXR marks an advancement in the field of AI-driven medical imaging, offering an efficient, privacy-compliant solution for automating radiology report generation with significant performance improvements over existing models.


Authors

Jinge Wu
Yunsoo Kim
Daqian Shi
David Cliffton
Fenglin Liu
Honghan Wu
