Improving Medical Report Generation with Adapter Tuning and Knowledge Enhancement in Vision-Language Foundation Models (2312.03970v1)
Abstract: Medical report generation requires the automatic creation of coherent and precise descriptions of medical images. However, the scarcity of labelled medical image-report pairs poses formidable challenges to developing large-scale neural networks capable of harnessing the potential of artificial intelligence, exemplified by LLMs. This study builds upon the state-of-the-art vision-language pre-training and fine-tuning approach, BLIP-2, to customize general large-scale foundation models. By integrating adapter tuning and a medical knowledge enhancement loss, our model significantly improves the accuracy and coherence of generated reports. Validation on the ImageCLEFmedical 2023 dataset demonstrates our model's effectiveness, achieving the best averaged results among several state-of-the-art methods. Significant improvements in ROUGE and CIDEr underscore our method's efficacy, highlighting promising outcomes for the rapid medical-domain adaptation of vision-language foundation models in addressing challenges posed by data scarcity.
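The abstract does not include implementation details, but the core idea of adapter tuning, keeping a large pre-trained backbone (such as the BLIP-2 image encoder or language model) frozen and training only small inserted modules, can be illustrated with a minimal sketch. The module, class, and parameter names below are illustrative assumptions, not the authors' code, and the medical knowledge enhancement loss is not shown.

```python
# Minimal sketch of adapter tuning on a frozen backbone (PyTorch).
# All names (Adapter, FrozenBlockWithAdapter, bottleneck_dim) are hypothetical.
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project, residual."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        # Zero-init the up-projection so the adapter starts as an identity map.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))


class FrozenBlockWithAdapter(nn.Module):
    """Wraps a frozen transformer block; only the adapter is trainable."""

    def __init__(self, block: nn.Module, hidden_dim: int):
        super().__init__()
        self.block = block
        for p in self.block.parameters():
            p.requires_grad = False  # backbone stays frozen
        self.adapter = Adapter(hidden_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.adapter(self.block(x))


if __name__ == "__main__":
    # Toy usage: a frozen feed-forward "block" wrapped with a trainable adapter.
    hidden = 768
    frozen_block = nn.Sequential(nn.Linear(hidden, hidden), nn.GELU())
    layer = FrozenBlockWithAdapter(frozen_block, hidden)
    x = torch.randn(2, 16, hidden)  # (batch, tokens, hidden)
    out = layer(x)
    trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    print(out.shape, f"trainable params: {trainable}")
```

Because only the adapter parameters receive gradients, the number of trainable weights is a small fraction of the frozen backbone, which is what makes this style of tuning attractive when labelled medical image-report pairs are scarce.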
- “MedKLIP: Medical knowledge enhanced language-image pre-training,” medRxiv, pp. 2023–01, 2023.
- “Large-scale domain-specific pretraining for biomedical vision-language processing,” arXiv preprint arXiv:2303.00915, 2023.
- “Learning transferable visual models from natural language supervision,” in International Conference on Machine Learning. PMLR, 2021, pp. 8748–8763.
- “BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models,” arXiv preprint arXiv:2301.12597, 2023.
- “Transferring pre-trained large language-image model for medical image captioning,” in CLEF2023 Working Notes, CEUR Workshop Proceedings, CEUR-WS.org, Thessaloniki, Greece, 2023.
- “PCLmed at ImageCLEFmedical 2023: Customizing general-purpose foundation models for medical report generation,” in CLEF2023 Working Notes, CEUR Workshop Proceedings, CEUR-WS.org, Thessaloniki, Greece, 2023.
- “Adapter learning in pretrained feature extractor for continual learning of diseases,” arXiv preprint arXiv:2304.09042, 2023.
- “LoRA: Low-rank adaptation of large language models,” arXiv preprint arXiv:2106.09685, 2021.
- “Prefix-tuning: Optimizing continuous prompts for generation,” arXiv preprint arXiv:2101.00190, 2021.
- “Overview of ImageCLEFmedical 2023 – Caption Prediction and Concept Detection,” in CLEF2023 Working Notes, CEUR Workshop Proceedings, CEUR-WS.org, Thessaloniki, Greece, September 18–21, 2023.
- “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv preprint arXiv:2010.11929, 2020.
- “OPT: Open pre-trained transformer language models,” arXiv preprint arXiv:2205.01068, 2022.
- “Concept-aware video captioning: Describing videos with effective prior information,” IEEE Transactions on Image Processing, vol. 32, pp. 5366–5378, 2023.
- “AUEB NLP Group at ImageCLEFmedical Caption 2023,” in CLEF2023 Working Notes, CEUR Workshop Proceedings, CEUR-WS.org, Thessaloniki, Greece, 2023.
- “Detecting concepts and generating captions from medical images: Contributions of the VCMI team to ImageCLEFmedical Caption 2023,” in CLEF2023 Working Notes, CEUR Workshop Proceedings, CEUR-WS.org, Thessaloniki, Greece, 2023.
- Olivier Bodenreider, “The Unified Medical Language System (UMLS): integrating biomedical terminology,” Nucleic Acids Research, vol. 32, no. suppl_1, pp. D267–D270, 2004.
- “A concise model for medical image captioning,” in CLEF2023 Working Notes, CEUR Workshop Proceedings, CEUR-WS.org, Thessaloniki, Greece, 2023.
- Shibin Wu
- Bang Yang
- Zhiyu Ye
- Haoqian Wang
- Hairong Zheng
- Tong Zhang