Improving Medical Report Generation with Adapter Tuning and Knowledge Enhancement in Vision-Language Foundation Models (2312.03970v1)

Published 7 Dec 2023 in cs.CV, cs.AI, and cs.CE

Abstract: Medical report generation demands the automatic creation of coherent and precise descriptions for medical images. However, the scarcity of labelled medical image-report pairs poses formidable challenges in developing large-scale neural networks capable of harnessing the potential of artificial intelligence, exemplified by LLMs. This study builds upon the state-of-the-art vision-language pre-training and fine-tuning approach, BLIP-2, to customize general large-scale foundation models. By integrating adapter tuning and a medical knowledge enhancement loss, our model significantly improves accuracy and coherence. Validation on the ImageCLEFmedical 2023 dataset demonstrates our model's prowess, achieving the best averaged results among several state-of-the-art methods. Significant improvements in ROUGE and CIDEr underscore our method's efficacy, highlighting promising outcomes for the rapid medical-domain adaptation of vision-language foundation models in addressing challenges posed by data scarcity.
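
The abstract names two ingredients, adapter tuning and a medical knowledge enhancement loss, without spelling out how they are wired in. The sketch below is a minimal PyTorch illustration of one common way to realize them: a bottleneck adapter attached to a frozen transformer block, and a concept-classification term (e.g., over UMLS tags) added to the captioning loss. The class and function names (Adapter, AdaptedBlock, knowledge_enhancement_loss) and the weighting factor lambda_k are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project, residual connection."""
    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        # Zero-init the up-projection so the adapted model starts as the frozen model.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(F.gelu(self.down(x)))


class AdaptedBlock(nn.Module):
    """Wraps a frozen transformer block; only the small adapter is trainable."""
    def __init__(self, block: nn.Module, hidden_dim: int):
        super().__init__()
        self.block = block
        for p in self.block.parameters():
            p.requires_grad = False  # pre-trained weights stay frozen
        self.adapter = Adapter(hidden_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.adapter(self.block(x))


def knowledge_enhancement_loss(hidden: torch.Tensor,
                               concept_labels: torch.Tensor,
                               concept_head: nn.Linear) -> torch.Tensor:
    """Illustrative knowledge term: predict medical concepts from pooled features."""
    pooled = hidden.mean(dim=1)                 # (batch, hidden_dim)
    logits = concept_head(pooled)               # (batch, num_concepts)
    return F.binary_cross_entropy_with_logits(logits, concept_labels)


# Combined objective (sketch): the usual token-level cross-entropy caption loss
# plus the weighted knowledge term.
# total_loss = caption_loss + lambda_k * knowledge_enhancement_loss(hidden, labels, concept_head)
```

In this style of setup, only the adapters and the concept head receive gradients, which keeps the number of trainable parameters small relative to the frozen BLIP-2 backbone.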

Authors (6)
  1. Shibin Wu (6 papers)
  2. Bang Yang (19 papers)
  3. Zhiyu Ye (1 paper)
  4. Haoqian Wang (74 papers)
  5. Hairong Zheng (71 papers)
  6. Tong Zhang (569 papers)
Citations (1)