Fusion of Domain-Adapted Vision and Language Models for Medical Visual Question Answering (2404.16192v1)

Published 24 Apr 2024 in cs.CL and cs.CV

Abstract: Vision-language models, while effective in general domains and showing strong performance in diverse multi-modal applications like visual question answering (VQA), struggle to maintain the same level of effectiveness in more specialized domains, e.g., medical. We propose a medical vision-language model that integrates large vision and language models adapted for the medical domain. This model goes through three stages of parameter-efficient training using three separate biomedical and radiology multi-modal visual and text datasets. The proposed model achieves state-of-the-art performance on the SLAKE 1.0 medical VQA (MedVQA) dataset with an overall accuracy of 87.5% and demonstrates strong performance on another MedVQA dataset, VQA-RAD, achieving an overall accuracy of 73.2%.
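
The abstract describes parameter-efficient training that fuses a domain-adapted vision encoder with a language model. As a loose illustration only (not the authors' architecture), the sketch below freezes both pretrained backbones and trains a small projection module that maps image features into the language model's embedding space; all module names, shapes, and the projector design are assumptions.

```python
import torch
import torch.nn as nn


class VisionLanguageFusion(nn.Module):
    """Hypothetical sketch: freeze both backbones, train only a small projector."""

    def __init__(self, vision_encoder: nn.Module, language_model: nn.Module,
                 vision_dim: int, text_dim: int):
        super().__init__()
        self.vision_encoder = vision_encoder
        self.language_model = language_model
        # Parameter-efficient setup: the large pretrained backbones stay frozen.
        for p in self.vision_encoder.parameters():
            p.requires_grad = False
        for p in self.language_model.parameters():
            p.requires_grad = False
        # Small trainable projection mapping image features into the
        # language model's embedding space.
        self.projector = nn.Sequential(
            nn.Linear(vision_dim, text_dim),
            nn.GELU(),
            nn.Linear(text_dim, text_dim),
        )

    def forward(self, image_feats: torch.Tensor,
                text_embeds: torch.Tensor) -> torch.Tensor:
        img_feats = self.vision_encoder(image_feats)   # (B, N_img, vision_dim)
        img_tokens = self.projector(img_feats)         # (B, N_img, text_dim)
        # Prepend projected image tokens so the language model attends over
        # the fused image-text sequence.
        fused = torch.cat([img_tokens, text_embeds], dim=1)
        return self.language_model(fused)


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end; a real system would use
    # pretrained, domain-adapted medical vision and language backbones.
    vision = nn.Linear(32, 64)
    language = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=128, nhead=4, batch_first=True),
        num_layers=1,
    )
    model = VisionLanguageFusion(vision, language, vision_dim=64, text_dim=128)
    patch_feats = torch.randn(2, 49, 32)   # dummy image patch features
    text_embeds = torch.randn(2, 10, 128)  # dummy question token embeddings
    print(model(patch_feats, text_embeds).shape)  # torch.Size([2, 59, 128])
```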

Authors (6)
  1. Cuong Nhat Ha (1 paper)
  2. Shima Asaadi (3 papers)
  3. Sanjeev Kumar Karn (10 papers)
  4. Oladimeji Farri (12 papers)
  5. Tobias Heimann (4 papers)
  6. Thomas Runkler (34 papers)
