A Spectrum Evaluation Benchmark for Medical Multi-Modal Large Language Models (2402.11217v2)

Published 17 Feb 2024 in cs.CL and cs.CV

Abstract: The significant breakthroughs of Medical Multi-Modal LLMs (Med-MLLMs) are renovating modern healthcare with robust information synthesis and medical decision support. However, these models are often evaluated on benchmarks that are unsuitable for Med-MLLMs because of the complexity of real-world diagnostics across diverse specialties. To address this gap, we introduce Asclepius, a novel Med-MLLM benchmark that comprehensively assesses Med-MLLMs along two axes: distinct medical specialties (cardiovascular, gastroenterology, etc.) and different diagnostic capacities (perception, disease analysis, etc.). Grounded in 3 proposed core principles, Asclepius ensures a comprehensive evaluation by encompassing 15 medical specialties, stratifying clinical tasks into 3 main categories and 8 sub-categories, and avoiding overlap with existing VQA datasets. We further provide an in-depth analysis of 6 Med-MLLMs and compare them with 3 human specialists, providing insights into their competencies and limitations in various medical contexts. Our work not only advances the understanding of Med-MLLMs' capabilities but also sets a precedent for future evaluations and the safe deployment of these models in clinical environments.
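
The abstract describes a two-axis evaluation: medical specialty crossed with diagnostic capacity (task category). As a rough illustration of how such a spectrum benchmark could be organized and scored, the sketch below uses a hypothetical item schema and an exact-match accuracy aggregator; the field names, taxonomy labels, and example items are assumptions for illustration, not the actual Asclepius data format.

```python
# Minimal sketch of a spectrum-style Med-MLLM benchmark layout and scoring.
# Field names and example items are illustrative assumptions, not the paper's schema.
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class BenchmarkItem:
    specialty: str       # e.g. "cardiovascular", "gastroenterology" (one of 15)
    task_category: str   # one of the 3 main categories, e.g. "perception"
    sub_category: str    # one of the 8 sub-categories
    question: str
    image_path: str
    answer: str          # gold answer

def accuracy_by_axis(items, predictions, axis="specialty"):
    """Aggregate exact-match accuracy along one evaluation axis
    (specialty or task category), so models and human specialists
    can be compared per slice of the spectrum."""
    correct, total = defaultdict(int), defaultdict(int)
    for item, pred in zip(items, predictions):
        key = getattr(item, axis)
        total[key] += 1
        correct[key] += int(pred.strip().lower() == item.answer.strip().lower())
    return {k: correct[k] / total[k] for k in total}

# Usage: score a model's answers per specialty (toy data).
items = [
    BenchmarkItem("cardiovascular", "perception", "modality recognition",
                  "Which imaging modality is shown?", "img/cv_001.png", "echocardiogram"),
    BenchmarkItem("gastroenterology", "disease analysis", "diagnosis",
                  "What is the most likely diagnosis?", "img/gi_014.png", "ulcerative colitis"),
]
model_preds = ["echocardiogram", "crohn disease"]
print(accuracy_by_axis(items, model_preds, axis="specialty"))
# {'cardiovascular': 1.0, 'gastroenterology': 0.0}
```

Reporting results per specialty and per task category, rather than as a single aggregate score, is what lets such a benchmark expose where a model's competence differs from that of human specialists.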

Authors (11)
  1. Wenxuan Wang
  2. Yihang Su
  3. Jingyuan Huan
  4. Jie Liu
  5. Wenting Chen
  6. Yudi Zhang
  7. Cheng-Yi Li
  8. Kao-Jung Chang
  9. Xiaohan Xin
  10. Linlin Shen
  11. Michael R. Lyu
Citations (7)