Extent of LVLMs’ capability to meet diverse clinical demands
Determine the extent to which large vision-language models (including general-purpose models such as DeepSeek-VL, GPT-4V/GPT-4o, Claude3-Opus, Gemini, and Qwen-VL, as well as medical-specific models such as MedDr, LLaVA-Med, Med-Flamingo, RadFM, and Qilin-Med-VL-Chat) can accommodate the diverse demands encountered in real-world clinical scenarios across modalities, tasks, departments, and perceptual granularities.
References
However, it remains unclear to what extent these LVLMs can accommodate the diverse demands in real clinical scenarios.
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI (arXiv:2408.03361, Chen et al., 6 Aug 2024), in Abstract; Introduction