
MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning (2502.19634v2)

Published 26 Feb 2025 in cs.CV and cs.AI

Abstract: Reasoning is a critical frontier for advancing medical image analysis, where transparency and trustworthiness play a central role in both clinician trust and regulatory approval. Although Medical Vision-Language Models (VLMs) show promise for radiological tasks, most existing VLMs merely produce final answers without revealing the underlying reasoning. To address this gap, we introduce MedVLM-R1, a medical VLM that explicitly generates natural language reasoning to enhance transparency and trustworthiness. Instead of relying on supervised fine-tuning (SFT), which often suffers from overfitting to training distributions and fails to foster genuine reasoning, MedVLM-R1 employs a reinforcement learning framework that incentivizes the model to discover human-interpretable reasoning paths without using any reasoning references. Despite limited training data (600 visual question answering samples) and model parameters (2B), MedVLM-R1 boosts accuracy from 55.11% to 78.22% across MRI, CT, and X-ray benchmarks, outperforming larger models trained on over a million samples. It also demonstrates robust domain generalization under out-of-distribution tasks. By unifying medical image analysis with explicit reasoning, MedVLM-R1 marks a pivotal step toward trustworthy and interpretable AI in clinical practice. Inference model is available at: https://huggingface.co/JZPeterPan/MedVLM-R1.

Summary

  • The paper introduces MedVLM-R1, a medical VLM using Group Relative Policy Optimization (GRPO) reinforcement learning to generate explicit, human-interpretable reasoning for radiological VQA without requiring explicit reasoning references.
  • MedVLM-R1, built on Qwen2-VL-2B, achieved a significant accuracy increase from 55.11% to 78.22% across diverse benchmarks using only 600 training samples, outperforming much larger models and demonstrating robust domain generalization.
  • The model employs a two-part rule-based reward function incorporating format rewards for structure (XML-like tags) and accuracy rewards for correctness, effectively guiding the VLM towards desired reasoning patterns and precise answers.
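The group-relative idea behind GRPO can be sketched as follows. This is a minimal illustration of the standard GRPO formulation, not the paper's released code: for each question, a group of candidate responses is sampled, each is scored with the rule-based reward, and advantages are computed by normalizing rewards within the group (so no learned value model is needed). The group size and reward values below are illustrative assumptions.

```python
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Advantage of each sampled completion, normalized within its group:
    (reward - group mean) / group std. Completions better than the group
    average get positive advantage and are reinforced."""
    mu = mean(rewards)
    sigma = stdev(rewards)  # sample std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]

# A group of 4 sampled answers to one VQA question: two correct and
# well-formatted (reward 2.0), one formatted but wrong (1.0), one
# unformatted (0.0) -- illustrative values only.
advs = group_relative_advantages([2.0, 2.0, 1.0, 0.0])
print([round(a, 3) for a in advs])
```

Because the normalization is per-group, the advantages always sum to (approximately) zero: the policy update pushes probability mass from below-average completions toward above-average ones within the same question.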

The paper introduces MedVLM-R1, a medical VLM designed to generate explicit, natural language reasoning for greater transparency and trustworthiness in radiological visual question answering (VQA).

Key points:

  • The paper employs Group Relative Policy Optimization (GRPO), a reinforcement learning (RL) technique, to train the VLM to discover and articulate human-interpretable reasoning paths without relying on explicit reasoning references, addressing the limitations of supervised fine-tuning (SFT) such as overfitting and poor out-of-distribution (OOD) generalization.
  • MedVLM-R1, built upon the Qwen2-VL-2B model, achieves a significant accuracy boost from 55.11% to 78.22% across MRI, CT, and X-ray benchmarks with limited training data (600 VQA samples), outperforming larger models such as Qwen2-VL-72B and HuatuoGPT-Vision-7B and demonstrating robust domain generalization on OOD tasks.
  • The model uses a two-part rule-based reward function: a format reward that incentivizes correct use of XML-like tags (i.e., `<think>` and `<answer>`), and an accuracy reward that promotes answer correctness. Together these guide the model to adopt the desired response structure and refine its answer selection for interpretable medical reasoning.
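A two-part rule-based reward of this kind can be sketched as below. The tag names follow the `<think>`/`<answer>` template described above, but the exact regular expressions, reward magnitudes, and answer-matching rules are assumptions for illustration, not the paper's released implementation.

```python
import re

# Completion must be exactly a <think> block followed by an <answer> block.
FORMAT_RE = re.compile(r"^<think>.*?</think>\s*<answer>.*?</answer>$", re.DOTALL)

def format_reward(completion: str) -> float:
    """1.0 if the completion follows the <think>/<answer> template, else 0.0."""
    return 1.0 if FORMAT_RE.match(completion.strip()) else 0.0

def accuracy_reward(completion: str, ground_truth: str) -> float:
    """1.0 if the text inside <answer> matches the reference choice, else 0.0."""
    m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if m is None:
        return 0.0
    return 1.0 if m.group(1).strip().lower() == ground_truth.strip().lower() else 0.0

def total_reward(completion: str, ground_truth: str) -> float:
    # Illustrative equal weighting of the two reward terms.
    return format_reward(completion) + accuracy_reward(completion, ground_truth)

good = "<think>The lesion is hyperintense on T2, suggesting edema.</think><answer>B</answer>"
print(total_reward(good, "B"))  # both terms satisfied -> 2.0
print(total_reward("B", "B"))   # bare answer, no tags -> 0.0
```

Note that no reasoning reference is ever consulted: the `<think>` content is rewarded only for being present and well-formed, which is what lets the model discover its own reasoning paths during RL.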