MedAgent-Zero: Zero-Shot Medical Reasoning

Updated 17 March 2026

MedAgent-Zero is a framework that partitions a single LLM into specialized agents simulating expert medical reasoning without additional training.
It employs distinct roles for question analysis, option evaluation, summarization, and iterative consensus to ensure robust decision-making.
Empirical evaluation shows that MedAgent-Zero achieves state-of-the-art zero-shot accuracy on medical QA benchmarks, outperforming methods like CoT and few-shot baselines.

MedAgent-Zero is a collaborative, agent-based framework designed to harness the latent medical reasoning capabilities of LLMs in a zero-shot, training-free setting. Developed specifically to address domain-adaptation challenges in medicine, MedAgent-Zero systematically partitions a single LLM instance into specialized agents with distinct medical roles. Each agent role is invoked through carefully designed prompts, enabling nuanced, multi-expert discourse, iterative summarization, critical debate, and consensus-driven decision-making, all without in-context demonstrations or parameter updates. Empirical evaluation demonstrates that MedAgent-Zero achieves state-of-the-art zero-shot accuracy on a suite of medical QA benchmarks, surpassing both simple prompting and advanced chain-of-thought (CoT) methods (Tang et al., 2023).

1. Agent Roles and Overall Architecture

MedAgent-Zero models a single LLM, such as GPT-3.5-Turbo or GPT-4, as a coordinated team of virtual agents, each simulating expertise in distinct medical subfields. The primary roles include:

Question-Domain Experts (QD): A set of $m$ agents, each adopting a system role corresponding to a selected medical specialty (e.g., cardiology, pulmonology). These agents initially interpret the clinical scenario.
Option-Domain Experts (OD): $n$ agents, each responsible for adjudicating answer options based on their designated medical subfields.
Medical Report Assistant (Summarizer): This agent synthesizes all individual expert analyses into a coherent, structured report emphasizing “Key Knowledge” and “Total Analysis.”
Collaborative Voters/Editors: The union of QD and OD agents participates as reviewers, voting on report drafts, proposing edits, and driving iterative consensus.
Decision Maker: Given the unanimously approved report, a final role determines the single best answer.

Each agent is instantiated purely through prompt engineering, using constructs such as “You are a [Clinical Domain] Expert,” with no model fine-tuning, gradient updates, or external context.
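The role assignment described above can be sketched as a small data structure. This is only an illustration: the `Agent` class and `build_team` helper are hypothetical names, and the summarizer and decision-maker prompt wordings are assumptions. Only the “You are a [Clinical Domain] Expert” pattern comes from the text.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Agent:
    kind: str    # "QD", "OD", "summarizer", or "decision"
    domain: str  # medical specialty; empty for the non-expert roles

    def system_prompt(self) -> str:
        """Return the identity statement that instantiates this agent."""
        if self.kind in ("QD", "OD"):
            return f"You are a {self.domain} Expert."
        if self.kind == "summarizer":
            # Assumed wording, not taken from the source.
            return "You are a Medical Report Assistant."
        # Assumed wording, not taken from the source.
        return "You are a Medical Decision Maker."


def build_team(question_domains: List[str],
               option_domains: List[str]) -> Tuple[List[Agent], List[Agent], Agent, Agent]:
    """Create m question experts, n option experts, and the two fixed roles."""
    qd = [Agent("QD", d) for d in question_domains]
    od = [Agent("OD", d) for d in option_domains]
    return qd, od, Agent("summarizer", ""), Agent("decision", "")


qd, od, summarizer, decider = build_team(["Cardiology", "Pulmonology"], ["Pharmacology"])
voters = qd + od  # the union of QD and OD agents reviews each report draft
```

Because every role is just a string handed to the same model, adding or removing an expert changes the prompts and nothing else.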

2. Stage-Wise Workflow and Mathematical Formulation

MedAgent-Zero executes medical reasoning in five critical stages:

Expert Gathering:
- $QD = LLM(q; r_{qd}, prompt_{qd})$
- $OD = LLM(q, op; r_{od}, prompt_{od})$
Individual Analyses:
- For each $qd_i \in QD$ : $qa_i = LLM(q; r_{qa}, prompt_{qa})$
- For each $od_j \in OD$ : $oa_j = LLM(q, op, \{qa_{i}\}; r_{oa}, prompt_{oa})$
Report Summarization:
- $\text{Repo}_0 = LLM(\{qa_i\}, \{oa_j\}; r_{rs}, prompt_{rs})$
Iterative Multi-Round Discussion:
- For $D = QD \cup OD$, perform collaborative consultation on the current report $R_{cur}$ (initially $\text{Repo}_0$):
  - Each agent $d_i$ votes (“yes”/“no”) using $LLM(R_{cur}; role=d_i, prompt=p_{vote})$
  - If any vote is “no”, $LLM(R_{cur}; role=d_i, prompt=p_{mod})$ proposes edits
  - Summarizer integrates all suggested edits with $LLM(R_{cur}, \{Mod_i\}; r_{rs}, prompt_{rev})$
  - Repeat until all votes are “yes” or maximum iteration $k$ is reached
- Final output is the unanimous report $R_f$, where consensus is defined as $n_{agree} = \sum_{i=1}^{N} \mathbb{1}(\text{vote}_i = \text{``yes''}) = N$, with $N = |D|$
Final Decision Making:
- $ans = LLM(q, op, R_f; r_{dm}, prompt_{dm})$ , yielding an answer of the form “Option: [A/B/C/D/E]”

All information flow is unidirectional: experts $\Rightarrow$ analyses $\Rightarrow$ synthesis $\Rightarrow$ consensus $\Rightarrow$ decision, and no learning or memory is retained across tasks.
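The five stages can be sketched end to end as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the `LLM` callable interface, all prompt wordings, the comma-separated domain format, and the default `k=5` are hypothetical. The defaults `m=5` and `n=2` mirror the best-performing configuration reported in the ablation. A canned `fake_llm` stands in for a real model so the control flow can be run without API access.

```python
from typing import Callable, List

# Hypothetical interface: (role description, prompt text) -> model completion.
LLM = Callable[[str, str], str]
SUMMARIZER = "You are a Medical Report Assistant."  # assumed wording


def expert(domain: str) -> str:
    return f"You are a {domain} Expert."


def gather(llm: LLM, prompt: str, count: int) -> List[str]:
    """Stage 1 helper: ask for a comma-separated list of specialties."""
    reply = llm("You are a medical expert.", prompt)
    return [d.strip() for d in reply.split(",") if d.strip()][:count]


def run_pipeline(llm: LLM, q: str, op: str, m: int = 5, n: int = 2, k: int = 5) -> str:
    # Stage 1: expert gathering.
    qd = gather(llm, f"List {m} medical domains for this question:\n{q}", m)
    od = gather(llm, f"List {n} medical domains for these options:\n{q}\n{op}", n)

    # Stage 2: individual analyses (option experts also see the question analyses).
    qa = [llm(expert(d), f"Analyze the question:\n{q}") for d in qd]
    context = "\n".join(qa)
    oa = [llm(expert(d), f"Analyze the options:\n{q}\n{op}\n{context}") for d in od]

    # Stage 3: report summarization.
    report = llm(SUMMARIZER, "Summarize these analyses:\n" + "\n".join(qa + oa))

    # Stage 4: vote, collect edits from dissenters, revise; stop on unanimity or after k rounds.
    voters = [expert(d) for d in qd + od]
    for _ in range(k):
        votes = [llm(v, f"Do you approve this report? Answer yes or no.\n{report}") for v in voters]
        dissenters = [v for v, vote in zip(voters, votes)
                      if not vote.strip().lower().startswith("yes")]
        if not dissenters:
            break
        mods = [llm(v, f"Propose edits to this report:\n{report}") for v in dissenters]
        report = llm(SUMMARIZER, f"Revise the report:\n{report}\nEdits:\n" + "\n".join(mods))

    # Stage 5: final decision from the agreed report.
    return llm("You are a medical decision maker.",
               f"{q}\n{op}\n{report}\nAnswer as 'Option: [X]'.")


def fake_llm(role: str, prompt: str) -> str:
    """Canned replies so the sketch runs without any API access."""
    if prompt.startswith("List"):
        return "Cardiology, Pulmonology"
    if prompt.startswith("Do you approve"):
        return "yes"
    if "Answer as" in prompt:
        return "Option: [B]"
    return "analysis text"


answer = run_pipeline(fake_llm, "A 60-year-old presents with chest pain.", "A) ... B) ...")
```

Each stage only consumes the outputs of earlier stages, and nothing is stored between calls to `run_pipeline`, which matches the stateless, unidirectional flow described above.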

3. Zero-Shot Operational Principles

MedAgent-Zero operates strictly in a zero-shot regime: no in-context exemplars or few-shot demonstrations are used at any pipeline stage. Only natural-language prompts and role designations guide behavior. Sampling parameters are set to temperature = 1.0 and top_p = 1.0 throughout, except for self-consistency (SC) runs, which use temperature = 0.7 with 5 samples. Expert agents are initialized with simple, explicit identity statements (e.g., “You are a [Cardiology Expert]”), which structure the reasoning pipeline without relying on additional data modalities. There is no model adaptation, training, or learning between queries.
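The sampling settings and the self-consistency baseline can be illustrated with a short sketch. The `sample` callable and the dictionary names are hypothetical. Keeping top_p at 1.0 for SC runs is an assumption, since the text only states the temperature and sample count. Majority voting over sampled answers is the standard form of self-consistency.

```python
from collections import Counter
from typing import Callable

# Settings stated in the text for the main pipeline.
DEFAULT_SAMPLING = {"temperature": 1.0, "top_p": 1.0}
# SC settings; top_p is assumed unchanged because the text does not mention it.
SC_SAMPLING = {"temperature": 0.7, "top_p": 1.0, "n_samples": 5}


def self_consistency(sample: Callable[[float], str],
                     temperature: float = SC_SAMPLING["temperature"],
                     n_samples: int = SC_SAMPLING["n_samples"]) -> str:
    """Draw n_samples answers at the given temperature and return the most common one."""
    answers = [sample(temperature) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]


# Deterministic stand-in for a stochastic model call.
canned = iter(["A", "B", "A", "C", "A"])
majority = self_consistency(lambda temperature: next(canned))
```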

4. Empirical Evaluation and Benchmarking

MedAgent-Zero was evaluated on nine medical QA datasets: MedQA (USMLE), MedMCQA (AIIMS/NEET PG), PubMedQA, and six medical subtasks from MMLU (Anatomy, Clinical Knowledge, College Medicine, Medical Genetics, Professional Medicine, College Biology). For each dataset, accuracy was measured on a randomly selected sample of 300 questions, using GPT-3.5-Turbo and GPT-4 accessed through Azure OpenAI endpoints. Results are summarized below:

Model	Zero-shot	Zero-shot+CoT+SC	MedAgent-Zero
GPT-3.5	67.8%	70.9%	72.1%
GPT-4	80.6%	83.0%	86.7%

MedAgent-Zero achieved the highest average zero-shot accuracy across all datasets, outperforming not only zero-shot and CoT+SC (Chain-of-Thought with Self-Consistency) prompting, but also strong few-shot CoT baselines and Flan-PaLM.
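The evaluation protocol (a random sample of 300 questions per dataset, scored by accuracy) can be sketched as below. The helper names, the fixed seed, and the regular expression are assumptions for illustration. The sketch assumes lettered options A to E in the “Option: [X]” format produced by the decision stage, so datasets with other answer formats would need their own mapping.

```python
import random
import re
from typing import List, Optional, Sequence

# Matches the decision-stage format, with or without the square brackets.
OPTION_RE = re.compile(r"Option:\s*\[?([A-E])\]?")


def parse_option(text: str) -> Optional[str]:
    """Extract the chosen option letter, or None if no answer is found."""
    match = OPTION_RE.search(text)
    return match.group(1) if match else None


def sample_questions(dataset: Sequence, size: int = 300, seed: int = 0) -> List:
    """Draw a reproducible random subset (the seed is an assumption)."""
    rng = random.Random(seed)
    return rng.sample(list(dataset), min(size, len(dataset)))


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of model outputs whose parsed option matches the gold label."""
    correct = sum(parse_option(p) == g for p, g in zip(predictions, gold))
    return correct / len(gold)


subset = sample_questions(range(1000))
score = accuracy(["Option: [A]", "Option: [C]", "no answer"], ["A", "B", "C"])
```

Unparseable outputs are counted as wrong here, which is one reasonable convention; the source does not say how such cases were scored.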

5. Ablation Analysis and Component Contribution

Ablation studies on MedMCQA revealed the incremental effect of each modular stage within the MedAgent-Zero pipeline:

Direct prompting: 49.0% accuracy
Chain-of-Thought (CoT): 55.0% (+6.0)
+Analysis Proposition (“Anal”): 62.0% (+7.0)
+Summarization (“Summ”): 65.0% (+3.0)
+Collaborative Consultation (“Cons”): 67.0% (+2.0)
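The increments listed above can be checked with a few lines of arithmetic. The accuracies are taken from the list, and the variable and function names are illustrative.

```python
# (configuration, MedMCQA accuracy in %) in the order components were added.
ABLATION = [
    ("Direct prompting", 49.0),
    ("CoT", 55.0),
    ("+Anal", 62.0),
    ("+Summ", 65.0),
    ("+Cons", 67.0),
]


def increments(rows):
    """Return (name, accuracy, gain over the previous row) for each added component."""
    return [(name, acc, round(acc - prev, 1))
            for (_, prev), (name, acc) in zip(rows, rows[1:])]


steps = increments(ABLATION)
total_gain = ABLATION[-1][1] - ABLATION[0][1]  # full pipeline vs. direct prompting
```

The per-step gains (6.0, 7.0, 3.0, 2.0) sum to the 18.0-point difference between direct prompting and the full pipeline, with the analysis stage contributing the largest single step.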

Peak performance was observed with $m=5$ question experts and $n=2$ option experts, with accuracy rising steadily as these counts increased. Error analysis (40 failures annotated by humans) attributed 77% of errors to domain-knowledge gaps (either omission or misretrieval), 8% to CoT hallucination errors, and the remainder to reasoning or consistency deviations. These results indicate that the principal performance gains are attributable to structured, role-based expert simulation and collaborative analysis (Tang et al., 2023).

6. Significance and Applicability

MedAgent-Zero demonstrates that partitioning a single LLM using role-engineered agents enables the mining and recombination of its embedded medical expertise for clinical reasoning tasks. The framework does not require model fine-tuning, external tools, or supplementary training data, which broadens its applicability to real-world, resource-constrained settings where zero-shot and training-free solutions are preferred. The methodology affords a template for extending LLM capability in other high-precision, domain-specific fields. Its training-free, multi-agent collaboration strategy provides a state-of-the-art procedural baseline for future zero-shot medical reasoning systems (Tang et al., 2023).


References

Tang et al. (2023). MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning.
