Doctor-R1: AI Clinical Inquiry Agent

Updated 12 October 2025
  • Doctor-R1 is an AI clinical inquiry agent that uses a multi-agent interactive environment to simulate realistic patient consultations.
  • It employs a two-tiered reward architecture to optimize both empathetic communication and accurate diagnostic decision-making.
  • The system leverages an experience-driven learning pipeline and outperforms much larger models on strong clinical dialogue benchmarks.

Doctor-R1 is an AI doctor agent designed to master professional clinical inquiry by simultaneously optimizing accurate medical decision-making and strategic, empathetic multi-turn patient consultation. The system is built around a multi-agent interactive environment, a two-tiered reward structure, and an experience-driven learning regimen, and it achieves state-of-the-art performance on key clinical dialogue benchmarks along with consistent human preference in pairwise evaluations.

1. Multi-Agent Interactive Clinical Environment

Doctor-R1 operates within a simulated multi-agent environment that mirrors a realistic outpatient consultation. The doctor agent, implemented as a policy model, interacts dynamically with a simulated patient agent, with the overall clinical exchange formalized as a Partially Observable Markov Decision Process (POMDP). A dedicated Consultation Evaluator agent monitors the interaction and supplies turn-wise and episode-level feedback. This environment captures the full temporal structure and partial observability of true clinical consultations, enforcing the need for the agent to gather information across multiple turns and adapt to encountered uncertainty and patient responses.
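
The source does not provide an implementation, but the interaction loop can be illustrated with a minimal Python sketch. The class and method names (the doctor policy's act, the patient agent's respond, and the evaluator's scoring methods) are assumptions for illustration only, not the paper's actual interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class ConsultationState:
    """Dialogue history visible to the doctor agent (its partial observation)."""
    turns: list = field(default_factory=list)  # alternating ("doctor"/"patient", utterance) pairs
    done: bool = False


def run_consultation(doctor_policy, patient_agent, evaluator, max_turns=10):
    """Roll out one simulated outpatient consultation.

    The doctor agent only sees the dialogue so far, never the patient's
    latent condition, which is what makes the exchange a POMDP.
    """
    state = ConsultationState()
    process_rewards = []

    for _ in range(max_turns):
        # Doctor acts on its partial observation of the dialogue.
        doctor_utterance = doctor_policy.act(state.turns)
        state.turns.append(("doctor", doctor_utterance))

        # Simulated patient responds based on its hidden case profile.
        patient_utterance, state.done = patient_agent.respond(state.turns)
        state.turns.append(("patient", patient_utterance))

        # Consultation Evaluator supplies turn-wise feedback.
        process_rewards.append(evaluator.score_turn(state.turns))
        if state.done:
            break

    # Episode-level feedback once the dialogue terminates.
    outcome = evaluator.score_outcome(state.turns)
    return state.turns, process_rewards, outcome
```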

2. Two-Tiered Reward Architecture

A core innovation in Doctor-R1 is the separation of optimization objectives via a two-tiered reward system:

  • Process Rewards: At each dialogue turn, the agent receives a “process” reward scoring communication and inquiry skills along multiple axes, including safety, logical reasoning, medical accuracy, completeness, quality of information gathering, faithfulness, empathy, and humility. This feedback enables learning of soft skills and strategic questioning, not just recall or factual correctness.
  • Outcome Rewards: Upon dialogue completion, a distinct “outcome” reward evaluates the correctness and completeness of the final diagnostic decision against gold-standard ground truth (with rewards of 1.0 for correct, 0.5 for partially correct, and 0 for incorrect). This decoupling ensures that the agent not only delivers correct diagnoses but learns to arrive at them via detailed, patient-centered inquiry.

This architecture allows Doctor-R1 to simultaneously optimize communicative competence and diagnostic accuracy, properties that are often only loosely coupled in existing LLM-based systems.
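
As a rough sketch of how such a two-tiered reward could be computed, the following Python fragment scores each turn along the listed axes and grades the final diagnosis against ground truth. The axis names and the 1.0/0.5/0 outcome values follow the description above; the judge and matcher interfaces and the equal-weight averaging are assumptions, not the paper's exact formulation.

```python
# Axis names follow the paper's description of the process reward.
PROCESS_AXES = [
    "safety", "logical_reasoning", "medical_accuracy", "completeness",
    "information_gathering", "faithfulness", "empathy", "humility",
]


def process_reward(turn, judge):
    """Turn-level reward: average of per-axis scores (assumed to lie in [0, 1])."""
    scores = {axis: judge.score(turn, axis) for axis in PROCESS_AXES}
    return sum(scores.values()) / len(scores)


def outcome_reward(predicted_diagnosis, gold_diagnosis, matcher):
    """Episode-level reward graded against gold-standard ground truth."""
    verdict = matcher.compare(predicted_diagnosis, gold_diagnosis)
    if verdict == "correct":
        return 1.0
    if verdict == "partial":
        return 0.5
    return 0.0
```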

3. Experience Repository and Learning Pipeline

Doctor-R1 grounds its policy learning in a dynamically curated experience repository that stores high-quality prior consultation trajectories. The experience retrieval pipeline operates in multiple stages:

  • Stage 1: Semantic Retrieval. Dense embedding models compute the cosine similarity between the current state/trajectory and all previously stored experiences, scoring candidates by both similarity and past trajectory reward.
  • Stage 2: Reranking. Retrieved candidates are re-ranked by a cross-encoder reranker model that attends to token-level matches and context, improving retrieval precision for complex cases.
  • Stage 3: Novelty and Reward Filtering. Only novel, high-reward (i.e., high-quality) trajectories are retained for subsequent learning steps, preventing the agent from overfitting to suboptimal strategies and ensuring continual policy improvement.

This repository enables experience replay and retrieval-augmented policy updates, supporting more rapid convergence to high-quality inquiry policies and robust strategic adaptation to rare or challenging cases.
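
A minimal sketch of the three-stage retrieval pipeline is shown below. The stages mirror the description above, but the experience record layout, the reranker interface, and all weights and thresholds are illustrative assumptions rather than values from the paper.

```python
import numpy as np


def retrieve_experiences(query_embedding, query_text, repository, reranker,
                         top_k=20, keep_k=5,
                         sim_weight=0.7, reward_weight=0.3,
                         novelty_threshold=0.95, min_reward=0.5):
    """Three-stage retrieval over stored consultation trajectories.

    Each repository entry is assumed to look like
    {"text": str, "embedding": np.ndarray, "reward": float}.
    """
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Stage 1: semantic retrieval, weighting similarity and past trajectory reward.
    scores = [
        sim_weight * cosine(query_embedding, exp["embedding"]) + reward_weight * exp["reward"]
        for exp in repository
    ]
    order = sorted(range(len(repository)), key=lambda i: scores[i], reverse=True)
    candidates = [repository[i] for i in order[:top_k]]

    # Stage 2: cross-encoder reranking over (query, candidate) text pairs.
    pair_scores = reranker.score([(query_text, exp["text"]) for exp in candidates])
    rerank_order = sorted(range(len(candidates)), key=lambda i: pair_scores[i], reverse=True)
    reranked = [candidates[i] for i in rerank_order]

    # Stage 3: keep only novel, high-reward trajectories.
    kept, kept_embeddings = [], []
    for exp in reranked:
        if exp["reward"] < min_reward:
            continue  # discard low-quality trajectories
        if any(cosine(exp["embedding"], e) > novelty_threshold for e in kept_embeddings):
            continue  # near-duplicate of an already-kept trajectory
        kept.append(exp)
        kept_embeddings.append(exp["embedding"])
        if len(kept) == keep_k:
            break
    return kept
```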

4. Evaluation Benchmarks and Metrics

Doctor-R1 is assessed using two high-fidelity clinical dialogue evaluation suites:

  • HealthBench: Evaluates across Themes (e.g., emergency, health data, communication, global health, hedging, context seeking, complex response) and Axes (e.g., factual accuracy, instruction following, communication quality, context awareness, completeness).
  • MAQuE: Emphasizes multi-faceted qualities across task success (accuracy, robustness), inquiry proficiency (coverage, relevance), dialogue competence (adherence, coherence), and patient experience (clarity, empathy).

Across these metrics, Doctor-R1 is reported to improve substantially over strong open-source and proprietary baselines, including UltraMedical-70B and models with significantly higher parameter counts.

Benchmark            UltraMedical-70B    Doctor-R1 (Avg.)    Delta
HealthBench          26.38               36.29               +9.91
MAQuE (Accuracy)     52.00               60.00               +8.00

The comparison shows consistent gains of roughly 8-10 points over UltraMedical-70B on both benchmarks.

5. Human-Centric Evaluation and Preferred Dialogue Quality

In addition to automated metrics, Doctor-R1 underwent human preference testing via pairwise dialogue comparisons. Annotators assessed model outputs on coherence, adherence to clinical role, clarity, and empathy. Doctor-R1 was consistently preferred, with particular strengths cited in natural, human-like empathy and the ability to structure communication in a way that supported patient understanding and addressed risk checks without resorting to formulaic or rigid scripting.
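
The source reports only that Doctor-R1 was consistently preferred in these comparisons. For illustration, pairwise judgments of this kind are commonly aggregated into per-model win rates; the sketch below assumes a hypothetical judgment record format and counts ties as half a win for each side, neither of which is specified in the source.

```python
from collections import Counter


def win_rates(judgments):
    """Aggregate pairwise preference judgments into per-model win rates.

    Each judgment is assumed to look like
    {"model_a": "Doctor-R1", "model_b": "baseline", "preferred": "model_a"}.
    """
    wins, totals = Counter(), Counter()
    for j in judgments:
        a, b = j["model_a"], j["model_b"]
        totals[a] += 1
        totals[b] += 1
        if j["preferred"] == "model_a":
            wins[a] += 1
        elif j["preferred"] == "model_b":
            wins[b] += 1
        else:  # tie: split the credit
            wins[a] += 0.5
            wins[b] += 0.5
    return {model: wins[model] / totals[model] for model in totals}
```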

6. Implications for Clinical Practice

Doctor-R1 addresses key shortcomings in prior LLM-based doctor agents by combining dynamic, multi-turn interactive inquiry with robust medical decision-making. Its dual focus on process and outcome rewards enables adaptive, safety-conscious decision policies and patient-centered communication—essential for real-world deployment in outpatient or triage settings. The high parameter efficiency (an 8B model outperforming 32B/70B baselines) suggests potential for scalable, cost-effective deployment without sacrificing quality. This architecture also supports rapid adaptation to new medical contexts by updating the experience repository or refining the reward schemas.

A plausible implication is that frameworks similar to Doctor-R1 could serve as front-line assistants in clinical intake or telehealth, supporting clinicians by pre-gathering high-yield information and triaging cases more safely and thoroughly than script-based or decision-tree systems.

7. Summary and Outlook

Doctor-R1 establishes a new state-of-the-art in LLM-based clinical inquiry by integrating a realistic multi-agent environment, a dual reward structure separating communication and diagnostic objectives, and an experiential learning pipeline for strategic inquiry refinement. Its performance on benchmarks and human evaluations demonstrates that the agent outperforms leading open-source and powerful proprietary models while using fewer parameters. These findings highlight the value of agentic reinforcement learning and experience-driven retrieval in mastering the nuances of professional doctor–patient interaction, setting a foundation for further advances in autonomous, human-preferred AI clinical agents (Lai et al., 5 Oct 2025).
