
MAI-DxO: Multi-Agent Diagnostic Orchestrator

Updated 1 July 2025

MAI-DxO is a model-agnostic, multi-agent diagnostic orchestration framework that simulates a panel of virtual specialists for evidence-based decision making.
It employs iterative Bayesian updating and value-driven test selection to optimize diagnostic accuracy while controlling costs in complex clinical environments.
Evaluations on a benchmark of 304 NEJM cases show that MAI-DxO outperforms generalist physicians and off-the-shelf language models, reaching up to 85.5% accuracy and, in its budgeted configuration, substantially lowering diagnostic costs.

The MAI Diagnostic Orchestrator (MAI-DxO) is a model-agnostic, multi-agent orchestration framework that emulates iterative, evidence-driven clinical reasoning to support diagnosis and cost-effective test selection in complex, real-world settings. Introduced in work on sequential diagnosis with large language models (LLMs), MAI-DxO simulates a panel of virtual specialists, each embodying a distinct medical reasoning strategy, who collaborate on forming the differential diagnosis, selecting tests, and making clinical decisions (2506.22405).

1. System Architecture and Virtual Physician Panel

MAI-DxO operates as a layered orchestration platform atop state-of-the-art LLMs, capable of guiding diagnosis through iterative, panel-based reasoning. The orchestration centers on a single LLM instructed to play five specialized "doctor" roles:

Dr. Hypothesis: Maintains and continually updates a ranked differential diagnosis list using a Bayesian updating process as new information is acquired.
Dr. Test-Chooser: Strategically selects diagnostic tests at each iteration to maximize information gain and discriminatory value relative to current diagnostic uncertainty.
Dr. Challenger: Acts as a devil’s advocate, identifying anchoring bias and proposing tests to potentially falsify leading hypotheses, thus guarding against premature closure.
Dr. Stewardship: Focuses on cost-effective care, suggesting economical yet informative alternatives and vetoing low-yield or expensive tests if unwarranted.
Dr. Checklist: Silently ensures medical validity and logical consistency, acting as a quality-control mechanism over all actions taken by the panel.

At each diagnostic step, the virtual panel deliberates and decides among querying for additional history/physical exam data, ordering new diagnostic tests, or committing to a diagnosis if sufficient certainty is reached. An additional "budget tracker" module can be activated to manage test costs, prompting cancellation of tests if the cumulative expense surpasses predefined constraints.
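The per-step deliberation and the optional budget tracker can be sketched as a small control loop. This is a minimal illustration under stated assumptions, not the paper's implementation: the `deliberate` callable stands in for the single LLM playing all five panel roles, `gatekeeper` stands in for whatever component reveals case findings, and the names `Action`, `CaseState`, `run_panel`, and the per-test `cost` field are all hypothetical. Questions are treated as free in this sketch for simplicity.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical action kinds mirroring the three choices the panel makes each turn.
ASK, TEST, DIAGNOSE = "ask", "test", "diagnose"


@dataclass
class Action:
    kind: str           # ASK, TEST, or DIAGNOSE
    content: str        # question text, test name, or final diagnosis
    cost: float = 0.0   # assumed estimated price in USD; questions are free in this sketch


@dataclass
class CaseState:
    findings: list = field(default_factory=list)  # everything revealed so far
    spent: float = 0.0                            # cumulative test cost


def run_panel(
    deliberate: Callable[[CaseState], Action],  # stand-in for the LLM acting as the panel
    gatekeeper: Callable[[Action], str],        # stand-in for whatever reveals case findings
    budget: Optional[float] = None,             # None means no budget tracker
    max_turns: int = 20,
) -> tuple:
    """Run the ask/test/diagnose loop until the panel commits or turns run out."""
    state = CaseState()
    for _ in range(max_turns):
        action = deliberate(state)
        if action.kind == DIAGNOSE:
            return action.content, state
        if action.kind == TEST and budget is not None and state.spent + action.cost > budget:
            # Budget tracker: cancel the test and record why, so the panel can re-plan.
            state.findings.append(f"cancelled {action.content}: would exceed budget")
            continue
        state.spent += action.cost
        state.findings.append(gatekeeper(action))
    return None, state  # no diagnosis committed within max_turns
```

For example, a scripted panel that asks one question, requests a 1,000 USD test under a 500 USD budget, then orders a 100 USD test and commits will see the expensive test cancelled and finish having spent 100 USD.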

2. Iterative Differential Diagnosis and Bayesian Updating

MAI-DxO makes diagnostic reasoning an explicit, iterative process. Central to this is maintaining the differential diagnosis list and continually updating the probability attached to each disease as the case evolves. At each iteration, Dr. Hypothesis revises these estimates in the manner of Bayes' rule:

$P(D|E) = \frac{P(E|D)P(D)}{P(E)}$

where:

$P(D|E)$ is the posterior probability of disease $D$ given evidence $E$,
$P(E|D)$ is the likelihood of observing evidence $E$ if $D$ is present,
$P(D)$ is the prior probability of $D$,
$P(E)$ is the marginal probability of $E$, i.e. $\sum_i P(E|D_i)P(D_i)$ over the mutually exclusive diagnoses $D_i$ in the differential.

This formalism enables the orchestrator to dynamically re-rank hypotheses as new findings or test results are integrated, closely mirroring physician diagnostic behavior in clinical practice.
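Applied to a whole differential, Bayes' rule amounts to multiplying each prior by the likelihood of the new finding and renormalizing, with $P(E)$ obtained by summing over the listed hypotheses. The sketch below assumes the hypotheses are mutually exclusive and exhaustive; the function names and example probabilities are illustrative rather than taken from the paper, and in MAI-DxO the role is played by the LLM, so this only illustrates the arithmetic.

```python
def bayes_update(prior: dict, likelihood: dict) -> dict:
    """Return P(D|E) for every hypothesis D.

    prior[d] is P(D), likelihood[d] is P(E|D). P(E) comes from the law of total
    probability over the listed hypotheses, so they are assumed to be mutually
    exclusive and exhaustive.
    """
    joint = {d: prior[d] * likelihood[d] for d in prior}   # P(E|D) P(D)
    p_evidence = sum(joint.values())                        # P(E)
    if p_evidence == 0:
        raise ValueError("finding is impossible under every listed hypothesis")
    return {d: j / p_evidence for d, j in joint.items()}


def rank(posterior: dict) -> list:
    """Re-rank the differential from most to least probable."""
    return sorted(posterior, key=posterior.get, reverse=True)


# Illustrative numbers only: a finding that is much more likely under B.
prior = {"A": 0.5, "B": 0.3, "C": 0.2}
likelihood = {"A": 0.2, "B": 0.9, "C": 0.3}
posterior = bayes_update(prior, likelihood)
print(rank(posterior))  # ['B', 'A', 'C']
```

Before the finding, A led the differential; after it, B does, which is the kind of re-ranking described above.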

3. High-Value, Cost-Efficient Test Selection

A key feature of MAI-DxO is its test selection policy, which is designed to optimize diagnostic certainty per unit cost. Dr. Test-Chooser evaluates possible tests based on their projected value-of-information, calculated as:

$\text{Value}(Test) = \frac{\Delta \text{Diagnostic Certainty}}{\text{Cost of Test}}$

where $\Delta \text{Diagnostic Certainty}$ denotes the anticipated decrease in uncertainty or ambiguity within the current differential diagnosis. Dr. Stewardship subsequently reviews suggested tests, vetoing those that are high-cost or duplicative unless they offer uniquely high discriminatory value.
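The formula leaves $\Delta \text{Diagnostic Certainty}$ abstract. One common way to make it concrete, assumed here rather than taken from the paper, is the expected reduction in Shannon entropy of the differential once the test result is known. The sketch scores each test that way, divides by cost, and applies a hypothetical Stewardship rule that vetoes tests above `cost_cap` unless their expected gain reaches `min_gain_if_expensive`; all names, thresholds, and test characteristics are illustrative.

```python
import math


def entropy(p: dict) -> float:
    """Shannon entropy (bits) of a distribution over diagnoses."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)


def expected_info_gain(prior: dict, test: dict) -> float:
    """Expected entropy reduction; test maps outcome -> {diagnosis: P(outcome | diagnosis)}."""
    expected_h = 0.0
    for lik in test.values():
        joint = {d: prior[d] * lik[d] for d in prior}
        p_outcome = sum(joint.values())
        if p_outcome == 0:
            continue  # outcome impossible under current beliefs
        posterior = {d: j / p_outcome for d, j in joint.items()}
        expected_h += p_outcome * entropy(posterior)
    return entropy(prior) - expected_h


def choose_test(prior, tests, costs, cost_cap, min_gain_if_expensive):
    """Pick the test with the highest gain per dollar, after a stewardship veto."""
    best, best_value = None, 0.0
    for name, test in tests.items():
        gain = expected_info_gain(prior, test)
        if costs[name] > cost_cap and gain < min_gain_if_expensive:
            continue  # hypothetical veto: expensive and not uniquely informative
        value = gain / costs[name]  # Value(Test) = delta certainty / cost
        if value > best_value:
            best, best_value = name, value
    return best  # None if no test is expected to reduce uncertainty


prior = {"A": 0.5, "B": 0.5}
tests = {
    "cheap": {"pos": {"A": 0.8, "B": 0.2}, "neg": {"A": 0.2, "B": 0.8}},
    "definitive": {"pos": {"A": 1.0, "B": 0.0}, "neg": {"A": 0.0, "B": 1.0}},
    "uninformative": {"pos": {"A": 0.5, "B": 0.5}, "neg": {"A": 0.5, "B": 0.5}},
}
costs = {"cheap": 50.0, "definitive": 1000.0, "uninformative": 10.0}
print(choose_test(prior, tests, costs, cost_cap=500.0, min_gain_if_expensive=0.9))  # cheap
```

The definitive test resolves the full bit of uncertainty but costs twenty times as much, so the moderately informative cheap test wins on value per dollar; the uninformative test is never chosen because its expected gain is zero.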

The orchestrator supports multiple operational modes:

Budgeted mode: Real-time cost tracking informs the cancellation of tests to adhere to predefined financial constraints.
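No-budget mode: No explicit budget is enforced, so the panel may order whatever tests it judges most informative, favoring accuracy.
Question-only mode: Restricts actions to asking questions (history and physical examination); no diagnostic tests may be ordered.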
Ensemble mode: Multiple independent MAI-DxO panels run in parallel, with consensus strategies for diagnosis.
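The exact consensus rule used in ensemble mode is not specified here; a simple majority vote is one plausible strategy, sketched below as an assumption. The function name `consensus`, the case-insensitive matching, and the tie-break (the earliest-proposed diagnosis wins) are all illustrative choices.

```python
from collections import Counter


def consensus(panel_diagnoses: list) -> str:
    """Majority vote over independent panel diagnoses (hypothetical consensus rule).

    Diagnoses are compared case-insensitively after trimming whitespace; ties go
    to whichever tied diagnosis was proposed first.
    """
    if not panel_diagnoses:
        raise ValueError("no panel diagnoses to aggregate")
    normalized = [d.strip().lower() for d in panel_diagnoses]
    counts = Counter(normalized)
    top = max(counts.values())
    return next(d for d in normalized if counts[d] == top)


votes = ["Sarcoidosis", "lymphoma", "sarcoidosis ", "Lymphoma", "tuberculosis"]
print(consensus(votes))  # sarcoidosis (2 votes, tied with lymphoma but proposed first)
```

Exact string matching is a simplification: equivalent diagnoses are often phrased differently, so a real aggregator would need a clinically aware notion of equivalence.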

4. Evaluation and Comparative Performance

MAI-DxO was evaluated on the Sequential Diagnosis Benchmark (SDBench), which comprises 304 NEJM Clinicopathological Conference cases restructured into stepwise encounters. Performance was measured on two axes: diagnostic accuracy, scored with a clinically validated rubric, and the average monetary cost of the tests and physician visits used per case.

When paired with OpenAI's o3 model, MAI-DxO achieved diagnostic accuracy of 81.9% in no-budget mode, 85.5% in ensemble mode, and 79.9% in budgeted mode, well above the generalist physicians (19.9%) and the off-the-shelf o3 model (78.6%). Its budgeted configuration also cut average cost per case by roughly 70% relative to off-the-shelf o3 and by about 20% relative to the physicians. Performance gains generalized across model families, including Gemini, Claude, Grok, DeepSeek, and Llama, and persisted on NEJM cases published after the models' training cutoff dates, which suggests the results are not explained by memorization of the benchmark cases.

Agent / Model	Accuracy (%)	Avg. cost per case (USD)
US/UK Generalist Physicians	19.9	2,963
Off-the-shelf o3	78.6	7,850
MAI-DxO (no budget, o3)	81.9	4,735
MAI-DxO (budget, o3)	79.9	2,396
MAI-DxO (ensemble, o3)	85.5	7,184
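The roughly 70% and 20% cost reductions mentioned above can be reproduced from the table. The short calculation below uses only the table's figures and shows that both correspond to the budgeted configuration, compared with off-the-shelf o3 and with the physicians respectively.

```python
# Average cost per case in USD, copied from the table above.
costs = {
    "physicians": 2963,
    "o3_off_the_shelf": 7850,
    "mai_dxo_no_budget": 4735,
    "mai_dxo_budget": 2396,
    "mai_dxo_ensemble": 7184,
}


def relative_reduction(baseline: float, new: float) -> float:
    """Fractional cost saving of `new` relative to `baseline`."""
    return (baseline - new) / baseline


vs_o3 = relative_reduction(costs["o3_off_the_shelf"], costs["mai_dxo_budget"])
vs_physicians = relative_reduction(costs["physicians"], costs["mai_dxo_budget"])
print(f"budgeted MAI-DxO vs o3: {vs_o3:.1%}")                  # 69.5%
print(f"budgeted MAI-DxO vs physicians: {vs_physicians:.1%}")  # 19.1%
```

By the same arithmetic, the ensemble configuration is only about 8.5% cheaper than off-the-shelf o3 ((7,850 - 7,184) / 7,850), so its advantage lies mainly in accuracy rather than cost.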

Across these configurations, MAI-DxO traced a new Pareto frontier of diagnostic accuracy versus cost, outperforming both the physicians and the AI baselines evaluated (2506.22405).
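Pareto optimality here means that no other entry is both at least as accurate and at least as cheap while being strictly better on one of the two. The sketch below applies that definition to the five rows of the table only; the paper compares against more baselines, so this is a consistency check on the reported figures, not a reproduction of its analysis.

```python
# (name, accuracy %, average cost per case in USD), copied from the table above.
rows = [
    ("US/UK generalist physicians", 19.9, 2963),
    ("Off-the-shelf o3", 78.6, 7850),
    ("MAI-DxO (no budget, o3)", 81.9, 4735),
    ("MAI-DxO (budget, o3)", 79.9, 2396),
    ("MAI-DxO (ensemble, o3)", 85.5, 7184),
]


def dominates(a, b) -> bool:
    """True if a is at least as accurate and as cheap as b, and strictly better on one."""
    _, acc_a, cost_a = a
    _, acc_b, cost_b = b
    return acc_a >= acc_b and cost_a <= cost_b and (acc_a > acc_b or cost_a < cost_b)


def pareto_front(rows):
    """Entries not dominated by any other entry."""
    return [r for r in rows if not any(dominates(o, r) for o in rows if o is not r)]


for name, acc, cost in sorted(pareto_front(rows), key=lambda r: r[2]):
    print(f"{name}: {acc}% at ${cost:,}")
```

All three MAI-DxO configurations land on the frontier, while both baselines are dominated by the budgeted configuration, which is more accurate and cheaper than either.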

5. Contributions to Clinical Reasoning and AI Diagnostic Support

MAI-DxO demonstrates that explicitly structured, role-based orchestration atop LLMs yields performance gains that the same models do not achieve when used off the shelf. By simulating multidisciplinary clinical best practices, including bias detection, cost stewardship, adversarial testing, and probabilistic updating, MAI-DxO substantially improves both diagnostic accuracy and cost efficiency. The orchestrator's design requires each diagnostic action to be justified by its projected information value and cost, and its role structure is intended to guard against common reasoning pitfalls such as anchoring and premature closure.

Generalizability and model-agnosticism are core architectural features: the same orchestration logic can improve the performance of diverse underlying LLMs, potentially extending diagnostic capabilities to resource-constrained environments.

6. Practical Implications and Prospective Applications

MAI-DxO marks a shift in clinical AI from static, case-based evaluation toward emulating real-world, sequential medical problem-solving. Its orchestration framework, role differentiation, and value-driven test selection offer a template for integration into clinical decision-support systems, which could enable scalable, more equitable diagnostics with measurable gains in both care quality and resource use. Potential applications include settings that require high diagnostic throughput, cost-sensitive environments, and scenarios where iterative, transparent diagnostic reasoning is essential.

In summary, by explicitly orchestrating evidence-based, cost-aware, and bias-resistant reasoning strategies, MAI-DxO exceeds both the physicians and the off-the-shelf models evaluated on SDBench in accuracy and cost efficiency, setting a benchmark for next-generation diagnostic AI.


References

Sequential Diagnosis with Language Models (2025). arXiv:2506.22405.