MAI-DxO: Multi-Agent Diagnostic Orchestrator
- MAI-DxO is a model-agnostic, multi-agent diagnostic orchestration framework that simulates a panel of virtual specialists for evidence-based decision making.
- It employs iterative Bayesian updating and value-driven test selection to optimize diagnostic accuracy while controlling costs in complex clinical environments.
- Evaluations show that MAI-DxO outperforms generalist physicians and traditional AI models, achieving up to 85.5% accuracy with significant cost reductions.
The MAI Diagnostic Orchestrator (MAI-DxO) is a model-agnostic, multi-agent orchestration framework that emulates iterative, evidence-driven clinical reasoning to support diagnosis and cost-effective test selection in complex, real-world settings. It was introduced in work on sequential diagnosis with advanced LLMs. MAI-DxO simulates a panel of virtual specialists, each embodying a distinct medical reasoning strategy, that collaborate on forming the differential diagnosis, selecting tests, and making clinical decisions (2506.22405).
1. System Architecture and Virtual Physician Panel
MAI-DxO operates as a layered orchestration platform atop state-of-the-art LLMs, capable of guiding diagnosis through iterative, panel-based reasoning. The orchestration centers on a single LLM instructed to play five specialized "doctor" roles:
- Dr. Hypothesis: Maintains and continually updates a ranked differential diagnosis list using a Bayesian updating process as new information is acquired.
- Dr. Test-Chooser: Strategically selects diagnostic tests at each iteration to maximize information gain and discriminatory value relative to current diagnostic uncertainty.
- Dr. Challenger: Acts as a devil’s advocate, identifying anchoring bias and proposing tests to potentially falsify leading hypotheses, thus guarding against premature closure.
- Dr. Stewardship: Focuses on cost-effective care, suggesting economical yet informative alternatives and vetoing low-yield or expensive tests if unwarranted.
- Dr. Checklist: Silently ensures medical validity and logical consistency, acting as a quality-control mechanism over all actions taken by the panel.
At each diagnostic step, the virtual panel deliberates and decides among querying for additional history/physical exam data, ordering new diagnostic tests, or committing to a diagnosis if sufficient certainty is reached. An additional "budget tracker" module can be activated to manage test costs, prompting cancellation of tests if the cumulative expense surpasses predefined constraints.
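The three-way decision and the optional budget tracker can be illustrated with a minimal Python sketch. The names `Action`, `BudgetTracker`, and `choose_action`, as well as the 0.8 commit threshold, are hypothetical and not taken from the source. Treating questions as free is a simplification made here for brevity.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    ASK = "ask"            # request more history or exam findings
    TEST = "test"          # order a diagnostic test
    DIAGNOSE = "diagnose"  # commit to a final diagnosis


@dataclass
class BudgetTracker:
    """Hypothetical budget tracker: blocks tests that would exceed the limit."""
    limit: float
    spent: float = 0.0

    def can_afford(self, cost: float) -> bool:
        return self.spent + cost <= self.limit

    def charge(self, cost: float) -> None:
        self.spent += cost


def choose_action(
    top_probability: float,
    proposed_test_cost: Optional[float],
    tracker: BudgetTracker,
    commit_threshold: float = 0.8,  # assumed value, not from the source
) -> Action:
    """Pick one of the three panel actions for the current step."""
    # Commit once the leading hypothesis is considered certain enough.
    if top_probability >= commit_threshold:
        return Action.DIAGNOSE
    # Order the proposed test only if it fits within the remaining budget.
    if proposed_test_cost is not None and tracker.can_afford(proposed_test_cost):
        tracker.charge(proposed_test_cost)
        return Action.TEST
    # Otherwise fall back to asking questions (treated as free in this sketch).
    return Action.ASK
```

In this sketch a test that would push cumulative spending past the limit is never ordered, which loosely mirrors the cancellation behavior described for the budget tracker.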
2. Iterative Differential Diagnosis and Bayesian Updating
MAI-DxO employs an explicit, iterative process of diagnostic reasoning. Central to this is the maintenance and continuous update of disease probability estimates in the evolving differential diagnosis list. The Bayesian updating rule, utilized by Dr. Hypothesis at each iteration, is defined by:
$$P(D \mid E) = \frac{P(E \mid D)\,P(D)}{P(E)}$$
where:
- $P(D \mid E)$ is the posterior probability of disease $D$ given evidence $E$,
- $P(E \mid D)$ is the likelihood of observing evidence $E$ if $D$ is present,
- $P(D)$ is the prior probability of $D$,
- $P(E)$ is the marginal probability of $E$.
This formalism enables the orchestrator to dynamically re-rank hypotheses as new findings or test results are integrated, closely mirroring physician diagnostic behavior in clinical practice.
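As a worked illustration of this re-ranking, the following Python sketch applies Bayes' rule to a small differential. It assumes the candidate diseases are mutually exclusive and exhaustive, so the marginal probability of the evidence is the sum of prior times likelihood over the list. The diseases, priors, and likelihoods are made-up numbers chosen for illustration only, and the function names are hypothetical.

```python
def bayes_update(priors, likelihoods):
    """Return posterior probabilities for each disease after one finding.

    priors:      disease -> prior probability P(D)
    likelihoods: disease -> P(E | D) for the new evidence E
    """
    unnormalized = {d: priors[d] * likelihoods[d] for d in priors}
    evidence = sum(unnormalized.values())  # marginal P(E) over the differential
    if evidence == 0:
        raise ValueError("evidence has zero probability under every hypothesis")
    return {d: p / evidence for d, p in unnormalized.items()}


def ranked(posterior):
    """Diseases ordered from most to least probable."""
    return sorted(posterior, key=posterior.get, reverse=True)


# Illustrative numbers only (not from the source).
priors = {"pneumonia": 0.5, "pulmonary embolism": 0.3, "lung cancer": 0.2}
likelihoods = {"pneumonia": 0.1, "pulmonary embolism": 0.8, "lung cancer": 0.2}

posterior = bayes_update(priors, likelihoods)
ranking = ranked(posterior)
```

With these numbers the marginal probability of the evidence is 0.05 + 0.24 + 0.04 = 0.33, so pulmonary embolism rises from a prior of 0.30 to a posterior of about 0.73 and moves to the top of the list.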
3. High-Value, Cost-Efficient Test Selection
A key feature of MAI-DxO is its test selection policy, which is designed to optimize diagnostic certainty per unit cost. Dr. Test-Chooser evaluates possible tests based on their projected value-of-information, calculated as:
$$\text{Value}(t) = \frac{\Delta U(t)}{\text{Cost}(t)}$$
where $\Delta U(t)$ denotes the anticipated decrease in uncertainty or ambiguity within the current differential diagnosis if test $t$ is performed, and $\text{Cost}(t)$ is the cost of that test. Dr. Stewardship subsequently reviews suggested tests, vetoing those that are high-cost or duplicative unless they offer uniquely high discriminatory value.
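The value-per-cost ranking and the stewardship veto described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the uncertainty reduction for each test is supplied as an already-estimated number, the cost cap and high-yield threshold are hypothetical parameters, and the test names and prices are invented for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class CandidateTest:
    name: str
    uncertainty_reduction: float  # anticipated drop in uncertainty (assumed scale 0..1)
    cost: float                   # price in dollars, must be positive


def value_per_cost(test: CandidateTest) -> float:
    """Anticipated uncertainty reduction per dollar spent."""
    return test.uncertainty_reduction / test.cost


def stewardship_allows(
    test: CandidateTest,
    cost_cap: float = 1000.0,  # hypothetical threshold for "high-cost"
    high_yield: float = 0.8,   # hypothetical threshold for "uniquely informative"
) -> bool:
    """Veto expensive tests unless their expected yield is very high."""
    return test.cost <= cost_cap or test.uncertainty_reduction >= high_yield


def select_test(candidates: Iterable[CandidateTest]) -> Optional[CandidateTest]:
    """Best allowed test by value per cost, or None if everything is vetoed."""
    allowed = [t for t in candidates if stewardship_allows(t)]
    return max(allowed, key=value_per_cost, default=None)


# Illustrative candidates only (names and prices are not from the source).
candidates = [
    CandidateTest("complete blood count", 0.2, 50.0),
    CandidateTest("chest CT", 0.6, 1200.0),    # vetoed: costly, moderate yield
    CandidateTest("D-dimer", 0.5, 100.0),
    CandidateTest("lung biopsy", 0.9, 3000.0),  # allowed: costly but high yield
]
chosen = select_test(candidates)
```

Here the D-dimer is chosen because it has the highest ratio (0.005 per dollar) among the tests that pass the stewardship check. A real policy would also need a way to estimate the uncertainty reduction, which this sketch leaves out.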
The orchestrator supports multiple operational modes:
- Budgeted mode: Real-time cost tracking informs the cancellation of tests to adhere to predefined financial constraints.
- No-budget mode: Full diagnostic liberty is permitted for maximum accuracy.
- Question-only mode: Restricts actions to non-invasive data gathering.
- Ensemble mode: Multiple independent MAI-DxO panels run in parallel, with consensus strategies for diagnosis.
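The source does not specify which consensus strategy ensemble mode uses. As one possible illustration, the Python sketch below aggregates the final diagnoses of several independent panel runs by simple majority vote. The function name `consensus_diagnosis` and the string normalization are assumptions made for this example.

```python
from collections import Counter
from typing import Sequence, Tuple


def consensus_diagnosis(panel_outputs: Sequence[str]) -> Tuple[str, float]:
    """Majority vote over the final diagnoses of independent panel runs.

    Returns the winning diagnosis and the fraction of panels that agreed.
    Ties go to the diagnosis that appeared first in the input.
    """
    if not panel_outputs:
        raise ValueError("at least one panel output is required")
    # Normalize case and whitespace so trivially different strings match.
    normalized = [d.strip().lower() for d in panel_outputs]
    winner, votes = Counter(normalized).most_common(1)[0]
    return winner, votes / len(normalized)


# Illustrative outputs from three hypothetical panel runs.
diagnosis, agreement = consensus_diagnosis(
    ["Sarcoidosis", "sarcoidosis ", "Lymphoma"]
)
```

Exact string matching is a crude way to compare diagnoses. A practical system would likely need clinical synonym handling or a judge model, neither of which is shown here.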
4. Evaluation and Comparative Performance
MAI-DxO's performance was evaluated on the Sequential Diagnosis Benchmark (SDBench), comprising 304 NEJM Clinicopathological Conference cases restructured into stepwise encounters. Two metrics are reported: diagnostic accuracy, judged against a clinically validated rubric, and the average monetary cost of tests and visits per case.
When paired with the OpenAI o3 model, MAI-DxO achieves diagnostic accuracy of 81.9% in no-budget mode, 85.5% in ensemble mode, and 79.9% in budgeted mode. All three exceed the generalist physician benchmark (19.9%) by a wide margin and the off-the-shelf o3 model (78.6%) by 1.3 to 6.9 percentage points. In budgeted mode, the average cost per case is about 70% lower than that of off-the-shelf o3 and approximately 20% lower than that of physicians. Performance improvements generalized across model providers, including the Gemini, Claude, Grok, DeepSeek, and Llama families, and persisted in evaluations on NEJM cases published after model cutoff dates, which suggests that the gains are robust and not an artifact of memorized cases.
| Agent / Model | Accuracy (%) | Avg. cost per case (USD) |
|---|---|---|
| US/UK Generalist Physicians | 19.9 | 2,963 |
| Off-the-shelf o3 LM | 78.6 | 7,850 |
| MAI-DxO (no budget, o3) | 81.9 | 4,735 |
| MAI-DxO (budget, o3) | 79.9 | 2,396 |
| MAI-DxO (ensemble, o3) | 85.5 | 7,184 |
In all operational modes, MAI-DxO established new Pareto-optimal tradeoffs between diagnostic accuracy and cost, outperforming both human and machine competitors (2506.22405).
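The cost and dominance claims can be checked directly against the five rows of the table. The Python sketch below computes the relative cost reductions and identifies which rows are non-dominated, meaning no other row is at least as accurate and at least as cheap while being strictly better on one of the two. It covers only the rows reported here, and the helper names are hypothetical.

```python
# (accuracy in %, average cost per case in USD), copied from the table above.
results = {
    "US/UK Generalist Physicians": (19.9, 2963),
    "Off-the-shelf o3 LM": (78.6, 7850),
    "MAI-DxO (no budget, o3)": (81.9, 4735),
    "MAI-DxO (budget, o3)": (79.9, 2396),
    "MAI-DxO (ensemble, o3)": (85.5, 7184),
}


def dominates(a, b):
    """True if a is at least as accurate and as cheap as b, and strictly better once."""
    acc_a, cost_a = a
    acc_b, cost_b = b
    return acc_a >= acc_b and cost_a <= cost_b and (acc_a > acc_b or cost_a < cost_b)


def pareto_front(points):
    """Names of the entries that no other entry dominates."""
    return [
        name
        for name, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != name)
    ]


def percent_reduction(baseline, new):
    """Relative cost reduction of new versus baseline, in percent."""
    return 100.0 * (baseline - new) / baseline


front = pareto_front(results)
budget_cost = results["MAI-DxO (budget, o3)"][1]
vs_o3 = percent_reduction(results["Off-the-shelf o3 LM"][1], budget_cost)
vs_physicians = percent_reduction(results["US/UK Generalist Physicians"][1], budget_cost)
```

Among these five rows, only the three MAI-DxO configurations are non-dominated. The budgeted configuration costs roughly 69.5% less than off-the-shelf o3 and roughly 19.1% less than the physicians, consistent with the "up to 70%" and "approximately 20%" figures quoted above.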
5. Contributions to Clinical Reasoning and AI Diagnostic Support
MAI-DxO demonstrates that explicitly structured, role-based orchestration atop LLMs yields performance gains over the same model used without orchestration. By simulating multidisciplinary clinical best practices, including bias detection, cost stewardship, adversarial testing, and probabilistic updating, MAI-DxO achieves substantial improvements in both diagnostic accuracy and efficiency. The orchestrator is designed so that each diagnostic action is justified by its projected information value and cost, and so that diagnostic decisions are guarded against common reasoning pitfalls such as anchoring and premature closure.
Generalizability and model-agnosticity are core architectural features; the orchestration logic can scaffold performance of diverse underlying LLMs, broadening diagnostic capabilities even in resource-constrained environments.
6. Practical Implications and Prospective Applications
MAI-DxO represents a paradigm shift for clinical AI, moving from static, case-based evaluation to emulation of real-world, sequential medical problem-solving. Its orchestration framework, role differentiation, and value-driven test selection provide a template for integration into clinical decision-support systems, enabling scalable and equitable diagnostics with quantifiable improvements in both medical quality and resource allocation. Applications extend to settings requiring optimized diagnostic throughput, cost-sensitive environments, and scenarios where iterative, transparent diagnostic reasoning is essential.
In summary, MAI-DxO exceeds the generalist physician baseline on SDBench in both diagnostic accuracy and cost through the explicit orchestration of evidence-based, cost-aware, and bias-resistant reasoning strategies, setting a benchmark for next-generation diagnostic AI.