
Beyond the Individual: Virtualizing Multi-Disciplinary Reasoning for Clinical Intake via Collaborative Agents

Published 10 Apr 2026 in cs.MA | (2604.08927v1)

Abstract: The initial outpatient consultation is critical for clinical decision-making, yet it is often conducted by a single physician under time pressure, making it prone to cognitive biases and incomplete evidence capture. Although Multi-Disciplinary Teams (MDTs) reduce these risks, they are costly and difficult to scale to real-time intake. We propose Aegle, a synchronous virtual MDT framework that brings MDT-level reasoning to outpatient consultations via a graph-based multi-agent architecture. Aegle formalizes the consultation state using a structured SOAP representation, separating evidence collection from diagnostic reasoning to improve traceability and bias control. An orchestrator dynamically activates specialist agents, which perform decoupled parallel reasoning and are subsequently integrated by an aggregator into a coherent clinical note. Experiments on ClinicalBench and a real-world RAPID-IPN dataset across 24 departments and 53 metrics show that Aegle consistently outperforms state-of-the-art proprietary and open-source models in documentation quality and consultation capability, while also improving final diagnosis accuracy. Our code is available at https://github.com/HovChen/Aegle.

Summary

  • The paper introduces Aegle, a framework that virtualizes multi-disciplinary reasoning via a dynamic multi-agent system to reduce diagnostic biases.
  • The paper details a structured SOAP schema and dynamic specialist activation to decouple evidence collection from diagnostic synthesis.
  • The paper demonstrates significant improvements in IDEA and SOAP metrics with over 20% diagnostic accuracy lift relative to baseline models.

Virtualizing Multi-Disciplinary Medical Reasoning: Aegle’s Multi-Agent Framework for Clinical Intake

Introduction and Motivation

Aegle addresses structural limitations in clinical intake workflows, where single-physician documentation frequently suffers from cognitive bias, evidence omission, and shallow reasoning. While Multi-Disciplinary Team (MDT) decision-making is the standard of care for complex scenarios, relying on MDT processes in real time remains infeasible at scale. Aegle virtualizes MDT-level multi-perspective reasoning within outpatient consultations using a graph-based multi-agent system, bridging the gap between the robustness of MDT review and the resource constraints of real-world outpatient intake.

Figure 1: Single-agent LLMs are prone to anchoring bias and fragmented evidence capture, while virtual MDTs (as instantiated by Aegle) support parallel specialty reasoning and coherent integration, enhancing coverage and diagnostic traceability.

Framework Architecture

Aegle’s architecture establishes formal separation between evidence collection and diagnostic synthesis by grounding its consultation state in a structured SOAP schema, $\mathcal{S}_t = [\mathcal{F}_t, \mathcal{P}_t]$. This separation enforces traceability by constraining hypothesis generation to be downstream of evidentiary sufficiency, mitigating bias from early diagnostic fixation.
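To make the two-part state concrete, here is a minimal sketch of how such a structure might be represented. It is not taken from the Aegle codebase: the class names, the field names, and the mapping of the evidence part to the Subjective/Objective sections and the proposal part to the Assessment/Plan sections are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Evidence:
    """Evidence part of the state (assumed: Subjective and Objective sections)."""
    subjective: List[str] = field(default_factory=list)
    objective: List[str] = field(default_factory=list)


@dataclass
class Proposals:
    """Reasoning part of the state (assumed: Assessment and Plan sections)."""
    assessment: List[str] = field(default_factory=list)
    plan: List[str] = field(default_factory=list)


@dataclass
class ConsultationState:
    """Hypothetical container pairing evidence and proposals at one dialogue turn."""
    turn: int = 0
    evidence: Evidence = field(default_factory=Evidence)
    proposals: Proposals = field(default_factory=Proposals)

    def record_subjective(self, finding: str) -> None:
        # Evidence collection only touches the evidence part of the state.
        self.evidence.subjective.append(finding)


state = ConsultationState()
state.record_subjective("intermittent lower abdominal pain for three days")
```

Keeping the two parts in separate objects means a reviewer can tell which statements in the note are observations and which are inferences.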

Agent roles are orchestrated via a state-aware, dynamic topology:

  • Orchestrator: Implements a dynamic specialist activation policy, engaging agents based on case-specific uncertainty and the evolving evidence matrix.
  • Specialist Agents: Execute independent, domain-constrained reasoning in parallel. Isolation delays premature consensus and supports hypothesis diversity.
  • Aggregator: Integrates state proposals with explicit write-then-speak separation; internal state updates precede patient-facing utterances, which keeps the documentation consistent with the dialogue.

    Figure 2: Diagram of the Aegle consultation pipeline, demonstrating the transition from iterative history taking and on-demand specialist activation to stabilized evidence and diagnostic synthesis.

    Figure 3: History-taking phase with structured parallel inquiry and iterative note refinement, exemplified by a pediatric cardiology case.
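The three roles above can be sketched as a single consultation round. This is an illustrative toy, not the authors' implementation: the keyword table stands in for the orchestrator's activation policy, the specialist function is a stub where a model call would go, and every name here (`TRIGGERS`, `select_specialists`, `specialist_propose`, `run_round`) is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical keyword triggers standing in for a learned or prompted activation policy.
TRIGGERS = {
    "cardiology": ("chest pain", "palpitations"),
    "gastroenterology": ("abdominal pain", "nausea"),
    "urology": ("dysuria", "hematuria"),
}


def select_specialists(evidence):
    """Orchestrator: activate only specialists whose triggers appear in the evidence."""
    text = " ".join(evidence).lower()
    return sorted(name for name, kws in TRIGGERS.items() if any(k in text for k in kws))


def specialist_propose(name, snapshot):
    """Specialist stub: sees a read-only evidence snapshot, never a peer's output."""
    return {"specialist": name, "question": f"[{name}] follow-up on: {snapshot[-1]}"}


def run_round(evidence):
    active = select_specialists(evidence)
    snapshot = tuple(evidence)  # immutable copy shared by all specialists
    with ThreadPoolExecutor(max_workers=max(1, len(active))) as pool:
        proposals = list(pool.map(lambda n: specialist_propose(n, snapshot), active))
    # Aggregator, write step: merge proposals into the internal note first.
    note_update = [p["question"] for p in proposals]
    # Aggregator, speak step: the utterance is derived from what was just written.
    utterance = " ".join(note_update) if note_update else "Can you tell me more?"
    return active, note_update, utterance


active, note_update, utterance = run_round(["Patient reports chest pain and nausea"])
```

Because each specialist receives only the snapshot, no proposal can be influenced by another specialist's output before the aggregator merges them.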

Stagewise execution ensures that only stabilized evidence feeds diagnostic synthesis; $\mathcal{F}$ is frozen prior to $\mathcal{P}$ generation, enforcing unidirectional logical dependency.
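One way to picture the freeze step is a note object that rejects new evidence once synthesis begins and rejects hypotheses that cite anything outside the frozen evidence. The sketch below is an assumption about how such a guard could look; `StagedNote` and its methods are hypothetical names, not part of the paper.

```python
class StagedNote:
    """Toy two-stage note: collect evidence, freeze it, then add hypotheses."""

    def __init__(self):
        self._evidence = []
        self._frozen = None  # becomes an immutable tuple once history taking ends
        self.hypotheses = []

    def add_evidence(self, item):
        if self._frozen is not None:
            raise RuntimeError("evidence is frozen; no new findings accepted")
        self._evidence.append(item)

    def freeze(self):
        self._frozen = tuple(self._evidence)
        return self._frozen

    def add_hypothesis(self, label, supporting):
        if self._frozen is None:
            raise RuntimeError("freeze evidence before diagnostic synthesis")
        missing = [s for s in supporting if s not in self._frozen]
        if missing:
            raise ValueError(f"unsupported by frozen evidence: {missing}")
        self.hypotheses.append((label, tuple(supporting)))


note = StagedNote()
note.add_evidence("right lower quadrant tenderness")
note.add_evidence("fever 38.2 C")
note.freeze()
note.add_hypothesis("appendicitis", ["right lower quadrant tenderness", "fever 38.2 C"])
```

The dependency runs in one direction only: hypotheses read the evidence, and nothing written during synthesis can alter it.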

Experimental Framework

Evaluation spans 24 departments and 53 metrics on two datasets, ClinicalBench and RAPID-IPN, chosen to stress both the breadth and the depth of diagnostic challenges:

  • ClinicalBench: Synthetic cohort with strict data-leakage controls, emphasizing generalization and open-ended clinical generation.
  • RAPID-IPN: Real-world abdominal pain cohort, annotated by senior physicians, demanding precision in differential and longitudinal documentation.

Baselines comprise proprietary/open LLMs, chain-of-thought (CoT) and tree-of-thought (ToT) reasoning strategies, and agent-based frameworks (MDAgents, MedAgents), isolating the contribution of Aegle’s architecture.

Results

Documentation Quality

Aegle achieves consistent, statistically significant improvements in both structured reasoning (IDEA) and documentation standardization (SOAP) across datasets and departments. With the backbone fixed to DeepSeek-V3.2, Aegle’s architecture yields absolute gains of up to 8.6 in IDEA and 3.5 in SOAP over the strongest agentic baselines, with the largest improvements in complex specialties that demand cross-domain integration.

Figure 5: Department-wise IDEA and SOAP scores, displaying consistent superiority of Aegle across 24 clinical domains, with notable gaps in high-ambiguity specialties.

Textual metrics (READ, chrF++) are saturated across advanced LLMs, but Aegle’s architectural constraints manifest in increased evidential coherence and decreased omission, not merely stylistic fluency.

Diagnostic Accuracy

When subjected to end-to-end consultation-to-diagnosis evaluation (ClinicalBench), Aegle delivers absolute diagnostic accuracy of 46.9%—outperforming CoT, ToT, MDAgents, and MedAgents, and representing a >20% lift over matched single-model performance. This demonstrates that multi-perspective evidence gathering, when structurally coordinated, directly improves downstream clinical correctness, not just documentation fidelity.

Consultation Efficiency and Resource Utilization

Aegle’s dynamic specialist activation cuts agent invocations to less than half the per-round expert utilization of static multi-agent baselines. This improves deployment feasibility by minimizing redundant computation without degrading reasoning quality, a nontrivial advance for real-time clinical applications.
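As a rough illustration of how a per-round utilization figure could be computed, the sketch below assumes utilization means the average fraction of the specialist pool activated in each round. That definition, the function name, and the activation logs are all assumptions made for the example; the numbers are invented and are not results from the paper.

```python
def per_round_utilization(activations, pool_size):
    """Average fraction of the specialist pool that was activated per round."""
    if not activations or pool_size <= 0:
        raise ValueError("need at least one round and a positive pool size")
    used = sum(len(set(round_agents)) for round_agents in activations)
    return used / (len(activations) * pool_size)


POOL = ["gastroenterology", "surgery", "urology", "gynecology"]  # hypothetical pool

# Invented logs: a static baseline calls every specialist in every round,
# while a dynamic policy calls only those it judges relevant.
static_log = [POOL, POOL, POOL]
dynamic_log = [["gastroenterology"], ["gastroenterology", "surgery"], ["gastroenterology"]]

static_util = per_round_utilization(static_log, len(POOL))
dynamic_util = per_round_utilization(dynamic_log, len(POOL))
```

Under this definition a static topology always scores 1.0, so any selective policy is measured as a fraction of that ceiling.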

Component Contribution and Ablation

Systematic ablation identifies structured state grounding as the dominant performance driver. Excision of the explicit $\mathcal{F}$/$\mathcal{P}$ split leads to catastrophic degradation in both reasoning quality and evidence traceability. Removing generative inquiry inflates consultation length and induces evidence scattering, while omitting dynamic topology or decoupled reasoning uniformly depresses reasoning diversity and accuracy.

Figure 6: Ablation analysis of Aegle components, demonstrating critical dependence on explicit state representation and generative, dynamic inquiry mechanisms.

Validation of LLM-as-a-Judge

Scores from the LLM judge correlate with practicing-physician review (Pearson r > 0.66 across all metrics), which supports the use of LLM-based rubric scoring for scalable, multi-dimensional evaluation and lends credibility to the reported results.

Figure 7: Correlation matrix between LLM-judge and human ratings on IDEA, SOAP, and readability metrics, confirming rubric alignment.
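For readers who want to reproduce this kind of agreement check on their own ratings, the Pearson coefficient can be computed directly from paired scores. The function below is the standard formula; the two score lists are invented placeholders, not data from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / (spread_x * spread_y)


# Invented rubric scores for five notes, rated by an LLM judge and by a physician.
judge_scores = [3, 4, 2, 5, 4]
physician_scores = [3, 5, 2, 4, 4]
r = pearson_r(judge_scores, physician_scores)
```

A value near 1 means the judge ranks notes much as the physician does; it says nothing about whether the two agree on absolute score levels.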

Case Study Analysis

Aegle preserves clinically actionable low-salience features (e.g., IPSS score, granular pathology details) in high-complexity cases, resulting in more nuanced risk stratification and contextually appropriate diagnostic/management plans compared to CoT/ToT and prior multi-agent systems. The decoupled MDT emulation enables integration of weak signals that are frequently eclipsed by dominant clinical findings in single-agent models.

Figure 9: Example IPN progression generated by Aegle in a complex prostate cancer case, capturing granular evidence missed by baseline models.

Figure 11: Continuation of the case, demonstrating the plan’s coherence and evidentiary traceability up to specialist referral.

Theoretical Implications

Aegle’s framework formalizes essential best practices in collaborative medicine within a tractable, synchronous agentic system. By enforcing explicit evidence-diagnosis separation and dynamic role assignment, it bridges cognitive science insights with computational tractability. The architecture offers a reference standard for virtualized collaborative reasoning in high-variance, context-dependent AI decision tasks.

Practical and Future Directions

Practically, Aegle establishes a route toward scalable, bias-minimized AI-based clinical documentation. Its dynamic topology supports adaptive resource allocation, which is critical for real-world deployment under latency and cost constraints. The framework’s structure is directly extendable: targeted improvements in aggregation, redundancy penalization, and diversity-aware specialist prompting are immediate frontiers for optimizing inter-agent synthesis.

Future research should systematically balance hypothesis diversity (de-biasing) against aggregation redundancy and explore fine-grained, audit-friendly traceability—especially for regulatory and medico-legal downstream applications. Extending the architecture to support richer temporal reasoning and referral chain handoff would further align virtual AI systems with full-cycle MDT clinical workflows.

Conclusion

Aegle demonstrates that virtual MDT architectures, grounded in structured state evolution and controlled multi-agent specialization, significantly surpass both single-LLM and existing multi-agent models in clinical documentation and diagnostic performance. These findings have direct translational relevance for robust, scalable AI decision support and establish foundational principles for the next generation of agentic healthcare intelligence systems.


Reference: "Beyond the Individual: Virtualizing Multi-Disciplinary Reasoning for Clinical Intake via Collaborative Agents" (2604.08927)
