Mediator-Guided Multi-Agent Collaboration among Open-Source Models for Medical Decision-Making (2508.05996v1)

Published 8 Aug 2025 in cs.AI

Abstract: Complex medical decision-making involves cooperative workflows operated by different clinicians. Designing AI multi-agent systems can expedite and augment human-level clinical decision-making. Existing multi-agent research primarily focuses on language-only tasks, and extending it to multimodal scenarios remains challenging. A blind combination of diverse vision-language models (VLMs) can amplify an erroneous outcome interpretation. VLMs in general are less capable in instruction following and, importantly, self-reflection, compared to LLMs of comparable sizes. This disparity largely constrains VLMs' ability in cooperative workflows. In this study, we propose MedOrch, a mediator-guided multi-agent collaboration framework for medical multimodal decision-making. MedOrch employs an LLM-based mediator agent that enables multiple VLM-based expert agents to exchange and reflect on their outputs towards collaboration. We utilize multiple open-source general-purpose and domain-specific VLMs instead of costly GPT-series models, revealing the strength of heterogeneous models. We show that the collaboration within distinct VLM-based agents can surpass the capabilities of any individual agent. We validate our approach on five medical vision question answering benchmarks, demonstrating superior collaboration performance without model training. Our findings underscore the value of mediator-guided multi-agent collaboration in advancing medical multimodal intelligence. Our code will be made publicly available.

Summary

  • The paper introduces MedOrch, a mediator-guided multi-agent framework that synthesizes responses from open-source VLMs and LLMs to enhance medical decision accuracy.
  • It employs a mediator agent to drive Socratic dialogues among expert agents, resolving discrepancies and refining clinical Q&A outcomes.
  • Experimental results on five medical VQA benchmarks demonstrate that MedOrch outperforms single-agent and alternative multi-agent approaches in accuracy.

Mediator-Guided Multi-Agent Collaboration Framework for Medical Decision-Making

The paper "Mediator-Guided Multi-Agent Collaboration among Open-Source Models for Medical Decision-Making" (2508.05996) introduces MedOrch, a mediator-guided multi-agent collaboration framework for multimodal decision-making in the medical domain. The system combines vision-language models (VLMs), which act as expert agents, with an LLM-based mediator to improve clinical decision-making. This summary covers the methodological framework and the experimental results, focusing on how MedOrch handles complex medical question answering (Q&A) tasks.

Framework Overview

MedOrch comprises three key components: expert agents, a mediator agent, and a judge agent. The expert agents are open-source general-purpose and domain-specific VLMs that answer the medical question from the image, while the mediator agent is an LLM that synthesizes the experts' outputs and fosters dialogue among them through Socratic questioning. Finally, the judge agent assesses the full discussion and delivers a conclusive decision.

Figure 1: The mediator-guided multi-agent collaboration framework for medical VQA. In the initial stage, multiple VLM expert agents independently generate preliminary answers to a given question. To promote deeper interaction, the mediator agent synthesizes the information and formulates Socratic questions for expert agents. Subsequently, the relevant expert agents reflect on the question and generate refined responses accordingly. Finally, a judge agent analyzes the dialogues between the mediator and expert agents and produces a systematic final output.
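The four stages in Figure 1 can be sketched as a small orchestration loop. This is a minimal illustration under stated assumptions, not the authors' code: every agent is modeled as a plain function from prompt text to answer text (real expert agents are VLMs that also receive the image), and `run_pipeline`, `Transcript`, `toy_mediator`, and `toy_judge` are hypothetical names. The toy mediator and judge are rule-based stand-ins for what the paper implements with LLM-based agents.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: an agent maps a text prompt to a text answer.
# Real MedOrch experts are VLMs that also see the image; images are omitted here.
Agent = Callable[[str], str]


@dataclass
class Transcript:
    initial: Dict[str, str] = field(default_factory=dict)
    # Each follow-up is (expert name, mediator question, expert reply).
    followups: List[Tuple[str, str, str]] = field(default_factory=list)
    final: str = ""


def run_pipeline(question: str,
                 experts: Dict[str, Agent],
                 mediator: Callable[[str, Dict[str, str]], Dict[str, str]],
                 judge: Callable[[str, Transcript], str]) -> Transcript:
    t = Transcript()
    # Stage 1: every expert answers independently.
    for name, agent in experts.items():
        t.initial[name] = agent(question)
    # Stage 2: the mediator returns {expert name: Socratic question}.
    for name, prompt in mediator(question, t.initial).items():
        # Stage 3: only the addressed experts reflect and reply.
        t.followups.append((name, prompt, experts[name](prompt)))
    # Stage 4: the judge reads the whole transcript and decides.
    t.final = judge(question, t)
    return t


# --- Toy stand-ins, for illustration only (hypothetical) ---
def toy_mediator(question: str, answers: Dict[str, str]) -> Dict[str, str]:
    if len(set(answers.values())) <= 1:
        return {}  # consensus: nothing to ask
    options = sorted(set(answers.values()))
    return {name: f"Reconsider '{question}': experts said {options}."
            for name in answers}


def toy_judge(question: str, t: Transcript) -> str:
    latest = dict(t.initial)
    for name, _, reply in t.followups:
        latest[name] = reply  # a refined reply overrides the initial answer
    return Counter(latest.values()).most_common(1)[0][0]


experts = {
    "general_vlm": lambda p: "yes",
    "medical_vlm": lambda p: "yes" if p.startswith("Reconsider") else "no",
}
demo = run_pipeline("Is there a pleural effusion?", experts, toy_mediator, toy_judge)
print(demo.initial, demo.final)
```

Keeping the mediator and judge as injected callables means the orchestration logic stays the same whether they are toy rules, as here, or LLM calls.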

Mediator-Guided Collaboration Strategy

The core of MedOrch is managing information exchange between heterogeneous agents. Upon receiving a medical question, multiple expert agents offer initial opinions. The mediator agent identifies points of consensus and disagreement among these responses and, when discrepancies arise, poses follow-up questions to specific agents. These exchanges take the form of Socratic dialogue: the mediator asks questions that prompt each expert to clarify its reasoning and resolve misunderstandings.
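The consensus-and-disagreement step can be illustrated with a rule-based stand-in. In MedOrch the mediator is an LLM, so the sketch below is only an assumption about the kind of logic involved: it groups normalized answers, returns nothing on consensus, and otherwise writes a Socratic follow-up for each dissenting expert (or for every expert when the top answers are tied). The function names, the normalization, and the question template are all hypothetical.

```python
from typing import Dict, List


def normalize(answer: str) -> str:
    # Treat "Yes", "yes." and " yes " as the same answer.
    return answer.strip().lower().rstrip(".")


def group_answers(answers: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each distinct (normalized) answer to the experts who gave it."""
    groups: Dict[str, List[str]] = {}
    for expert, answer in answers.items():
        groups.setdefault(normalize(answer), []).append(expert)
    return groups


def socratic_followups(question: str, answers: Dict[str, str]) -> Dict[str, str]:
    """Return {expert: follow-up question}; empty when the experts agree."""
    groups = group_answers(answers)
    if len(groups) <= 1:
        return {}  # consensus, so no follow-up is needed
    sizes = sorted((len(m) for m in groups.values()), reverse=True)
    tied = sizes[0] == sizes[1]
    majority = max(groups, key=lambda a: len(groups[a]))
    prompts: Dict[str, str] = {}
    for answer, members in groups.items():
        if answer == majority and not tied:
            continue  # assumption: question only dissenters unless there is a tie
        others = [a for a in groups if a != answer]
        for expert in members:
            prompts[expert] = (
                f"Question: {question}\n"
                f"You answered '{answer}', but other experts answered {others}. "
                "Which findings in the image support your answer, and which "
                "findings would support the alternatives?"
            )
    return prompts


followups = socratic_followups("Is the lesion malignant?",
                               {"a": "Yes", "b": "yes.", "c": "No"})
print(sorted(followups))  # only the dissenting expert is questioned
```

Asking open "which findings support..." questions, rather than telling a dissenter it is wrong, is what keeps this in the Socratic style the paper describes; whether the real mediator also questions the majority is not stated here.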

Judgment Procedure

In the final judgment phase, a judge agent reviews the experts' initial answers together with the mediator-expert dialogues and infers the final medical decision. Because it reads the full discussion, the judge can weigh the reasoning behind diverse expert opinions rather than relying on the experts' answers alone.
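Since the judge works from the dialogue, one practical step is flattening the transcript into a single prompt. The sketch below assumes a text-only judge prompt; `build_judge_prompt` and its wording are hypothetical and do not reproduce the paper's actual prompt.

```python
from typing import Dict, List, Tuple


def build_judge_prompt(question: str,
                       initial: Dict[str, str],
                       dialogue: List[Tuple[str, str, str]]) -> str:
    """Flatten initial answers and mediator-expert exchanges into one judge prompt.

    `dialogue` holds (expert, mediator question, expert reply) triples.
    """
    lines = [f"Question: {question}", "", "Initial expert answers:"]
    for expert, answer in initial.items():
        lines.append(f"- {expert}: {answer}")
    lines += ["", "Mediator-expert dialogue:"]
    if not dialogue:
        lines.append("- (none: the experts agreed)")
    for expert, mediator_q, reply in dialogue:
        lines.append(f"- Mediator -> {expert}: {mediator_q}")
        lines.append(f"  {expert}: {reply}")
    lines += ["", "Weigh the reasoning above, not just the vote count, "
                  "and reply with a single final answer."]
    return "\n".join(lines)


prompt = build_judge_prompt(
    "Is there a pleural effusion?",
    {"general_vlm": "yes", "medical_vlm": "no"},
    [("medical_vlm", "What supports 'no'?", "On review, blunting suggests yes.")],
)
print(prompt)
```

Including the initial answers as well as the follow-ups lets the judge see which experts changed their minds and why.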

Experimental Evaluation

MedOrch was evaluated on five medical VQA benchmarks: VQA-RAD, SLAKE, PathVQA, PMC-VQA, and OmniMedVQA. On each dataset, its performance was compared against both single-agent models and alternative multi-agent frameworks.

Figure 2: Comparison with GPT-4V and other multi-agent methods based on GPT-4V on PathVQA dataset.
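The comparisons above are reported in terms of accuracy. The exact scoring protocol is not given in this summary, so the sketch below assumes exact-match accuracy after light normalization, which is a common choice for closed-ended and multiple-choice VQA, plus a helper that checks whether a collaborative system beats its best individual agent. Both function names are hypothetical, and any accuracy values passed in are placeholders, not results from the paper.

```python
from typing import Dict, Sequence


def normalize(answer: str) -> str:
    return answer.strip().lower().rstrip(".")


def accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Exact-match accuracy after light normalization (an assumed protocol)."""
    if len(predictions) != len(references) or not references:
        raise ValueError("need equally sized, non-empty prediction and reference lists")
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)


def gain_over_best_single(system_acc: float, single_accs: Dict[str, float]) -> float:
    """Positive when the collaborative system beats every individual agent."""
    return system_acc - max(single_accs.values())


refs = ["yes", "no", "B", "yes"]
print(accuracy(["Yes", "no.", "C", "yes"], refs))  # 0.75
```

Comparing against the best single agent, rather than the average, matches the paper's claim that collaboration can surpass any individual agent.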

Results and Performance

The empirical results indicate that MedOrch outperforms single-agent models and existing multi-agent approaches in accuracy across the tested benchmarks. Notably, it achieves this by combining the complementary abilities of diverse open-source VLMs, without any model training and without the costs associated with proprietary GPT-series models.

Discussion and Implications

MedOrch shows how a carefully designed multi-agent framework can improve medical decision-making by drawing on the collective strengths of multiple agents. Combining diverse open-source models with a mediator-led discussion protocol addresses a weakness the authors highlight: VLMs are weaker than comparably sized LLMs at instruction following and self-reflection, and naively combining them can amplify errors.

However, some limitations remain. The main one is the latency introduced by multi-step agent interactions, which could be reduced by optimizing how agents interact and how compute resources are allocated. In addition, integrating real-world clinical data and clinical knowledge bases could further improve accuracy and diagnostic specificity.
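To make the latency cost concrete, here is a back-of-the-envelope count of model calls per question, assuming the call structure in Figure 1: one call per expert for the initial answers, then per round one mediator call plus one call per follow-up, then one judge call. The number of experts, follow-ups, and rounds are hypothetical parameters; this summary does not state them.

```python
def model_calls(n_experts: int, followups_per_round: int, rounds: int = 1) -> int:
    """Count model invocations for one question (assumed call structure)."""
    # initial expert answers + per round (1 mediator call + follow-up replies) + 1 judge call
    return n_experts + rounds * (1 + followups_per_round) + 1


def sequential_steps(rounds: int = 1) -> int:
    """Critical-path length if expert calls within a stage run in parallel."""
    # initial stage + per round (mediator, then expert replies) + judge
    return 1 + 2 * rounds + 1


print(model_calls(3, 2), sequential_steps())  # 3 experts, 2 follow-ups, 1 round
```

Under these assumptions, three experts with two follow-ups need 7 model calls and at least 4 sequential steps, versus 1 call and 1 step for a single VLM, which is why parallelizing expert calls and limiting follow-ups are natural optimization targets.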

Conclusion

MedOrch illustrates the benefits of multi-agent collaboration for AI-driven medical decision-making. Its accuracy gains across multiple medical Q&A benchmarks support the potential of orchestrating heterogeneous AI agents within clinical workflows, particularly where decisions depend on combining multiple sources of information, such as images and text. Future research could explore how well such systems scale and adapt across other medical domains and real-world applications.
