Mixture-of-Agents: Coordinating Diverse AI Agents
- Mixture-of-agents (MoA) systems are multi-agent architectures that pool diverse, specialized agents through dynamic aggregation to collaboratively optimize solutions to complex tasks.
- They implement adaptive routing, sparse activation, and residual compensation to balance diversity with individual agent quality.
- Applications span from language model ensembling to autonomous navigation, improving performance and interpretability across various domains.
A mixture-of-agents (MoA) system is any multi-agent architecture in which individually autonomous or specialized agents contribute to, and often collaboratively optimize, solutions to complex tasks. In the context of contemporary machine learning, control, and distributed optimization, MoA frameworks deliberately leverage agent diversity—whether in policy, model, modality, or domain expertise—to enhance robustness, scalability, interpretability, or performance in environments where single-agent or monolithic models are inherently limited. MoA research encompasses both algorithmic mechanisms for coordinating heterogeneous agents (including agent routing, consensus, or adaptive replacement) and theoretical analyses of the collective properties that emerge from agent mixtures.
1. Foundational Principles and Formalizations
Mixture-of-agents systems formalize the combination of agent policies, outputs, or features, often through explicit aggregation or dynamic voting mechanisms. In RL and decision-making, MoA generalizes the “mixture-of-experts” paradigm by replacing fixed model ensembles with autonomous agents (potentially with their own objectives or information sets). For example, universal agent mixture theory (Alexander et al., 2023) defines a weighted linear mixture of agents $\pi_1, \dots, \pi_n$ with weight vector $w = (w_1, \dots, w_n)$, $\sum_i w_i = 1$, such that the mixture policy is:

$$\pi_w(a \mid h) = \frac{\sum_{i=1}^{n} w_i\, \pi_i(h)\, \pi_i(a \mid h)}{\sum_{j=1}^{n} w_j\, \pi_j(h)},$$

where $\pi_i(h)$ are the probabilities each agent assigns to history $h$. The expected total reward in any environment $\mu$ is the corresponding weighted average:

$$V_\mu^{\pi_w} = \sum_{i=1}^{n} w_i\, V_\mu^{\pi_i}.$$
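The weighted-average property can be checked numerically. The sketch below uses a hypothetical two-agent, single-step setting (each agent's history likelihood is 1, so the mixture reduces to a plain convex combination of action distributions); policies, weights, and rewards are made up for illustration:

```python
import numpy as np

# Two agents' policies over three actions (rows sum to 1).
pi = np.array([
    [0.7, 0.2, 0.1],   # agent 1
    [0.1, 0.3, 0.6],   # agent 2
])
w = np.array([0.4, 0.6])          # mixture weights, sum to 1
r = np.array([1.0, 0.0, 2.0])     # per-action reward in a toy environment

# Mixture policy: weighted combination of the agents' action distributions.
pi_mix = w @ pi

# Expected reward of the mixture equals the weighted average of the
# agents' individual expected rewards.
v_agents = pi @ r                 # each agent's expected reward
v_mix = pi_mix @ r
assert np.isclose(v_mix, w @ v_agents)
print(pi_mix, v_mix)              # -> [0.34 0.26 0.4] 1.14
```

In the general history-dependent case the same identity holds once the weights are reweighted by each agent's history likelihood, which is what the posterior form of the mixture policy accomplishes.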
Game-theoretic studies have extended this analysis to multi-action, multi-type settings—distinguishing, for instance, between coordinating and anti-coordinating agents (Vanelli et al., 2019). In such settings, Nash equilibria may or may not exist and can be algorithmically characterized only by analyzing the interplay of threshold distributions and best-response correspondences.
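The interplay of thresholds and best responses can be illustrated with a toy version of such a mixed population (the threshold rule and parameters below are illustrative, not the exact model of the cited work): each coordinating agent plays action 1 once the fraction of others playing 1 reaches its threshold, while an anti-coordinating agent plays 1 only while that fraction stays below its threshold. Repeated best-response sweeps either reach a fixed point (a pure Nash equilibrium) or cycle:

```python
def best_response(kind, threshold, frac_ones):
    # Coordinators adopt action 1 once enough others play it;
    # anti-coordinators play 1 only while few others do.
    if kind == "coord":
        return 1 if frac_ones >= threshold else 0
    return 1 if frac_ones < threshold else 0

def sweep(kinds, thresholds, state):
    # One asynchronous pass: each agent best-responds in turn.
    state = list(state)
    n = len(state)
    for i in range(n):
        frac = sum(state[j] for j in range(n) if j != i) / (n - 1)
        state[i] = best_response(kinds[i], thresholds[i], frac)
    return state

kinds = ["coord", "coord", "coord", "anti"]
thresholds = [0.3, 0.5, 0.6, 0.5]
state = [0, 0, 0, 0]
for _ in range(10):               # repeated best-response sweeps
    new = sweep(kinds, thresholds, state)
    if new == state:              # fixed point => pure Nash equilibrium
        break
    state = new
print(state)                      # -> [1, 1, 1, 0]
```

Here the three coordinators settle on action 1 and the anti-coordinator on action 0; with other threshold profiles the dynamics can fail to settle, matching the observation that equilibria may or may not exist.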
2. Mixture-of-Agents in Model Ensembling and Collaboration
Recent research has applied the mixture-of-agents methodology to ensemble systems of LLMs and other generative models, moving beyond traditional soft voting or bagging. As exemplified in (Wang et al., 7 Jun 2024), MoA systems for LLMs are often designed as layered architectures:
- Each layer contains multiple agent models, each of which receives the ensemble output of the previous layer as part of its input.
- An aggregation (“aggregate-and-synthesize”) mechanism fuses agent outputs before forwarding them as auxiliary knowledge to agents in the next layer.
- This enables the system to progressively refine responses, leveraging complementary perspectives.
Formally, with $A_{i,j}$ denoting the $j$-th agent in layer $i$, $x_i$ the input to layer $i$, and $\oplus$ the aggregate-and-synthesize operator, the process in each layer may be represented as:

$$x_{i+1} = \oplus\bigl(A_{i,1}(x_i),\, \ldots,\, A_{i,n}(x_i)\bigr).$$
Key properties of such architectures include improved correctness, factuality, and robustness on benchmarks like AlpacaEval 2.0, often outperforming even state-of-the-art proprietary LLMs. This approach generalizes to domain-specific applications (e.g., healthcare summarization (Jang et al., 4 Apr 2025), industrial code optimization (Ashiga et al., 5 Aug 2025), and retrieval-augmented generation in financial research (Chen et al., 4 Sep 2024)) by orchestrating small, specialized LLMs and aggregators, sometimes planning and routing subtasks with a high-level planner agent.
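A minimal sketch of this layered pipeline is below; the "agents" and the aggregator are placeholder callables standing in for LLM calls (in a real system each would query a different model):

```python
# Layered mixture-of-agents skeleton with stub agents.

def make_agent(name):
    def agent(prompt, references):
        # An agent sees the user prompt plus the previous layer's fused output.
        context = " | ".join(references) if references else "none"
        return f"{name}(prompt={prompt!r}, refs={context})"
    return agent

def aggregate_and_synthesize(prompt, responses):
    # Stand-in for the aggregator model that fuses agent outputs.
    return f"synthesis of {len(responses)} responses to {prompt!r}"

def moa(prompt, layers):
    references = []
    for layer in layers:
        responses = [agent(prompt, references) for agent in layer]
        # The fused output becomes auxiliary knowledge for the next layer.
        references = [aggregate_and_synthesize(prompt, responses)]
    return references[0]

layers = [
    [make_agent("A1"), make_agent("A2"), make_agent("A3")],  # layer 1
    [make_agent("B1"), make_agent("B2")],                    # layer 2
]
answer = moa("summarize MoA", layers)
print(answer)  # -> synthesis of 2 responses to 'summarize MoA'
```

The essential design choice is that each layer's agents receive the previous layer's synthesis as auxiliary context rather than raw peer outputs, which keeps the prompt size bounded as depth grows.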
A critical innovation has been the introduction of role diversification and dynamic response filtering (see SMoA, (Li et al., 5 Nov 2024)) utilizing mechanisms such as:
- Top-$k$ response selection by a “Judge” agent, which scores candidate responses and forwards only the $k$ best to the next layer.
- Early stopping by a moderator agent, which allows variable-depth inference conditioned on consensus.
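These two filtering mechanisms can be sketched together; the scores and the consensus test below are toy stand-ins for the judge and moderator models:

```python
# Sparse response filtering: a "judge" keeps only the top-k responses
# per layer, and a "moderator" stops early once the survivors agree.

def judge_top_k(responses, scores, k):
    ranked = sorted(zip(scores, responses), reverse=True)
    return [resp for _, resp in ranked[:k]]

def moderator_consensus(responses):
    return len(set(responses)) == 1   # all surviving responses identical

def smoa(layer_outputs, layer_scores, k=2):
    kept = []
    for responses, scores in zip(layer_outputs, layer_scores):
        kept = judge_top_k(responses, scores, k)
        if moderator_consensus(kept):  # early stop: skip deeper layers
            break
    return kept

outputs = [
    (["ans-A", "ans-B", "ans-C"], [0.9, 0.4, 0.7]),
    (["ans-A", "ans-A", "ans-D"], [0.8, 0.7, 0.2]),
    (["ans-E", "ans-F"], [0.5, 0.6]),   # never reached: consensus at layer 2
]
final = smoa([o for o, _ in outputs], [s for _, s in outputs])
print(final)  # -> ['ans-A', 'ans-A']
```

Inference depth thus varies per query: easy queries terminate at shallow layers, while contested ones continue to consume the full budget.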
3. Sparse, Residual, Dynamic, and Distributed Extensions
To address the high inference cost and token explosion inherent to fully connected MoA systems, several sparsification and efficiency strategies have been developed:
- Sparse Mixture-of-Agents (SMoA) (Li et al., 5 Nov 2024) activates only a subset of agents or responses per layer, guided by “Expert Diversity” role assignment and dynamic pruning.
- Residual MoA (RMoA) (Xie et al., 30 May 2025) introduces residual connections and embedding-based diversity selection. The residual extraction agent computes differential information across layers, e.g. $r^{(l)} = \operatorname{Extract}\bigl(y^{(l)}, y^{(l-1)}\bigr)$, with an adaptive termination criterion based on residual convergence.
- Dynamic Mixture-of-Experts (DMoE) (Kong et al., 21 Sep 2025) generates dynamic convolutional kernels per agent in collaborative perception systems, combining local and fused features, with a gating mechanism and a diversity-enforcing triplet loss.
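An adaptive-termination rule in the spirit of residual compensation can be sketched as follows; the embeddings are toy vectors standing in for embedded layer outputs, and the threshold is illustrative:

```python
import numpy as np

# Stop iterating layers once the embedding-space change ("residual")
# between successive aggregated outputs falls below a tolerance.

def residual_norm(prev_emb, curr_emb):
    return float(np.linalg.norm(curr_emb - prev_emb))

def run_until_converged(layer_embeddings, tol=0.05):
    prev = layer_embeddings[0]
    for depth, curr in enumerate(layer_embeddings[1:], start=2):
        if residual_norm(prev, curr) < tol:   # residual has converged
            return prev, depth - 1            # keep the stable output
        prev = curr
    return prev, len(layer_embeddings)

# Aggregated-output embeddings per layer, drifting toward a fixed point.
embs = [np.array([0.0, 1.0]),
        np.array([0.5, 0.8]),
        np.array([0.52, 0.79]),
        np.array([0.53, 0.80])]
final_emb, depth = run_until_converged(embs)
print(depth)  # -> 2 (layers actually used)
```

The same residual signal can also drive what gets forwarded: only the differential information between layers, rather than full responses, needs to be passed on.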
Distributed settings, such as edge-based collaborative inference (Mitra et al., 30 Dec 2024), rely on decentralized gossip protocols where agents on individual devices share and aggregate prompt responses, subject to queuing stability constraints of the form:

$$\frac{\lambda\, L\, \tau}{m} < 1,$$

where $\lambda$ is the prompt arrival rate, $L$ the number of layers, $m$ the number of parallel proposers, and $\tau$ the average inference time.
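A back-of-the-envelope check of such a constraint, assuming the stability condition takes the standard queuing form (arrival rate times per-prompt service demand, divided by parallel capacity, below 1 -- the exact constraint in the cited work may differ):

```python
# Queuing stability check for distributed MoA inference: prompts arrive
# at rate lam, each needs num_layers inference passes of average
# duration avg_inference_time, served by num_proposers in parallel.

def is_stable(lam, num_layers, num_proposers, avg_inference_time):
    utilization = lam * num_layers * avg_inference_time / num_proposers
    return utilization < 1.0

def max_arrival_rate(num_layers, num_proposers, avg_inference_time):
    # Largest sustainable prompt arrival rate under the same condition.
    return num_proposers / (num_layers * avg_inference_time)

stable = is_stable(lam=0.5, num_layers=3, num_proposers=4,
                   avg_inference_time=2.0)
print(stable)                      # utilization 0.75 < 1 -> True
print(max_arrival_rate(3, 4, 2.0)) # -> ~0.667 prompts per unit time
```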
4. Optimization, Adaptation, and Game-Theoretic Coordination
Optimization in MoA settings may target:
- Minimizing system-wide objective functions. In traffic networks, for instance, the minimal fraction of “compliant” (centrally routed) agents needed for system-optimal routing is determinable by a linear program encoding demand, flow conservation, link capacity, and reduced-cost path constraints (Sharon et al., 2017).
- Reward-based agent pruning and replacement: The RLFA algorithm (Liu, 29 Jan 2025) detects underperforming agents (e.g., in fraud detection) using episodic accuracy and reward tracking, evicts persistently suboptimal agents, and provisions new candidates through a probationary “shadow” phase.
- Federated learning among black-box agents. In (Yang et al., 30 Apr 2025), agents are treated as game-theoretic competitors where only their output predictions (not internal models) are observable. Nash equilibria for the linear combination of agent readouts are solved via recursions involving dynamic Riccati-like equations and server-level mixture-weight optimization, with agent parameter updates given in feedback form.
- Token-level controlled decoding and alignment: Collab (Chakraborty et al., 27 Mar 2025) switches agent policies dynamically during sequence generation, selecting at each token the agent that maximizes a KL-regularized Q-function.
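The per-token switching idea can be illustrated with a toy selector; the Q-values, policies, and agent names below are hand-made stand-ins, and the score (expected Q minus a KL penalty to a reference policy) is a simplified version of the KL-regularized objective:

```python
import math

# At each decoding step, pick the agent whose Q-value estimate,
# regularized by KL divergence from a reference policy, is largest.
# Tables are over a toy 3-token vocabulary.

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def select_agent(agents, reference, beta=0.1):
    # score = E[Q] under the agent's policy - beta * KL(agent || reference)
    best_name, best_score = None, float("-inf")
    for name, (policy, q_values) in agents.items():
        exp_q = sum(p * q for p, q in zip(policy, q_values))
        score = exp_q - beta * kl(policy, reference)
        if score > best_score:
            best_name, best_score = name, score
    return best_name

reference = [1 / 3, 1 / 3, 1 / 3]
agents = {
    "safety_agent": ([0.8, 0.1, 0.1], [1.0, 0.2, 0.1]),
    "task_agent":   ([0.2, 0.7, 0.1], [0.3, 1.5, 0.2]),
}
print(select_agent(agents, reference))  # -> task_agent
```

In a full decoder this selection would run once per token, with Q-values refreshed from each agent's value estimate at the current prefix, so control can pass between agents mid-sequence.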
5. Evaluation, Empirical Findings, and Quality–Diversity Trade-offs
Empirical benchmarks demonstrate that MoA systems can substantially improve solution quality, coverage, and robustness:
- LLMs: Multi-layer MoA systems achieve higher scores than single LLMs on AlpacaEval 2.0, MT-Bench, and FLASK, with improvements not only in correctness and insightfulness but also in specialized tasks such as mathematical reasoning (Wang et al., 7 Jun 2024), healthcare summarization (Jang et al., 4 Apr 2025), and industrial code optimization (Ashiga et al., 5 Aug 2025).
- Resource-constrained deployments: MoA can be tuned to trade off cost and accuracy (Chen et al., 4 Sep 2024), with parallelization mitigating latency overhead.
- Quality vs. diversity: In LLM ensembling, evidence suggests that aggregating outputs from a single high-quality model (Self-MoA) outperforms mixing multiple, heterogeneous models unless individual model specialties are well aligned with task heterogeneity (Li et al., 2 Feb 2025). The trade-off can be expressed as:

$$P = \alpha\, Q + \beta\, D,$$

where $P$ is the final performance, $Q$ is quality, $D$ is diversity (e.g., Vendi Score), and $\alpha \gg \beta$ in practical settings.
- Safety, fairness, and alignment: MoA frameworks enable effective filtering, role assignment, or reflective agreement protocols for safe dialogue, robust event extraction, and consensus in ambiguous or sensitive domains (Li et al., 5 Nov 2024, Haji et al., 26 Aug 2025, Chakraborty et al., 27 Mar 2025).
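The diversity term in the quality-diversity trade-off can be computed concretely. The Vendi Score of a set of response embeddings is the exponential of the Shannon entropy of the eigenvalues of $K/n$, where $K$ is a similarity matrix with unit diagonal; identical responses score near 1, and $n$ fully distinct responses score near $n$. A sketch with toy embeddings:

```python
import numpy as np

def vendi_score(embeddings):
    # Cosine-similarity kernel over unit-normalized embeddings.
    X = np.asarray(embeddings, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    K = X @ X.T
    n = len(X)
    # Effective number of distinct items: exp(entropy of eigvals of K/n).
    eigvals = np.linalg.eigvalsh(K / n)
    eigvals = eigvals[eigvals > 1e-12]
    return float(np.exp(-np.sum(eigvals * np.log(eigvals))))

identical = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
distinct = [[1.0, 0.0], [0.0, 1.0]]
print(round(vendi_score(identical), 3))  # -> 1.0
print(round(vendi_score(distinct), 3))   # -> 2.0
```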
6. Applications Beyond Language and Reasoning
Mixture-of-agents frameworks extend to diverse domains:
- Collaborative perception: Heterogeneity-aware DMoE modules enable simultaneous exploitation of shared and agent-specific cues in multi-agent perception, improving segmentation and detection performance in autonomous driving settings (Kong et al., 21 Sep 2025).
- Multimodal clinical prediction: Mixture-of-multimodal-agents (MoMA) architectures sequentially process EHR modalities, with specialist text and image agents contributing structured summaries before aggregation and prediction. This design both improves accuracy and supports modular extension for new modalities (Gao et al., 7 Aug 2025).
- Skill-based navigation: Decomposition into atomic skills (SkillNav) facilitates modular navigation reasoning, with a VLM-based router invoking the optimal skill agent per timestep to generalize to novel environments (Ma et al., 11 Aug 2025).
- Video restoration: MoA-VR composes vision-language degradation identification, LLM-based restoration planning, and VLM-based video quality assessment agents in a closed-loop, modular reasoning framework, outscoring traditional restoration models, especially for compound degradations (Liu et al., 9 Oct 2025).
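The skill-routing pattern shared by several of these systems can be sketched minimally; the keyword rules below are a hypothetical stand-in for the VLM-based router, and the skill functions are placeholders:

```python
# Skill-based routing: a router inspects the current instruction and
# dispatches to an atomic skill agent, one decision per timestep.

def move_forward(obs):  return f"forward from {obs}"
def turn(obs):          return f"turn at {obs}"
def stop(obs):          return f"stop at {obs}"

SKILLS = {
    "forward": move_forward,
    "turn": turn,
    "stop": stop,
}

def route(instruction):
    # Toy routing rule: match the first skill keyword in the instruction.
    for keyword, skill in SKILLS.items():
        if keyword in instruction:
            return skill
    return stop  # default skill

trajectory = []
for step_instr, obs in [("go forward to the hall", "hallway"),
                        ("turn left at the door", "door"),
                        ("stop near the table", "table")]:
    trajectory.append(route(step_instr)(obs))
print(trajectory)
```

The modularity argument is visible even at this scale: adding a new skill means registering one new agent, with no retraining of the others.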
7. Limitations, Open Problems, and Future Directions
Current MoA research highlights computational cost, information loss through aggregation, and the complexity of robustly coordinating heterogeneous agents as significant challenges. Solutions such as sparse activation (Li et al., 5 Nov 2024), residual compensation (Xie et al., 30 May 2025), and adaptive or learning-based agent selection (Liu, 29 Jan 2025, Chakraborty et al., 27 Mar 2025) are active areas of development.
Open questions persist regarding optimal trade-off strategies between diversity and individual agent quality, especially in settings of high agent specialization or task heterogeneity (Li et al., 2 Feb 2025). Theoretical analyses, especially for convergence and generalization properties in distributed, black-box, or non-stationary environments, remain an open field. The modular and plug-and-play design patterns adopted in frameworks such as MoMA (Gao et al., 7 Aug 2025) and SkillNav (Ma et al., 11 Aug 2025) also suggest an emerging paradigm for building interpretable, scalable, and robust multi-agent systems for environments ranging from healthcare to embodied AI.
The mixture-of-agents paradigm provides a unifying principle for orchestrating diverse expertise in complex systems and is becoming central to advancing the state-of-the-art in collaborative, modular, and scalable artificial intelligence.