- The paper introduces Anemoi, a semi-centralized multi-agent system built on an Agent-to-Agent (A2A) communication MCP server, which lets agents refine plans collaboratively in real time.
- On the GAIA benchmark it improves accuracy by 9.09 percentage points over the OWL baseline while reducing context redundancy.
- The system minimizes dependency on a powerful planner by enabling heterogeneous agents to communicate directly and efficiently.
Anemoi: A Semi-Centralized Multi-Agent System with Direct Agent-to-Agent Communication
Introduction
The paper presents Anemoi, a semi-centralized multi-agent system (MAS) built on Coral Protocol's Agent-to-Agent (A2A) communication server, which agents access as a Model Context Protocol (MCP) server. Anemoi targets the limitations of traditional centralized MAS architectures that coordinate agents through context engineering. The primary motivation is to reduce dependence on a single, powerful planner LLM and to make agent collaboration more efficient, scalable, and robust through structured, real-time inter-agent communication. The system is evaluated on the GAIA benchmark, where it outperforms state-of-the-art open-source baselines, particularly when the planner is a smaller LLM.
Figure 1: Architecture of Anemoi, a semi-centralized multi-agent system based on the A2A communication MCP server from Coral Protocol.
Background and Motivation
Traditional MAS frameworks typically employ a centralized planner that decomposes tasks and coordinates worker agents via unidirectional prompt passing. This approach, while effective with strong LLMs, suffers from two critical drawbacks:
- Planner Dependency: System performance is tightly coupled to the planner's LLM capability. Substituting a strong LLM with a smaller one leads to substantial performance degradation.
- Limited Inter-Agent Communication: Collaboration is realized through prompt concatenation and manual context injection, resulting in high token overhead, redundancy, and information loss.
Anemoi is designed to overcome these bottlenecks by introducing a semi-centralized architecture where all agents can directly communicate, monitor progress, and collaboratively refine plans in real time.
System Architecture
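Anemoi's architecture is built around the A2A communication MCP server, which provides thread-based, structured communication primitives for agent discovery, thread management, and message exchange. The system comprises the following agent types:
- Planner agent: drafts the initial plan, creates the communication thread, and allocates subtasks.
- Worker agents: execute subtasks such as web search, document processing, and coding, using the same agents and toolkits as the OWL baseline.
- Critique agent: evaluates intermediate outputs.
- Answer-finding agent: submits the validated final answer.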
Each agent is integrated with the MCP toolkit, enabling dynamic participation in communication threads, direct message passing, and real-time monitoring of task progress.
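To make the thread-based model concrete, here is a minimal in-memory sketch of such a communication hub. It is not Coral Protocol's implementation or API: the class ThreadHub, every method signature, and the read_new helper are hypothetical. Only the primitive names list_agents, create_thread, and send_message are taken from the server description in this summary.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Message:
    sender: str
    content: str


@dataclass
class Thread:
    participants: set[str]
    messages: list[Message] = field(default_factory=list)


class ThreadHub:
    """Hypothetical in-memory stand-in for a thread-based A2A server."""

    def __init__(self) -> None:
        self._agents: dict[str, str] = {}  # agent name -> description
        self._threads: dict[int, Thread] = {}
        self._next_id = count(1)
        self._cursors: dict[tuple[int, str], int] = {}  # (thread id, agent) -> next unread index

    def register(self, name: str, description: str) -> None:
        self._agents[name] = description

    def list_agents(self) -> list[str]:
        # Agent discovery: every registered agent is visible to every other agent.
        return sorted(self._agents)

    def create_thread(self, creator: str, participants: list[str]) -> int:
        thread_id = next(self._next_id)
        self._threads[thread_id] = Thread({creator, *participants})
        return thread_id

    def send_message(self, thread_id: int, sender: str, content: str) -> None:
        thread = self._threads[thread_id]
        if sender not in thread.participants:
            raise PermissionError(f"{sender} is not a participant of thread {thread_id}")
        thread.messages.append(Message(sender, content))

    def read_new(self, thread_id: int, agent: str) -> list[Message]:
        # Return only the messages this agent has not read yet and advance its
        # cursor, so an agent can fetch new messages instead of the full history.
        start = self._cursors.get((thread_id, agent), 0)
        new = self._threads[thread_id].messages[start:]
        self._cursors[(thread_id, agent)] = start + len(new)
        return new


hub = ThreadHub()
for name in ("planner", "web_agent", "critic"):
    hub.register(name, f"{name} (toy agent)")
tid = hub.create_thread("planner", hub.list_agents())
hub.send_message(tid, "planner", "plan: search, then verify")
print([m.content for m in hub.read_new(tid, "critic")])  # ['plan: search, then verify']
print(hub.read_new(tid, "critic"))  # []
```

The per-agent read cursor is one simple way to let every participant monitor a shared thread without re-sending the whole conversation each turn; the real server may manage this differently.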
Communication Protocol and Workflow
The A2A MCP server exposes a set of primitives (list_agents, create_thread, send_message, etc.) that support structured, thread-based communication. The workflow proceeds as follows:
- Agent Discovery: Agents enumerate available participants.
- Thread Initialization: The planner creates a thread, broadcasts the initial plan, and allocates subtasks.
- Task Execution and Monitoring: Worker agents execute subtasks, the critique agent evaluates their outputs, and any agent can propose refinements or alternative strategies.
- Consensus: Before submission, all agents vote on the candidate solution.
- Answer Submission: The answer-finding agent submits the validated result.
This protocol enables adaptive plan refinement, reduces reliance on the planner, and minimizes redundant context passing, leading to improved scalability and efficiency.
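The five steps can be traced with a toy, single-process simulation. Agents here are plain Python callables standing in for LLM-backed agents, and plan refinement is reduced to a simple retry. The function run_task, the round-robin allocation, the unanimous-vote rule, and the message format are all assumptions made for illustration, not details taken from the paper.

```python
from typing import Callable, Optional

Message = tuple[str, str]  # (sender, content)


def run_task(
    question: str,
    planner: Callable[[str], list[str]],
    workers: dict[str, Callable[[str], str]],
    critic: Callable[[str], bool],
    voters: dict[str, Callable[[str], bool]],
    max_rounds: int = 3,
) -> tuple[Optional[str], list[Message]]:
    """Toy walk through discovery, planning, execution, consensus, and submission."""
    thread: list[Message] = [("user", question)]

    # 1. Agent discovery: enumerate the available worker agents.
    names = sorted(workers)
    # 2. Thread initialization: broadcast the plan and allocate subtasks round-robin.
    subtasks = planner(question)
    thread.append(("planner", "plan: " + "; ".join(subtasks)))
    assignment = [(task, names[i % len(names)]) for i, task in enumerate(subtasks)]

    for _ in range(max_rounds):
        # 3. Execution and monitoring: workers post results to the shared thread
        #    and the critic rejects any result it does not accept.
        results = []
        for task, name in assignment:
            result = workers[name](task)
            thread.append((name, result))
            results.append(result)
        rejected = [r for r in results if not critic(r)]
        if rejected:
            thread.append(("critic", f"reject: {rejected}"))
            continue  # simple retry; a real system would refine the plan here
        candidate = results[-1]
        # 4. Consensus: this sketch assumes a unanimous vote is required.
        if all(vote(candidate) for vote in voters.values()):
            # 5. Answer submission by the answer-finding agent.
            thread.append(("answer_finder", candidate))
            return candidate, thread
        thread.append(("critic", "vote failed"))
    return None, thread


attempts = {"n": 0}


def flaky_web_agent(task: str) -> str:
    # Stands in for a stochastic worker: wrong on the first call, right afterwards.
    attempts["n"] += 1
    return "unknown" if attempts["n"] == 1 else "Paris"


answer, log = run_task(
    "What is the capital of France?",
    planner=lambda q: ["look up the capital"],
    workers={"web_agent": flaky_web_agent},
    critic=lambda r: r != "unknown",
    voters={"planner": lambda c: True, "critic": lambda c: c == "Paris"},
)
print(answer)  # Paris
```

Because every step writes to the same thread, the critic's rejection is visible to all agents, which is the property that allows any agent (not only the planner) to react to a failed subtask.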
Experimental Evaluation
Baselines and Implementation
Anemoi is evaluated on the GAIA benchmark, which tests multi-step, real-world tasks requiring web search, document processing, and coding. The worker agents and toolkits are identical to those used in the OWL baseline, ensuring a controlled comparison. The planner agent uses GPT-4.1-mini, while worker agents use GPT-4o.
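The model assignment can be written down as a small configuration object. This is a hypothetical sketch: the AgentConfig type, the worker names (chosen to mirror the GAIA task types above), and the helper models_in_use are assumptions, not the paper's configuration format. The critique and answer-finding agents are omitted because this summary does not state which model they use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentConfig:
    role: str
    model: str


# Hypothetical roster: planner on GPT-4.1-mini, worker agents on GPT-4o.
# Worker names are illustrative, not the paper's identifiers.
SETUP = (
    AgentConfig("planner", "gpt-4.1-mini"),
    AgentConfig("web_agent", "gpt-4o"),
    AgentConfig("document_agent", "gpt-4o"),
    AgentConfig("coding_agent", "gpt-4o"),
)


def models_in_use(configs: tuple[AgentConfig, ...]) -> dict[str, set[str]]:
    """Group agent roles by the model they run on."""
    grouped: dict[str, set[str]] = {}
    for cfg in configs:
        grouped.setdefault(cfg.model, set()).add(cfg.role)
    return grouped


print(models_in_use(SETUP)["gpt-4.1-mini"])  # {'planner'}
```

Keeping workers and toolkits fixed across Anemoi and OWL means the comparison isolates the coordination layer rather than the worker models.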
Main Results
Anemoi achieves an average accuracy of 52.73% on the GAIA validation set, outperforming the strongest open-source baseline OWL (43.63%) by +9.09 percentage points under identical LLM configurations. Notably, Anemoi with a weaker planner surpasses several proprietary and open-source frameworks that employ stronger LLMs, underscoring the efficacy of the A2A-based semi-centralized paradigm.
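Since the gain is stated in percentage points, it helps to separate the absolute gap from the relative gain. Working from the rounded accuracies above (the reported +9.09 was presumably computed from unrounded values, which explains the 0.01 difference):

```latex
\Delta_{\text{abs}} = 52.73 - 43.63 = 9.10~\text{pp} \approx 9.09~\text{pp (reported)},
\qquad
\Delta_{\text{rel}} = \frac{9.09}{43.63} \approx 0.208 \approx 21\%.
```

In other words, Anemoi answers roughly a fifth more GAIA validation tasks than OWL, not 9.09% more.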
Comparative and Error Analysis
Task Attribution Analysis
A detailed comparison between Anemoi and OWL shows that Anemoi solves 25 tasks on which OWL fails; the most frequently attributed reasons are collaborative refinement (52%) and reduced context redundancy (8%). Conversely, most tasks solved by OWL but not by Anemoi are attributed to stochastic worker behavior and, to a lesser extent, communication latency.
Figure 3: Comparison of task attribution categories between Anemoi and OWL. The donut chart illustrates the distribution of reasons why Anemoi succeeded where OWL failed, and vice versa.
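The attribution shares translate into task counts with simple arithmetic on the reported 25 tasks. The helper shares_to_counts is hypothetical, and the remainder is left unlabeled because this summary does not break it down further.

```python
def shares_to_counts(total: int, shares: dict[str, float]) -> dict[str, int]:
    """Convert percentage shares of `total` into whole task counts."""
    counts = {label: round(total * pct / 100) for label, pct in shares.items()}
    counts["other (not broken down here)"] = total - sum(counts.values())
    return counts


anemoi_only = shares_to_counts(
    25, {"collaborative refinement": 52, "reduced context redundancy": 8}
)
print(anemoi_only)
# {'collaborative refinement': 13, 'reduced context redundancy': 2, 'other (not broken down here)': 10}
```

So collaborative refinement accounts for 13 of the 25 Anemoi-only successes, while the two named categories together cover 15.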
Error Breakdown
Anemoi's remaining errors are attributed to LLM capability limitations (45.6%), toolkit constraints (20.6%), incorrect plans (11.8%), communication latency (10.3%), annotation errors (7.4%), and LLM hallucinations (4.4%).
Figure 4: Breakdown of Anemoi's remaining errors.
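A quick consistency check on the error shares: they sum to 100.1% because each share was rounded to one decimal place, and the two largest categories alone cover about two-thirds of the remaining errors. The dictionary ERROR_SHARES simply restates the reported figures.

```python
ERROR_SHARES = {
    "LLM capability limitations": 45.6,
    "toolkit constraints": 20.6,
    "incorrect plans": 11.8,
    "communication latency": 10.3,
    "annotation errors": 7.4,
    "LLM hallucinations": 4.4,
}

total = round(sum(ERROR_SHARES.values()), 1)
print(total)  # 100.1 (each share was rounded to one decimal)

# Cumulative share, largest categories first.
running = 0.0
for label, pct in sorted(ERROR_SHARES.items(), key=lambda kv: kv[1], reverse=True):
    running += pct
    print(f"{label:<28} {pct:5.1f}%  cumulative {running:5.1f}%")
```

The cumulative view makes the point of the breakdown explicit: errors tied to the coordination layer (incorrect plans and communication latency) are a minority compared with model and tool limitations.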
Implications and Future Directions
The Anemoi architecture demonstrates that semi-centralized MAS with direct A2A communication can significantly improve performance, robustness, and scalability, especially when planner LLMs are resource-constrained. The reduction in token overhead and the ability for agents to collaboratively refine plans in real time are particularly advantageous for complex, multi-step tasks.
Theoretically, this work suggests that MAS architectures should move beyond rigid, centralized planning and embrace more flexible, communication-rich paradigms. Practically, the approach enables cost-effective deployment of MAS in environments where access to large LLMs is limited.
Future research directions include:
- Enhancing agent autonomy and specialization.
- Improving fault tolerance and recovery from communication failures.
- Extending the protocol to support heterogeneous agent populations and human-in-the-loop scenarios.
- Investigating the impact of more advanced consensus mechanisms and dynamic agent instantiation.
Conclusion
Anemoi introduces a semi-centralized MAS architecture that leverages structured A2A communication to overcome the limitations of context-engineering-based, centralized systems. The empirical results on the GAIA benchmark demonstrate substantial performance gains, particularly in settings with weaker planner LLMs. This work provides a concrete foundation for scalable, robust, and efficient MAS, and points toward a future where agent collaboration is both adaptive and communication-efficient.