- The paper introduces a hybrid SAS-MAS paradigm that integrates agent routing and cascade strategies to balance performance and cost.
- The empirical study evaluates tasks like code generation and reasoning across LLMs, revealing a diminishing performance edge for MAS as capabilities improve.
- It identifies critical MAS defects at node, edge, and path levels and proposes confidence-guided optimizations to enhance overall system efficiency.
Examining "Single-agent or Multi-agent Systems? Why Not Both?" (2505.18286)
This paper presents an empirical comparison between single-agent systems (SAS) and multi-agent systems (MAS) built on LLMs for a range of agentic applications. It evaluates the performance of both paradigms, highlighting their respective advantages and limitations. It also explores cost-effective optimizations and introduces a hybrid paradigm that combines MAS and SAS to balance accuracy and cost.
Introduction
Multi-agent systems (MAS) have gained prominence due to their ability to decompose complex tasks and facilitate role-specific collaboration among LLM agents, particularly in software engineering and scientific discovery applications. These systems inherently support long-context reasoning and facilitate error correction through inter-agent communication. Although MAS have historically achieved higher accuracy, their complexity and cost pose challenges, especially as LLMs like OpenAI-o3 and Gemini-2.5-Pro improve at long-context reasoning and tool usage.
Figure 1: Overview of the paper. We present a comprehensive empirical comparison of MAS and SAS paradigms, and introduce cost-effective optimizations to improve their efficiency and effectiveness.
The extensive study summarized in Table 1 compares the performance of MAS and SAS across various agentic tasks, such as code generation, mathematical reasoning, travel planning, and scientific experimentation, using several frameworks and both proprietary and open-source LLMs. Notably, MAS initially outperform SAS but lose their edge as the capabilities of LLMs advance.
MAS Performance on Historical Datasets
Table 2 shows that while MAS clearly outperformed SAS when using ChatGPT, this advantage shrinks sharply with Gemini-2.0-Flash, with performance improvements dropping from 10%+ to around 3%.
MAS Defects Analysis
The paper identifies key defects within MAS that limit their performance:
- Node-Level Defect: Performance is constrained by the critical agent, the one tasked with the most challenging subtask. Stronger models like Gemini-2.0-Flash can offset this bottleneck.
- Edge-Level Defect: Overthinking arises when downstream agents receive excessive or redundant information, as evidenced by instances where a simpler SAS outperforms MAS.
- Path-Level Defect: Errors propagate through chains of agent interactions, leading to failures in cases where SAS would succeed thanks to more transparent context retention.
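The path-level defect can be illustrated with a toy calculation. The sketch below is not from the paper: it assumes each agent in a chain succeeds independently with a fixed probability and that a single failure breaks the whole path. The function name `chain_success_probability` is hypothetical.

```python
from math import prod
from typing import Sequence


def chain_success_probability(per_agent_success: Sequence[float]) -> float:
    """Probability that every agent on a path succeeds.

    Simplifying assumption (not from the paper): agents fail independently
    and any single failure makes the whole path fail.
    """
    return prod(per_agent_success)


# Five agents that are each 90% reliable yield a path that is far less reliable.
five_agent_path = chain_success_probability([0.9] * 5)  # roughly 0.59
single_agent = chain_success_probability([0.9])         # 0.9
```

Under these assumptions, longer chains lose reliability quickly, which is one way to read why an SAS can succeed where a multi-step MAS path fails.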
Figure 2: SAS can achieve comparable accuracy performance to MAS.
Cost-Effective Agentic Paradigms
Given the diminishing performance edge of MAS and their higher deployment cost, the paper introduces two approaches for making agentic systems more cost-effective:
Augmenting MAS Critical Path
The authors propose a confidence-guided probing method that identifies the critical agents and prioritizes them for augmentation. Targeting only these agents improves the system's cost-effectiveness by keeping overhead low while preserving the performance gains, as illustrated in Figure 3.
Figure 3: We propose a lightweight, confidence-guided probing method to identify critical agents for improvement (left), and further improve cost-effectiveness by integrating SAS and MAS paradigms.
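One possible reading of confidence-guided probing is sketched below. The summary does not specify how confidence is measured or how agents are selected, so everything here is an assumption: each agent is assumed to report a confidence score in [0, 1] on a few probe requests, the least confident agents are treated as critical, and `rank_critical_agents` and `select_for_upgrade` are hypothetical names.

```python
from statistics import mean
from typing import Dict, List


def rank_critical_agents(probe_confidences: Dict[str, List[float]]) -> List[str]:
    """Order agents from least to most confident on the probe requests.

    Assumption: low average confidence marks an agent as critical.
    """
    return sorted(probe_confidences, key=lambda name: mean(probe_confidences[name]))


def select_for_upgrade(probe_confidences: Dict[str, List[float]], budget: int) -> List[str]:
    """Pick at most `budget` agents to augment, e.g. with a stronger LLM."""
    return rank_critical_agents(probe_confidences)[:budget]


# Hypothetical probe scores for a three-agent coding pipeline.
probes = {
    "planner": [0.9, 0.8],
    "coder": [0.4, 0.5],
    "tester": [0.7, 0.6],
}
to_upgrade = select_for_upgrade(probes, budget=1)  # ["coder"]
```

Upgrading only the selected agents, rather than every agent, is what keeps the added cost small in this sketch.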
Hybrid SAS-MAS Paradigm
The hybrid approach involves an Agent Routing strategy, in which a complexity-based assessment routes each request to either MAS or SAS to optimize both accuracy and cost. The Agent Cascade paradigm extends this by first passing requests through SAS and escalating to MAS only if the initial attempt is unsatisfactory. In the reported experiments, this approach achieved accuracy gains of up to 12% while substantially reducing costs.
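The two strategies can be contrasted in a minimal sketch. It is illustrative only: the complexity estimator, the use of a self-reported confidence score to decide whether an SAS answer is satisfactory, the thresholds, and the toy agents are all assumptions rather than the paper's implementation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Result:
    answer: str
    confidence: float  # assumed score in [0, 1]
    cost: float        # assumed cost units, e.g. dollars or tokens


Agent = Callable[[str], Result]


def route(request: str, sas: Agent, mas: Agent,
          complexity: Callable[[str], float], threshold: float = 0.5) -> Result:
    """Agent routing: choose one system up front from estimated complexity."""
    return mas(request) if complexity(request) >= threshold else sas(request)


def cascade(request: str, sas: Agent, mas: Agent,
            min_confidence: float = 0.7) -> Result:
    """Agent cascade: try SAS first, escalate to MAS only if unsatisfactory."""
    first = sas(request)
    if first.confidence >= min_confidence:
        return first
    second = mas(request)
    # An escalated request pays for both attempts.
    return Result(second.answer, second.confidence, first.cost + second.cost)


# Hypothetical stand-ins for real systems, using word count as a crude proxy.
def toy_sas(request: str) -> Result:
    confident = len(request.split()) <= 5
    return Result("sas-answer", 0.9 if confident else 0.3, cost=1.0)


def toy_mas(request: str) -> Result:
    return Result("mas-answer", 0.8, cost=5.0)


def toy_complexity(request: str) -> float:
    return min(len(request.split()) / 10, 1.0)


easy = cascade("add two numbers", toy_sas, toy_mas)
hard = cascade("plan a ten day trip across Europe", toy_sas, toy_mas)
```

In this sketch routing pays for exactly one system per request, while the cascade pays the SAS cost on every request and the MAS cost only on escalation, so it saves money when most requests are resolved by SAS.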
Conclusion
The paper highlights an evolving landscape of agentic system design in which MAS, while useful for privacy and parallelism, can be cost-prohibitive and offer a shrinking performance advantage, especially against advanced SAS built on cutting-edge LLMs. Through empirical assessment and the introduction of cost-saving optimization paradigms, the paper outlines a path toward deploying adaptable, efficient agentic systems. While the findings challenge conventional views of MAS superiority, the paper underscores the importance of adaptive deployment strategies that balance accuracy and cost-efficiency in real-world applications.