
Single-agent or Multi-agent Systems? Why Not Both? (2505.18286v1)

Published 23 May 2025 in cs.MA, cs.AI, and cs.LG

Abstract: Multi-agent systems (MAS) decompose complex tasks and delegate subtasks to different LLM agents and tools. Prior studies have reported the superior accuracy performance of MAS across diverse domains, enabled by long-horizon context tracking and error correction through role-specific agents. However, the design and deployment of MAS incur higher complexity and runtime cost compared to single-agent systems (SAS). Meanwhile, frontier LLMs, such as OpenAI-o3 and Gemini-2.5-Pro, have rapidly advanced in long-context reasoning, memory retention, and tool usage, mitigating many limitations that originally motivated MAS designs. In this paper, we conduct an extensive empirical study comparing MAS and SAS across various popular agentic applications. We find that the benefits of MAS over SAS diminish as LLM capabilities improve, and we propose efficient mechanisms to pinpoint the error-prone agent in MAS. Furthermore, the performance discrepancy between MAS and SAS motivates our design of a hybrid agentic paradigm, request cascading between MAS and SAS, to improve both efficiency and capability. Our design improves accuracy by 1.1-12% while reducing deployment costs by up to 20% across various agentic applications.

Summary

  • The paper introduces a hybrid SAS-MAS paradigm that integrates agent routing and cascade strategies to balance performance and cost.
  • The empirical study evaluates tasks like code generation and reasoning across LLMs, revealing a diminishing performance edge for MAS as capabilities improve.
  • It identifies critical MAS defects at node, edge, and path levels and proposes confidence-guided optimizations to enhance overall system efficiency.

Examining "Single-agent or Multi-agent Systems? Why Not Both?" (2505.18286)

This paper conducts an in-depth empirical comparison between single-agent systems (SAS) and multi-agent systems (MAS) built on LLMs across a range of agentic applications. It evaluates the performance of each paradigm, highlighting their advantages and limitations. It also explores cost-effective optimizations and introduces a hybrid paradigm that combines MAS and SAS to improve both efficiency and capability.

Introduction

Multi-agent systems (MAS) have gained prominence due to their ability to decompose complex tasks and facilitate role-specific collaboration among LLM agents, particularly in software engineering and scientific discovery applications. These systems support long-context reasoning and facilitate error correction through inter-agent communication. Despite their historically higher accuracy, MAS are more complex and costly to design and deploy than SAS, and that trade-off becomes harder to justify as LLMs like OpenAI-o3 and Gemini-2.5-Pro improve at long-context reasoning and tool usage.

Figure 1: Overview of the paper. We present a comprehensive empirical comparison of MAS and SAS paradigms, and introduce cost-effective optimizations to improve their efficiency and effectiveness.

Performance Evaluation of MAS and SAS

The extensive study summarized in Table 1 compares the performance of MAS and SAS across various agentic tasks, such as code generation, mathematical reasoning, travel planning, and scientific experimentation, using several frameworks and both proprietary and open-source LLMs. Notably, MAS initially outperforms SAS but loses its edge as the capabilities of the underlying LLMs advance.

MAS Performance on Historical Datasets

Table 2 shows that while MAS outperformed SAS by more than 10% when using ChatGPT, this advantage shrinks sharply with Gemini-2.0-Flash, where the improvement drops to around 3%.

MAS Defects Analysis

The paper identifies key defects within MAS that limit their performance:

  1. Node-Level Defect: Performance is constrained by the critical agent tasked with the most challenging subtask. Stronger models like Gemini-2.0-Flash can offset this bottleneck.
  2. Edge-Level Defect: Overthinking arises when downstream agents receive excessive or redundant information, as evidenced by instances in which a simpler SAS outperforms MAS.
  3. Path-Level Defect: Errors propagate through chains of agent interactions, leading to failures in cases where SAS would succeed thanks to more transparent context retention.

    Figure 2: SAS can achieve comparable accuracy performance to MAS.
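The path-level defect can be illustrated with a toy calculation. The sketch below is not taken from the paper: it assumes a purely sequential chain in which each agent succeeds independently and no downstream agent repairs an upstream mistake. The function name `chain_success_probability` and the per-agent accuracies are hypothetical.

```python
from typing import Sequence


def chain_success_probability(per_agent_accuracy: Sequence[float]) -> float:
    """Probability that every agent in a sequential chain succeeds.

    Assumption (not from the paper): agents fail independently and
    no later agent corrects an earlier error.
    """
    prob = 1.0
    for accuracy in per_agent_accuracy:
        if not 0.0 <= accuracy <= 1.0:
            raise ValueError("each accuracy must lie in [0, 1]")
        prob *= accuracy
    return prob


# Three hypothetical agents that are each 90% accurate in isolation.
three_agent_chain = chain_success_probability([0.9, 0.9, 0.9])
print(f"end-to-end success: {three_agent_chain:.3f}")
```

Under these simplifying assumptions the end-to-end success rate falls below that of any single agent, which is one way to see why a longer chain is not automatically more reliable than a single capable agent.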

Cost-Effective Agentic Paradigms

Given the diminishing performance edge of MAS and their higher deployment cost, the paper introduces innovative solutions to optimize agentic system operations:

Augmenting MAS Critical Path

The authors propose a confidence-guided probing method that identifies the critical agents in a MAS so that augmentation can be concentrated on them. Targeting only these agents improves cost-effectiveness by keeping overhead low while preserving the performance gains, as illustrated in Figure 3.

Figure 3: We propose a lightweight, confidence-guided probing method to identify critical agents for improvement (left), and further improve cost-effectiveness by integrating SAS and MAS paradigms.
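The summary does not spell out how the probing works, so the following is only a minimal sketch of one way a confidence-guided ranking could look. It assumes each agent reports a confidence score in [0, 1] per request and treats the agent with the lowest mean confidence as the first candidate for augmentation. `AgentTrace`, `rank_critical_agents`, and the scores are hypothetical and are not the paper's method or API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AgentTrace:
    name: str
    confidences: List[float]  # hypothetical per-request confidence scores in [0, 1]


def rank_critical_agents(traces: List[AgentTrace]) -> List[Tuple[str, float]]:
    """Return (agent name, mean confidence) pairs, least confident first.

    Agents with no recorded confidences are skipped rather than guessed.
    """
    scored = [
        (trace.name, sum(trace.confidences) / len(trace.confidences))
        for trace in traces
        if trace.confidences
    ]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical probe results for a three-agent coding pipeline.
probe = [
    AgentTrace("planner", [0.9, 0.8]),
    AgentTrace("coder", [0.4, 0.5]),
    AgentTrace("reviewer", [0.7]),
]
ranking = rank_critical_agents(probe)
print(ranking[0][0])  # the least confident agent is the first augmentation candidate
```

Ranking instead of picking a single agent leaves room to augment the top few candidates when budget allows.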

Hybrid SAS-MAS Paradigm

The hybrid approach has two variants. Agent Routing uses a complexity-based assessment to send each request to either MAS or SAS, optimizing both accuracy and cost. Agent Cascade extends this by first passing every request through SAS and escalating to MAS only if the initial attempt is unsatisfactory. According to the paper, the hybrid design improves accuracy by 1.1-12% while reducing deployment costs by up to 20%.
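The two strategies can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' implementation: `Result`, `route`, `cascade`, the complexity scorer, the thresholds, and the stub solvers `toy_sas` and `toy_mas` are all hypothetical, and "unsatisfactory" is modeled here as a confidence score below a threshold.

```python
from typing import Callable, NamedTuple


class Result(NamedTuple):
    answer: str
    confidence: float  # hypothetical self-assessed confidence in [0, 1]
    cost: float        # hypothetical cost units spent on the request


Solver = Callable[[str], Result]


def route(request: str, sas: Solver, mas: Solver,
          complexity: Callable[[str], float], threshold: float = 0.5) -> Result:
    """Agent Routing: choose one system up front from a complexity estimate."""
    return mas(request) if complexity(request) >= threshold else sas(request)


def cascade(request: str, sas: Solver, mas: Solver,
            min_confidence: float = 0.7) -> Result:
    """Agent Cascade: try SAS first and escalate to MAS only when needed."""
    first = sas(request)
    if first.confidence >= min_confidence:
        return first
    second = mas(request)
    # An escalated request pays for both attempts.
    return Result(second.answer, second.confidence, first.cost + second.cost)


# Stub solvers for demonstration only: the toy SAS is unsure about long requests.
def toy_sas(request: str) -> Result:
    return Result("sas-answer", 0.9 if len(request) < 20 else 0.3, 1.0)


def toy_mas(request: str) -> Result:
    return Result("mas-answer", 0.95, 5.0)


easy = cascade("2 + 2?", toy_sas, toy_mas)
hard = cascade("plan a five-city trip on a fixed budget", toy_sas, toy_mas)
print(easy.answer, easy.cost)
print(hard.answer, hard.cost)
```

The cascade saves money only when most requests stop at the SAS stage, because an escalated request costs more than calling MAS directly. Routing avoids that double payment but depends on the quality of the complexity estimate.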

Conclusion

The paper highlights the evolving landscape of agentic system design, in which MAS, while still useful for privacy and parallelism, can be cost-prohibitive and offer little accuracy advantage over SAS built on cutting-edge LLMs. Through empirical assessments and the introduction of cost-saving optimization paradigms, the paper outlines a path toward adaptable, efficient agentic systems. While the findings challenge conventional views on MAS superiority, the paper underscores the importance of adaptive deployment strategies that balance accuracy and cost-efficiency in real-world applications.
