Single-Agent Systems (SAS) in LLM Architectures
- Single-Agent Systems (SAS) are defined as architectures where a single LLM handles reasoning, planning, and action selection via a curated skill library.
- They employ a softmax-based skill selection mechanism that integrates internal chain-of-thought with external tool execution to optimize task performance.
- Empirical evaluations show that SAS reduces token usage and latency while matching or surpassing MAS in accuracy and scalability.
A single-agent system (SAS) refers to an agentic architecture in which a single LLM is responsible for all aspects of reasoning, planning, and action selection, either via monolithic chain-of-thought or through internal selection among a curated library of specialized skills. This contrasts with multi-agent systems (MAS), where specialized agents interact and coordinate via explicit message passing. Recent literature rigorously formalizes, analyzes, and benchmarks SAS in the context of modularity, efficiency, scalability, fault localization, and its comparative performance under controlled computation budgets (Li, 8 Jan 2026, Tran et al., 2 Apr 2026, Gao et al., 23 May 2025, Xu et al., 18 Jan 2026).
1. Formal Models of Single-Agent Systems with Skill Libraries
The canonical SAS comprises a single LLM endowed with a fixed skill library $\mathcal{S} = \{s_1, \ldots, s_K\}$ and a selection mechanism $\pi$. Each skill is a tuple $s_i = (d_i, p_i, t_i)$, where $d_i$ is a natural-language descriptor, $p_i$ denotes the policy (internal prompt or instructions), and $t_i$ is either an external tool or $\varnothing$ for purely internal reasoning.
Skill selection is often modeled as a softmax policy:

$$\pi(s_i \mid c) = \frac{\exp\big(\phi(c, d_i)\big)}{\sum_{j=1}^{K} \exp\big(\phi(c, d_j)\big)},$$

where $c$ is the context and $\phi$ a compatibility function scoring the match between context and descriptor.
The canonical SAS execution loop proceeds as follows: observe the current context, select a skill via the softmax policy, execute it (internal reasoning or a tool call), and fold the result back into the context. The total cost over $T$ steps decomposes as:

$$C_{\text{total}} = \sum_{t=1}^{T} \big( C_{\text{select}}^{(t)} + C_{\text{exec}}^{(t)} \big).$$
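The loop and its cost decomposition can be sketched as follows; the `select`/`execute` callables and the cost bookkeeping are illustrative assumptions, not an implementation from the cited papers:

```python
def run_sas(task, skills, select, execute, max_steps=8):
    """Canonical SAS loop: observe context, select a skill, execute it,
    fold the result back into the context; track per-step costs."""
    context, total_cost = task, {"select": 0, "exec": 0}
    for _ in range(max_steps):
        skill, c_sel = select(context, skills)        # selection step + its cost
        result, c_exec, done = execute(skill, context)  # internal reasoning or tool call
        total_cost["select"] += c_sel
        total_cost["exec"] += c_exec
        context = context + "\n" + result             # fold result into context
        if done:
            break
    return context, total_cost
```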
2. Compilation and Simulation of Multi-Agent Workflows in SAS
A MAS is formalized as $M = (A, G, P)$ with agent set $A = \{a_1, \ldots, a_n\}$, communication graph $G$, and protocol $P$. Each agent's specialized behavior is internalized as one or more SAS skills using a compilation mapping $\Phi$, where

$$\Phi(a_i) = (r_i, p_{a_i}, t_{a_i}),$$

and $r_i$ is the $i$-th agent's role description. The resulting skill policies explicitly enforce communication requirements as part of the skill prompt.
Correctness is established via behavioral fidelity: for every input $x$, the compiled single agent reproduces the MAS output distribution,

$$P_{\Phi(M)}(y \mid x) = P_{M}(y \mid x).$$
Efficiency is ensured as long as the cumulative overhead of skill selection is less than MAS communication cost.
For homogeneous MAS (all agents use the same LLM), a single-LLM simulator can role-play all agents in a continuous conversation, leveraging key/value cache sharing for computational savings. Under deterministic tools, prompts, and routing, the single-agent simulation achieves joint distributional equivalence with multi-agent execution (Xu et al., 18 Jan 2026).
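A minimal sketch of the single-LLM role-play simulator, assuming a generic `llm(prompt) -> str` callable; the single shared, append-only transcript is what enables KV-cache reuse across roles:

```python
def simulate_mas_single_llm(llm, roles, task, rounds=1):
    """Role-play a homogeneous MAS inside one model: a single growing
    transcript (so the KV cache is shared) with per-role turn prompts."""
    transcript = f"Task: {task}\n"
    for _ in range(rounds):
        for role in roles:
            # One model plays every agent; only the role header changes.
            reply = llm(f"{transcript}\n[{role}]:")
            transcript += f"\n[{role}]: {reply}"
    return transcript
```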
3. Empirical Evaluation: Efficiency and Accuracy
Quantitative benchmarks consistently reveal that SAS matches or exceeds MAS in task accuracy while achieving substantial reductions in computational overhead—primarily due to the elimination of inter-agent communication and enhanced KV cache reuse.
| Task | MAS Acc. (%) | SAS Acc. (%) | MAS Tokens | SAS Tokens | MAS Latency | SAS Latency | LLM Calls (MAS→SAS) |
|---|---|---|---|---|---|---|---|
| GSM8K | 94.0 | 92.0 | 1407 | 616 | 10,565 | 7,537 | 3→1 |
| HumanEval | 100.0 | 100.0 | 1400 | 749 | 7,227 | 2,970 | 3→1 |
| HotpotQA | 84.0 | 88.0 | 4359 | 1816 | 11,671 | 4,559 | 4→1 |
| Avg. Δ | — | +0.7 pp | — | –53.7% | — | –49.5% | — |
Findings are robust across mathematics, program synthesis, and multi-hop reasoning (Li, 8 Jan 2026, Gao et al., 23 May 2025, Xu et al., 18 Jan 2026). SAS incurs substantially lower token costs than MAS—more than halving token usage in the benchmarks above—with equivalent or slightly higher accuracy. Notably, under controlled “thinking token” budgets, single-agent reasoning consistently matches or outperforms multi-agent variants, particularly as LLMs improve in long-context handling and tool integration (Tran et al., 2 Apr 2026).
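The averaged deltas in the table can be reproduced directly from the per-task rows:

```python
rows = {  # (MAS, SAS) pairs taken from the benchmark table above
    "acc":     [(94.0, 92.0), (100.0, 100.0), (84.0, 88.0)],
    "tokens":  [(1407, 616), (1400, 749), (4359, 1816)],
    "latency": [(10565, 7537), (7227, 2970), (11671, 4559)],
}

def mean(xs):
    return sum(xs) / len(xs)

# Accuracy delta in percentage points; tokens/latency as mean percent change.
delta_acc = mean([s - m for m, s in rows["acc"]])
delta_tokens = 100 * mean([s / m - 1 for m, s in rows["tokens"]])
delta_latency = 100 * mean([s / m - 1 for m, s in rows["latency"]])
# delta_acc ≈ +0.7 pp, delta_tokens ≈ -53.7%, delta_latency ≈ -49.5%
```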
4. Scaling Limits and Skill Selection Capacity
Empirical studies show that as the number of skills $K$ increases, SAS selection accuracy $A(K)$ remains near-perfect until a critical threshold $K^*$ (dependent on the underlying LLM), beyond which accuracy collapses sharply. The collapse is well described by a sigmoidal fit:

$$A(K) \approx \frac{A_{\max}}{1 + \exp\big(\beta (K - K^*)\big)},$$

with constants $A_{\max}$, $\beta$, and $K^*$ fitted for GPT-4o-mini (Li, 8 Jan 2026). This phase transition mirrors human capacity limits for menu selection and response latency (Hick's Law) and highlights semantic confusability as a dominant factor.
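The capacity model can be sketched as a one-line function; the default parameter values below are illustrative placeholders, not the fitted constants reported in the paper:

```python
import math

def selection_accuracy(K, A_max=0.98, K_star=8.0, beta=1.5):
    """Sigmoidal capacity model: near-perfect selection well below the
    threshold K*, sharp collapse beyond it. Parameters are illustrative."""
    return A_max / (1 + math.exp(beta * (K - K_star)))
```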
When skills are semantically similar—quantified via the average cosine similarity $\bar{s}$ of skill descriptors—accuracy degrades approximately linearly with interference:

$$A \approx A_0 - \lambda \bar{s},$$

for a model-dependent slope $\lambda$. Controlled experiments show that a single semantically close competitor per skill can induce drops of 7–30 percentage points; two competitors cause even sharper accuracy loss (up to 63 pp at high $\bar{s}$).
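The confusability statistic $\bar{s}$ can be computed as a mean pairwise cosine similarity; this sketch uses bag-of-words vectors as a stand-in for the descriptor embeddings the analysis assumes:

```python
import math
from collections import Counter

def cosine(a: str, b: str) -> float:
    """Cosine similarity of bag-of-words vectors (a toy stand-in for
    embedding-based descriptor similarity)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def mean_pairwise_similarity(descriptors: list[str]) -> float:
    """Average cosine similarity over all descriptor pairs: the
    confusability statistic the degradation law is expressed in."""
    pairs = [(i, j) for i in range(len(descriptors))
             for j in range(i + 1, len(descriptors))]
    return sum(cosine(descriptors[i], descriptors[j]) for i, j in pairs) / len(pairs)
```

Auditing a library with this statistic is a cheap way to flag skills that should be merged or given more distinctive descriptors.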
5. Information-Theoretic Foundations and Context Utilization
A rigorous perspective frames SAS vs MAS through the Data Processing Inequality:

$$I\big(Y; m(X)\big) \le I(Y; X),$$

where $Y$ is the answer, $X$ the full context, and $m(X)$ the message passed in MAS. If a SAS fully utilizes $X$, the MAS cannot gain information through communication. However, as context retention degrades (due to model limitations or context length), MAS may outperform by filtering or factoring the information into focused sub-messages. Empirically, SAS dominates for low degradation; MAS gains relevance only when the SAS’s context access is severely compromised (Tran et al., 2 Apr 2026).
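The inequality can be checked on a toy discrete example, where a lossy message $m(X)$ provably carries no more information about the answer than the full context (all distributions here are contrived for illustration):

```python
import math
from collections import defaultdict

def mutual_information(pairs):
    """I(U; V) in bits from an empirical list of (u, v) samples."""
    n = len(pairs)
    pu, pv, puv = defaultdict(float), defaultdict(float), defaultdict(float)
    for u, v in pairs:
        pu[u] += 1 / n
        pv[v] += 1 / n
        puv[(u, v)] += 1 / n
    return sum(p * math.log2(p / (pu[u] * pv[v])) for (u, v), p in puv.items())

# X determines Y; m(X) coarsens X, so I(Y; m(X)) <= I(Y; X).
xs = [0, 1, 2, 3] * 25
ys = [x % 2 for x in xs]    # answer derived from the full context
ms = [x // 2 for x in xs]   # lossy message: drops exactly the parity bit
i_full = mutual_information(list(zip(ys, xs)))
i_msg = mutual_information(list(zip(ys, ms)))
assert i_msg <= i_full + 1e-9  # Data Processing Inequality holds
```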
6. Mitigating Capacity and Confusability: Hierarchical and Hybrid Approaches
Hierarchical skill routing, where selection is divided into coarse-to-fine categories, restores high-accuracy selection even as $K$ grows large: the policy factorizes as $\pi(s \mid c) = \pi(g \mid c)\,\pi(s \mid c, g)$ over coarse categories $g$. Empirically, hierarchical routing attains up to 85% selection accuracy at large $K$, compared to 45–70% for flat selection (Li, 8 Jan 2026).
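A minimal sketch of two-stage coarse-to-fine routing, assuming a generic `score(context, descriptor)` function (the taxonomy structure and greedy argmax are illustrative choices):

```python
def route_hierarchical(context, taxonomy, score):
    """Two-stage routing: pick a category first, then a skill within it,
    so each individual decision ranges over a small candidate set."""
    category = max(taxonomy, key=lambda g: score(context, g))
    return category, max(taxonomy[category], key=lambda s: score(context, s))
```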
A hybrid request-cascading paradigm invokes SAS first, automatically verifies its output, and escalates to MAS only on failure, reducing expected cost by up to 20% while achieving MAS-level accuracy (Gao et al., 23 May 2025).
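The cascade reduces to a few lines of control flow, assuming black-box `run_sas`, `verify`, and `run_mas` callables (names are illustrative):

```python
def cascade(task, run_sas, verify, run_mas):
    """Request cascading: try the cheap single agent first, verify its
    answer automatically, and escalate to the MAS only on failure."""
    answer = run_sas(task)
    if verify(task, answer):
        return answer, "sas"
    return run_mas(task), "mas"
```

Expected cost is then the SAS cost plus the MAS cost weighted by the verification failure rate, which is why the savings depend on how often escalation is triggered.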
7. Design Best Practices and Deployment Considerations
Key guidelines for scalable and robust SAS include:
- Monitor library size $K$ relative to model capacity $K^*$; stay well below the collapse threshold where possible.
- Audit and merge semantically overlapping skills; invest in distinctive descriptors.
- Use hierarchical organization for large skill libraries; keep each selection cluster small.
- Upgrade to stronger LLMs when large, confusable libraries are unavoidable.
- Employ hybrid cascades to retain efficiency while safeguarding accuracy.
- Reconsider MAS only for extremely complex workflows, tool orchestration, or severe context degradation.
In practical terms, SAS is now the default baseline for most modular reasoning applications, offering substantial reductions in latency and compute while retaining fine-grained transparency for error localization via chain-of-thought step-level self-reporting (Gao et al., 23 May 2025).
Limitations and Future Research Directions
SAS cannot simulate heterogeneous multi-agent workflows (mixing different base LLMs), as KV caches do not transfer across models. While SAS saturates empirical performance on most benchmarks when capacity is not exceeded, the design space for true heterogeneous and cross-model orchestration remains open. End-to-end training of single agents on multi-role dialogues and efficient hybridization of MAS/SAS offer promising avenues for further research (Xu et al., 18 Jan 2026).