
Single-Agent Systems (SAS) in LLM Architectures

Updated 9 April 2026
  • Single-Agent Systems (SAS) are defined as architectures where a single LLM handles reasoning, planning, and action selection via a curated skill library.
  • They employ a softmax-based skill selection mechanism that integrates internal chain-of-thought with external tool execution to optimize task performance.
  • Empirical evaluations show that SAS reduces token usage and latency while matching MAS accuracy on average, as long as the skill library stays within the model's selection capacity.

A single-agent system (SAS) refers to an agentic architecture in which a single LLM is responsible for all aspects of reasoning, planning, and action selection, either via monolithic chain-of-thought or through internal selection among a curated library of specialized skills. This contrasts with multi-agent systems (MAS), where specialized agents interact and coordinate via explicit message passing. Recent literature rigorously formalizes, analyzes, and benchmarks SAS in the context of modularity, efficiency, scalability, fault localization, and its comparative performance under controlled computation budgets (Li, 8 Jan 2026, Tran et al., 2 Apr 2026, Gao et al., 23 May 2025, Xu et al., 18 Jan 2026).

1. Formal Models of Single-Agent Systems with Skill Libraries

The canonical SAS comprises a single LLM $a$ endowed with a fixed skill library $S = \{s_1, \ldots, s_K\}$ and a selection mechanism $\sigma$. Each skill is a tuple $s = (\delta, \pi, \xi)$, where $\delta$ is a natural-language descriptor, $\pi$ denotes the policy (internal prompt or instructions), and $\xi \in T \cup \{\emptyset\}$ is either an external tool $t \in T$ or $\emptyset$ for purely internal reasoning.

Skill selection is often modeled as a softmax policy:

$$P(s \mid h) = \sigma(s \mid h) \propto \exp f_\theta(h, \delta_s)$$

where $h$ is the context and $f_\theta$ a compatibility function scoring the match between context and descriptor.
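A minimal sketch of softmax skill selection may make the formula concrete. The `Skill` dataclass mirrors the tuple (descriptor, policy, tool); the `compatibility` function is a hypothetical word-overlap stand-in for the compatibility score, which in a real SAS would come from the LLM itself, and the two example skills are invented for illustration.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Skill:
    descriptor: str              # natural-language descriptor (delta)
    policy: str                  # internal prompt or instructions (pi)
    tool: Optional[str] = None   # external tool name, or None for internal reasoning (xi)


def compatibility(context: str, descriptor: str) -> float:
    """Hypothetical stand-in for the scoring function: number of shared words."""
    return float(len(set(context.lower().split()) & set(descriptor.lower().split())))


def selection_probs(context: str, skills: list[Skill]) -> list[float]:
    """Softmax over compatibility scores, i.e. P(s | h) proportional to exp(score)."""
    scores = [compatibility(context, s.descriptor) for s in skills]
    top = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative library (assumed, not taken from the cited papers).
skills = [
    Skill("solve arithmetic word problems", "Reason step by step."),
    Skill("search the web for facts", "Issue a search query.", tool="web_search"),
]
probs = selection_probs("please solve these arithmetic problems", skills)
best = max(range(len(skills)), key=probs.__getitem__)
print(skills[best].descriptor, [round(p, 3) for p in probs])
```

Here the first skill shares three words with the context and the second shares none, so the softmax places most of the probability mass on the first skill.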

The canonical SAS execution loop proceeds as:

[execution loop not recoverable from source]

The total cost over the steps of the loop is decomposed as:

[equation not recoverable from source]

(Li, 8 Jan 2026).

2. Compilation and Simulation of Multi-Agent Workflows in SAS

A MAS is formalized by an agent set, a communication graph, and a protocol. Each agent's specialized behavior is internalized as one or more SAS skills using a compilation mapping:

[equation not recoverable from source]

which is defined in terms of each agent's role description. The resulting skill policies explicitly enforce communication requirements as part of the skill prompt.

Correctness is established via behavioral fidelity:

[equation not recoverable from source]

Efficiency is ensured as long as the cumulative overhead of skill selection is less than MAS communication cost.
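The compilation step and the efficiency condition can be sketched as follows. This is only an illustration under stated assumptions: the `Agent` and `Skill` dataclasses, the prompt wording, and the token counts are all hypothetical, and the cited work may define the mapping differently.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Agent:
    name: str
    role: str                     # role description of this agent
    tool: Optional[str] = None    # external tool the agent uses, if any
    sends_to: tuple = ()          # names of agents it must hand results to


@dataclass(frozen=True)
class Skill:
    descriptor: str
    policy: str
    tool: Optional[str] = None


def compile_agent(agent: Agent) -> Skill:
    """Turn one agent into one skill; hand-off rules become part of the prompt."""
    policy = f"Act as {agent.name}: {agent.role}."
    if agent.sends_to:
        policy += " End with a hand-off note for: " + ", ".join(agent.sends_to) + "."
    return Skill(descriptor=agent.role, policy=policy, tool=agent.tool)


def sas_is_cheaper(selection_overhead: Sequence[int],
                   communication_cost: Sequence[int]) -> bool:
    """Efficiency condition: total selection overhead below total MAS message cost."""
    return sum(selection_overhead) < sum(communication_cost)


# Hypothetical two-agent workflow compiled into a two-skill library.
agents = [
    Agent("planner", "break the task into steps", sends_to=("coder",)),
    Agent("coder", "write code for each step", tool="python"),
]
library = [compile_agent(a) for a in agents]
print(library[0].policy)
print(sas_is_cheaper([20, 20, 20], [150, 200]))  # token counts are made up
```

Keeping the hand-off rule inside the skill prompt is one way to preserve the communication requirement without an actual message exchange.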

For homogeneous MAS (all agents use the same LLM), a single-LLM simulator can role-play all agents in a continuous conversation, leveraging key/value cache sharing for computational savings. Under deterministic tools, prompts, and routing, the single-agent simulation achieves joint distributional equivalence with multi-agent execution (Xu et al., 18 Jan 2026).

3. Empirical Evaluation: Efficiency and Accuracy

Quantitative benchmarks show that SAS matches MAS in task accuracy on average (within a few points per task) while substantially reducing computational overhead, primarily by eliminating inter-agent communication and improving KV cache reuse.

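Subscripts M and S denote MAS and SAS; accuracy is in %; Calls gives the number of LLM calls (MAS→SAS).

| Task      | Acc_M | Acc_S | Tokens_M | Tokens_S | Lat_M  | Lat_S | Calls |
|-----------|-------|-------|----------|----------|--------|-------|-------|
| GSM8K     | 94.0  | 92.0  | 1407     | 616      | 10,565 | 7,537 | 3→1   |
| HumanEval | 100.0 | 100.0 | 1400     | 749      | 7,227  | 2,970 | 3→1   |
| HotpotQA  | 84.0  | 88.0  | 4359     | 1816     | 11,671 | 4,559 | 4→1   |
| Avg. Δ    |       | +0.7% |          | –53.7%   |        | –49.5% |      |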
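The averages in the last row can be reproduced from the per-task figures. The short check below assumes the accuracy delta is the mean of per-task differences (SAS minus MAS) and that the token and latency deltas are means of per-task relative changes; the variable names are illustrative.

```python
# Per-task figures copied from the table: (MAS value, SAS value).
rows = {
    "GSM8K":     {"acc": (94.0, 92.0),   "tokens": (1407, 616),  "latency": (10565, 7537)},
    "HumanEval": {"acc": (100.0, 100.0), "tokens": (1400, 749),  "latency": (7227, 2970)},
    "HotpotQA":  {"acc": (84.0, 88.0),   "tokens": (4359, 1816), "latency": (11671, 4559)},
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def relative_change(mas, sas):
    """Percentage change of the SAS value relative to the MAS value."""
    return 100.0 * (sas - mas) / mas


avg_acc_delta = mean(sas - mas for mas, sas in (r["acc"] for r in rows.values()))
avg_token_delta = mean(relative_change(*r["tokens"]) for r in rows.values())
avg_latency_delta = mean(relative_change(*r["latency"]) for r in rows.values())

print(round(avg_acc_delta, 1), round(avg_token_delta, 1), round(avg_latency_delta, 1))
# prints: 0.7 -53.7 -49.5
```

Under this reading the accuracy figure is a difference in percentage points, while the token and latency figures are relative reductions.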

Findings are robust across mathematics, program synthesis, and multi-hop reasoning (Li, 8 Jan 2026, Gao et al., 23 May 2025, Xu et al., 18 Jan 2026). SAS incurs substantially lower token costs than MAS, with equivalent or slightly higher accuracy. Notably, under controlled “thinking token” budgets, single-agent reasoning consistently matches or outperforms multi-agent variants, particularly as LLMs improve in long-context handling and tool integration (Tran et al., 2 Apr 2026).

4. Scaling Limits and Skill Selection Capacity

Empirical studies show that as the number of skills increases, SAS selection accuracy remains near-perfect until a critical threshold, beyond which accuracy collapses sharply:

[equation not recoverable from source]

with parameter values reported for GPT-4o-mini (Li, 8 Jan 2026). This phase transition mirrors human capacity limits for menu selection and response latency (Hick's Law) and highlights semantic confusability as a dominant factor.

When skills are semantically similar, as quantified by the average cosine similarity of skill descriptors, accuracy degrades linearly with interference:

[equation not recoverable from source]

Controlled experiments show that a single competitor per skill can induce drops of 7–30 percentage points; two competitors cause even sharper accuracy loss (up to 63 pp).
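The confusability measure, average cosine similarity of skill descriptors, can be sketched in a few lines. The source does not say which embedding is used, so this example substitutes a simple bag-of-words vector (an assumption); the descriptor strings are invented for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def bag_of_words(text: str) -> Counter:
    """Assumed embedding: word counts. A real system would likely use a learned encoder."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_pairwise_similarity(descriptors: list[str]) -> float:
    """Average cosine similarity over all unordered pairs of descriptors."""
    vectors = [bag_of_words(d) for d in descriptors]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


distinct = ["solve arithmetic problems", "translate text into french", "summarize legal contracts"]
confusable = ["search the web for news", "search the web for papers", "search the web for images"]

print(mean_pairwise_similarity(distinct))               # no shared words between descriptors
print(round(mean_pairwise_similarity(confusable), 3))   # four of five words shared per pair
```

A score like this could flag libraries whose descriptors overlap heavily before any selection accuracy is measured.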

5. Information-Theoretic Foundations and Context Utilization

A rigorous perspective frames SAS vs MAS through the Data Processing Inequality:

$$I(Y; M) \le I(Y; C)$$

where $Y$ is the answer, $C$ the full context, and $M$ the message passed in MAS. If a SAS fully utilizes $C$, the MAS cannot gain information through communication. However, as context retention degrades (due to model limitations or context length), MAS may outperform by filtering or factoring the information into focused sub-messages. Empirically, SAS dominates for low degradation; MAS gains relevance only when the SAS’s context access is severely compromised (Tran et al., 2 Apr 2026).
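The Data Processing Inequality argument can be checked numerically on a toy distribution. In this sketch, everything is an assumption chosen for illustration: the context takes four equally likely values, the answer is fully determined by the context, and a message is any deterministic summary of the context. A message can preserve the information about the answer or lose it, but it never exceeds what the full context carries.

```python
import math
from collections import defaultdict


def mutual_information(joint: dict) -> float:
    """I(A; B) in bits for a joint distribution given as {(a, b): probability}."""
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        pa[a] += p
        pb[b] += p
    return sum(p * math.log2(p / (pa[a] * pb[b])) for (a, b), p in joint.items() if p > 0)


def summarize_context(joint_answer_context: dict, summarize) -> dict:
    """Joint of (answer, message) when the message is a function of the context."""
    out = defaultdict(float)
    for (answer, context), p in joint_answer_context.items():
        out[(answer, summarize(context))] += p
    return dict(out)


# Toy joint: context c in {0, 1, 2, 3}, uniformly likely; answer = c // 2.
answer_context = {(c // 2, c): 0.25 for c in range(4)}

info_context = mutual_information(answer_context)
info_lossy = mutual_information(summarize_context(answer_context, lambda c: c % 2))
info_focused = mutual_information(summarize_context(answer_context, lambda c: c // 2))

print(info_context, info_lossy, info_focused)  # prints: 1.0 0.0 1.0
```

The focused message matches the full context but cannot beat it, which is the sense in which a MAS gains nothing from communication when the single agent uses its whole context.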

6. Mitigating Capacity and Confusability: Hierarchical and Hybrid Approaches

Hierarchical skill routing, where selection proceeds from coarse categories to fine-grained skills, restores high-accuracy selection even for large libraries. Empirical results show up to 85% selection accuracy with hierarchy, compared to 45–70% for flat selection (Li, 8 Jan 2026).
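Coarse-to-fine routing can be illustrated with a two-stage selector: first pick a category, then pick a skill inside it. The word-overlap score, the category names, and the skill descriptors below are all hypothetical; a real SAS would have the LLM make both choices.

```python
def overlap(query: str, text: str) -> int:
    """Assumed scoring rule: number of words the query shares with a description."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def pick(query: str, options: dict[str, str]) -> str:
    """Return the option name whose description best matches the query."""
    return max(options, key=lambda name: overlap(query, options[name]))


# Hypothetical two-level library: category descriptions, then skills per category.
CATEGORIES = {
    "math": "arithmetic algebra equation solve",
    "code": "python function bug program write",
}
SKILLS = {
    "math": {"arithmetic": "add subtract multiply numbers",
             "algebra": "solve equation for unknown"},
    "code": {"write_code": "write a python function",
             "debug": "find and fix a bug"},
}


def route(query: str) -> tuple[str, str]:
    category = pick(query, CATEGORIES)       # stage 1: coarse choice
    skill = pick(query, SKILLS[category])    # stage 2: fine choice within the cluster
    return category, skill


print(route("fix the bug in this python program"))  # ('code', 'debug')
print(route("solve the equation for x"))            # ('math', 'algebra')
```

Each stage compares only a handful of options, so no single decision has to discriminate among the whole library at once.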

A hybrid request-cascading paradigm invokes SAS first, automatically verifies its output, and escalates to MAS only on failure, reducing expected cost by up to 20% while achieving MAS-level accuracy (Gao et al., 23 May 2025).
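The cascade logic and its expected cost can be sketched as below. The callables `run_sas`, `verify`, and `run_mas` are hypothetical placeholders, and the cost figures are made-up numbers for illustration; they are not intended to reproduce the reported 20% saving.

```python
from typing import Callable, Tuple


def cascade(task: str,
            run_sas: Callable[[str], str],
            verify: Callable[[str, str], bool],
            run_mas: Callable[[str], str]) -> Tuple[str, str]:
    """Try the single agent first; escalate to the multi-agent system only on failure."""
    answer = run_sas(task)
    if verify(task, answer):
        return answer, "sas"
    return run_mas(task), "mas"


def expected_cost(c_sas: float, c_verify: float, c_mas: float, p_fail: float) -> float:
    """Every request pays for SAS and verification; a fraction p_fail also pays for MAS."""
    return c_sas + c_verify + p_fail * c_mas


# Toy stand-ins for the three components (assumptions for illustration only).
def toy_sas(task: str) -> str:
    return "4" if task == "2+2" else "?"


def toy_verify(task: str, answer: str) -> bool:
    return answer != "?"


def toy_mas(task: str) -> str:
    return "escalated answer"


print(cascade("2+2", toy_sas, toy_verify, toy_mas))        # ('4', 'sas')
print(cascade("hard task", toy_sas, toy_verify, toy_mas))  # ('escalated answer', 'mas')

cost = expected_cost(c_sas=600, c_verify=100, c_mas=1400, p_fail=0.25)
print(cost, cost < 1400)  # 1050.0 True
```

With these illustrative numbers the cascade is cheaper than always running MAS; the advantage shrinks as the failure rate or the verification cost grows.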

7. Design Best Practices and Deployment Considerations

Key guidelines for scalable and robust SAS include:

  1. Monitor the number of skills relative to the model's selection capacity, and stay below the capacity threshold if possible.
  2. Audit and merge semantically overlapping skills; invest in distinctive descriptors.
  3. Use hierarchical organization for large libraries, and keep each selection cluster small.
  4. Upgrade to stronger LLMs when using large, confusable libraries is unavoidable.
  5. Employ hybrid cascades to retain efficiency while safeguarding accuracy.
  6. Reconsider MAS only for extremely complex workflows, tool orchestration, or severe context degradation.

In practical terms, SAS is now the default baseline for most modular reasoning applications, offering substantial reductions in latency and compute while retaining fine-grained transparency for error localization via chain-of-thought step-level self-reporting (Gao et al., 23 May 2025).

Limitations and Future Research Directions

SAS cannot simulate heterogeneous multi-agent workflows (mixing different base LLMs), as KV caches do not transfer across models. While SAS saturates empirical performance on most benchmarks when capacity is not exceeded, the design space for true heterogeneous and cross-model orchestration remains open. End-to-end training of single agents on multi-role dialogues and efficient hybridization of MAS/SAS offer promising avenues for further research (Xu et al., 18 Jan 2026).
