Single-Agent Systems (SAS) in LLM Architectures
- Single-Agent Systems (SAS) are defined as architectures where a single LLM handles reasoning, planning, and action selection via a curated skill library.
- They employ a softmax-based skill selection mechanism that integrates internal chain-of-thought with external tool execution to optimize task performance.
- Empirical evaluations show that SAS reduces token usage and latency while matching or surpassing MAS in accuracy and scalability.
A single-agent system (SAS) refers to an agentic architecture in which a single LLM is responsible for all aspects of reasoning, planning, and action selection, either via monolithic chain-of-thought or through internal selection among a curated library of specialized skills. This contrasts with multi-agent systems (MAS), where specialized agents interact and coordinate via explicit message passing. Recent literature rigorously formalizes, analyzes, and benchmarks SAS in the context of modularity, efficiency, scalability, fault localization, and its comparative performance under controlled computation budgets (Li, 8 Jan 2026, Tran et al., 2 Apr 2026, Gao et al., 23 May 2025, Xu et al., 18 Jan 2026).
1. Formal Models of Single-Agent Systems with Skill Libraries
The canonical SAS comprises a single LLM endowed with a fixed skill library $\mathcal{S} = \{s_1, \ldots, s_K\}$ and a selection mechanism $\pi$. Each skill is a tuple $s_i = (d_i, p_i, t_i)$, where $d_i$ is a natural-language descriptor, $p_i$ denotes the policy (internal prompt or instructions), and $t_i$ is either an external tool or $\varnothing$ for purely internal reasoning.
Skill selection is often modeled as a softmax policy:

$$\pi(s_i \mid c) = \frac{\exp\big(\phi(c, d_i)\big)}{\sum_{j=1}^{K} \exp\big(\phi(c, d_j)\big)},$$

where $c$ is the context and $\phi$ a compatibility function scoring the match between context and descriptor.
The canonical SAS execution loop proceeds as follows: observe the current context, select a skill via the softmax policy, execute it (internal reasoning or a tool call), and fold the result back into the context. The total cost over $T$ steps decomposes as:

$$C_{\text{total}} = \sum_{t=1}^{T} \big( C_{\text{select}}^{(t)} + C_{\text{exec}}^{(t)} \big).$$
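The loop and its cost decomposition can be sketched as follows; the `select`/`execute` callables and the cost bookkeeping are illustrative assumptions, not an implementation from the cited papers:

```python
def run_sas(task, skills, select, execute, max_steps=8):
    """Canonical SAS loop: observe context, select a skill, execute it,
    fold the result back into the context; track per-step costs."""
    context, total_cost = task, {"select": 0, "exec": 0}
    for _ in range(max_steps):
        skill, c_sel = select(context, skills)        # selection step + its cost
        result, c_exec, done = execute(skill, context)  # internal reasoning or tool call
        total_cost["select"] += c_sel
        total_cost["exec"] += c_exec
        context = context + "\n" + result             # fold result into context
        if done:
            break
    return context, total_cost
```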
2. Compilation and Simulation of Multi-Agent Workflows in SAS
A MAS is formalized as $M = (A, G, P)$ with agent set $A = \{a_1, \ldots, a_n\}$, communication graph $G$, and protocol $P$. Each agent's specialized behavior is internalized as one or more SAS skills using a compilation mapping $\Phi$, where

$$\Phi(a_i) = (r_i, p_{a_i}, t_{a_i}),$$

and $r_i$ is the $i$-th agent's role description. The resulting skill policies explicitly enforce communication requirements as part of the skill prompt.
Correctness is established via behavioral fidelity: for every input $x$, the compiled single agent reproduces the MAS output distribution,

$$P_{\Phi(M)}(y \mid x) = P_{M}(y \mid x).$$
Efficiency is ensured as long as the cumulative overhead of skill selection is less than MAS communication cost.
For homogeneous MAS (all agents use the same LLM), a single-LLM simulator can role-play all agents in a continuous conversation, leveraging key/value cache sharing for computational savings. Under deterministic tools, prompts, and routing, the single-agent simulation achieves joint distributional equivalence with multi-agent execution (Xu et al., 18 Jan 2026).
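A minimal sketch of the single-LLM role-play simulator, assuming a generic `llm(prompt) -> str` callable; the single shared, append-only transcript is what enables KV-cache reuse across roles:

```python
def simulate_mas_single_llm(llm, roles, task, rounds=1):
    """Role-play a homogeneous MAS inside one model: a single growing
    transcript (so the KV cache is shared) with per-role turn prompts."""
    transcript = f"Task: {task}\n"
    for _ in range(rounds):
        for role in roles:
            # One model plays every agent; only the role header changes.
            reply = llm(f"{transcript}\n[{role}]:")
            transcript += f"\n[{role}]: {reply}"
    return transcript
```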
3. Empirical Evaluation: Efficiency and Accuracy
Quantitative benchmarks consistently reveal that SAS matches or exceeds MAS in task accuracy while achieving substantial reductions in computational overhead—primarily due to the elimination of inter-agent communication and enhanced KV cache reuse.
| Task | MAS Acc. (%) | SAS Acc. (%) | MAS Tokens | SAS Tokens | MAS Latency | SAS Latency | LLM Calls (MAS→SAS) |
|---|---|---|---|---|---|---|---|
| GSM8K | 94.0 | 92.0 | 1407 | 616 | 10,565 | 7,537 | 3→1 |
| HumanEval | 100.0 | 100.0 | 1400 | 749 | 7,227 | 2,970 | 3→1 |
| HotpotQA | 84.0 | 88.0 | 4359 | 1816 | 11,671 | 4,559 | 4→1 |
| Avg. Δ | — | +0.7 pp | — | –53.7% | — | –49.5% | — |
Findings are robust across mathematics, program synthesis, and multi-hop reasoning (Li, 8 Jan 2026, Gao et al., 23 May 2025, Xu et al., 18 Jan 2026). SAS incurs substantially lower token costs than MAS—more than halving token usage in the benchmarks above—with equivalent or slightly higher accuracy. Notably, under controlled “thinking token” budgets, single-agent reasoning consistently matches or outperforms multi-agent variants, particularly as LLMs improve in long-context handling and tool integration (Tran et al., 2 Apr 2026).
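The averaged deltas in the table can be reproduced directly from the per-task rows:

```python
rows = {  # (MAS, SAS) pairs taken from the benchmark table above
    "acc":     [(94.0, 92.0), (100.0, 100.0), (84.0, 88.0)],
    "tokens":  [(1407, 616), (1400, 749), (4359, 1816)],
    "latency": [(10565, 7537), (7227, 2970), (11671, 4559)],
}

def mean(xs):
    return sum(xs) / len(xs)

# Accuracy delta in percentage points; tokens/latency as mean percent change.
delta_acc = mean([s - m for m, s in rows["acc"]])
delta_tokens = 100 * mean([s / m - 1 for m, s in rows["tokens"]])
delta_latency = 100 * mean([s / m - 1 for m, s in rows["latency"]])
# delta_acc ≈ +0.7 pp, delta_tokens ≈ -53.7%, delta_latency ≈ -49.5%
```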
4. Scaling Limits and Skill Selection Capacity
Empirical studies show that as the number of skills $K$ increases, SAS selection accuracy $A(K)$ remains near-perfect until a critical threshold $K^*$ (dependent on the underlying LLM), beyond which accuracy collapses sharply. The collapse is well described by a sigmoidal fit:

$$A(K) \approx \frac{A_{\max}}{1 + \exp\big(\beta (K - K^*)\big)},$$

with constants $A_{\max}$, $\beta$, and $K^*$ fitted for GPT-4o-mini (Li, 8 Jan 2026). This phase transition mirrors human capacity limits for menu selection and response latency (Hick's Law) and highlights semantic confusability as a dominant factor.
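The capacity model can be sketched as a one-line function; the default parameter values below are illustrative placeholders, not the fitted constants reported in the paper:

```python
import math

def selection_accuracy(K, A_max=0.98, K_star=8.0, beta=1.5):
    """Sigmoidal capacity model: near-perfect selection well below the
    threshold K*, sharp collapse beyond it. Parameters are illustrative."""
    return A_max / (1 + math.exp(beta * (K - K_star)))
```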
When skills are semantically similar—quantified via the average cosine similarity $\bar{s}$ of skill descriptors—accuracy degrades approximately linearly with interference:

$$A \approx A_0 - \lambda \bar{s},$$

for a model-dependent slope $\lambda$. Controlled experiments show that a single semantically close competitor per skill can induce drops of 7–30 percentage points; two competitors cause even sharper accuracy loss (up to 63 pp at high $\bar{s}$).
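The confusability statistic $\bar{s}$ can be computed as a mean pairwise cosine similarity; this sketch uses bag-of-words vectors as a stand-in for the descriptor embeddings the analysis assumes:

```python
import math
from collections import Counter

def cosine(a: str, b: str) -> float:
    """Cosine similarity of bag-of-words vectors (a toy stand-in for
    embedding-based descriptor similarity)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def mean_pairwise_similarity(descriptors: list[str]) -> float:
    """Average cosine similarity over all descriptor pairs: the
    confusability statistic the degradation law is expressed in."""
    pairs = [(i, j) for i in range(len(descriptors))
             for j in range(i + 1, len(descriptors))]
    return sum(cosine(descriptors[i], descriptors[j]) for i, j in pairs) / len(pairs)
```

Auditing a library with this statistic is a cheap way to flag skills that should be merged or given more distinctive descriptors.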
5. Information-Theoretic Foundations and Context Utilization
A rigorous perspective frames SAS vs MAS through the Data Processing Inequality:

$$I\big(Y; m(X)\big) \le I(Y; X),$$

where $Y$ is the answer, $X$ the full context, and $m(X)$ the message passed in MAS. If a SAS fully utilizes $X$, the MAS cannot gain information through communication. However, as context retention degrades (due to model limitations or context length), MAS may outperform by filtering or factoring the information into focused sub-messages. Empirically, SAS dominates for low degradation; MAS gains relevance only when the SAS’s context access is severely compromised (Tran et al., 2 Apr 2026).
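The inequality can be checked on a toy discrete example, where a lossy message $m(X)$ provably carries no more information about the answer than the full context (all distributions here are contrived for illustration):

```python
import math
from collections import defaultdict

def mutual_information(pairs):
    """I(U; V) in bits from an empirical list of (u, v) samples."""
    n = len(pairs)
    pu, pv, puv = defaultdict(float), defaultdict(float), defaultdict(float)
    for u, v in pairs:
        pu[u] += 1 / n
        pv[v] += 1 / n
        puv[(u, v)] += 1 / n
    return sum(p * math.log2(p / (pu[u] * pv[v])) for (u, v), p in puv.items())

# X determines Y; m(X) coarsens X, so I(Y; m(X)) <= I(Y; X).
xs = [0, 1, 2, 3] * 25
ys = [x % 2 for x in xs]    # answer derived from the full context
ms = [x // 2 for x in xs]   # lossy message: drops exactly the parity bit
i_full = mutual_information(list(zip(ys, xs)))
i_msg = mutual_information(list(zip(ys, ms)))
assert i_msg <= i_full + 1e-9  # Data Processing Inequality holds
```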
6. Mitigating Capacity and Confusability: Hierarchical and Hybrid Approaches
Hierarchical skill routing, where selection is divided into coarse-to-fine categories, restores high-accuracy selection even as $K$ grows large: the policy factorizes as $\pi(s \mid c) = \pi(g \mid c)\,\pi(s \mid c, g)$ over coarse categories $g$. Empirically, hierarchical routing attains up to 85% selection accuracy at large $K$, compared to 45–70% for flat selection (Li, 8 Jan 2026).
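A minimal sketch of two-stage coarse-to-fine routing, assuming a generic `score(context, descriptor)` function (the taxonomy structure and greedy argmax are illustrative choices):

```python
def route_hierarchical(context, taxonomy, score):
    """Two-stage routing: pick a category first, then a skill within it,
    so each individual decision ranges over a small candidate set."""
    category = max(taxonomy, key=lambda g: score(context, g))
    return category, max(taxonomy[category], key=lambda s: score(context, s))
```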
A hybrid request-cascading paradigm invokes SAS first, automatically verifies its output, and escalates to MAS only on failure, reducing expected cost by up to 20% while achieving MAS-level accuracy (Gao et al., 23 May 2025).
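The cascade reduces to a few lines of control flow, assuming black-box `run_sas`, `verify`, and `run_mas` callables (names are illustrative):

```python
def cascade(task, run_sas, verify, run_mas):
    """Request cascading: try the cheap single agent first, verify its
    answer automatically, and escalate to the MAS only on failure."""
    answer = run_sas(task)
    if verify(task, answer):
        return answer, "sas"
    return run_mas(task), "mas"
```

Expected cost is then the SAS cost plus the MAS cost weighted by the verification failure rate, which is why the savings depend on how often escalation is triggered.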
7. Design Best Practices and Deployment Considerations
Key guidelines for scalable and robust SAS include:
- Monitor library size $K$ relative to model capacity $K^*$; stay well below the collapse threshold where possible.
- Audit and merge semantically overlapping skills; invest in distinctive descriptors.
- Use hierarchical organization for large skill libraries; keep each selection cluster small.
- Upgrade to stronger LLMs when large, confusable libraries are unavoidable.
- Employ hybrid cascades to retain efficiency while safeguarding accuracy.
- Reconsider MAS only for extremely complex workflows, tool orchestration, or severe context degradation.
In practical terms, SAS is now the default baseline for most modular reasoning applications, offering substantial reductions in latency and compute while retaining fine-grained transparency for error localization via chain-of-thought step-level self-reporting (Gao et al., 23 May 2025).
Limitations and Future Research Directions
SAS cannot simulate heterogeneous multi-agent workflows (mixing different base LLMs), as KV caches do not transfer across models. While SAS saturates empirical performance on most benchmarks when capacity is not exceeded, the design space for true heterogeneous and cross-model orchestration remains open. End-to-end training of single agents on multi-role dialogues and efficient hybridization of MAS/SAS offer promising avenues for further research (Xu et al., 18 Jan 2026).