Single-Agent Architectures Overview

Updated 14 April 2026

Single-agent architectures are systems built around a single autonomous entity that integrates perception, memory, planning, and action selection to achieve complex goals.
They employ core modules such as LLM-based policies, memory systems, planners, and tool routers to efficiently execute tasks across domains like software engineering, robotics, and molecular optimization.
Empirical and theoretical studies show that, under fixed reasoning-token budgets, single-agent systems can match or exceed the performance of multi-agent systems while reducing communication overhead.

A single-agent architecture is a system in which a single autonomous computational entity, typically built around an LLM or another general-purpose reasoning core, performs all perception, planning, action selection, and tool use required to achieve complex goals in a given environment. Such architectures stand in contrast to multi-agent systems (MAS), which decompose the problem space into networks of interacting agents, each with its own role, state, or policy. Single-agent designs have emerged as the dominant paradigm for LLM-based autonomous workflows in domains ranging from software engineering and automated coding to robotics, automated scientific discovery, and molecular optimization. This article synthesizes formal definitions, architectural principles, empirical findings, and domain-specific instantiations of single-agent architectures from the current research literature.

1. Formal Models and Canonical Control Loop

A rigorous foundation for single-agent architectures is provided by formalizations in terms of Markov Decision Processes (MDPs) or partially observable MDPs (POMDPs) (Masterman et al., 2024, Xu, 5 Jan 2026, V et al., 18 Jan 2026). The agent is specified by the tuple:

$A = (S, O, M, T, \pi)$

where:

$S$ is the environment state space,
$O$ is the (possibly partial) observation space,
$M$ is the internal memory (episodic, semantic, procedural),
$T$ is the action and tool set,
$\pi$ is the policy, typically parameterized by LLM weights $\theta$ or by prompt-induced distributions.

The single-agent cycle at time $t$ is:

Observation: $O_t = \Phi(S_t)$ (perception)
Memory update: $M_t = \mu(M_{t-1}, O_t, Z_{t-1}, E_{t-1})$ (memory integration)
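Reasoning/Planning: $Z_t \sim \pi(\cdot \mid O_t, M_t)$ (latent thought/plan)
Action selection: $a_t \sim \pi(\cdot \mid O_t, M_t, Z_t)$, with $a_t \in T$
Environment transition: $(S_{t+1}, E_t) \sim P(\cdot \mid S_t, a_t)$, where $P$ denotes the environment dynamics and $E_t$ the resulting feedback (state update and feedback)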

Variants use different memory integration and planning schemes, including explicit chain-of-thought, tree-of-thoughts, or native inference-time search (Xu, 5 Jan 2026, V et al., 18 Jan 2026).
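The cycle above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not an implementation from the cited works: the names `Memory`, `run_agent`, and the toy counter environment are hypothetical, and the `plan` and `act` callables stand in for LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Memory:
    """Episodic memory: one (observation, previous thought, previous feedback) entry per step."""
    trace: List[Tuple[str, str, str]] = field(default_factory=list)


def run_agent(
    state: int,
    perceive: Callable[[int], str],                      # perception: state -> observation
    plan: Callable[[str, Memory], str],                  # reasoning: produces a latent thought
    act: Callable[[str, str, Memory], int],              # action selection
    transition: Callable[[int, int], Tuple[int, str]],   # (state, action) -> (state, feedback)
    max_steps: int = 10,
) -> Tuple[int, Memory]:
    memory = Memory()
    thought, feedback = "", ""
    for _ in range(max_steps):
        obs = perceive(state)
        # Memory update uses the new observation plus the previous thought and feedback.
        memory.trace.append((obs, thought, feedback))
        thought = plan(obs, memory)
        action = act(obs, thought, memory)
        state, feedback = transition(state, action)
        if feedback == "done":
            break
    return state, memory


# Toy environment (hypothetical): a counter that must reach 3.
final_state, memory = run_agent(
    state=0,
    perceive=lambda s: f"counter={s}",
    plan=lambda obs, mem: "increment the counter",
    act=lambda obs, thought, mem: 1,
    transition=lambda s, a: (s + a, "done" if s + a >= 3 else "continue"),
)
print(final_state, len(memory.trace))  # 3 3
```

Keeping perception, planning, action selection, and the environment as separate callables mirrors the modular decomposition of the formal model; chain-of-thought or search-based variants would change only what `plan` does.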

2. Key Architectural Components and Modular Primitives

Single-agent architectures consistently encapsulate the following modules (Xu, 5 Jan 2026, V et al., 18 Jan 2026, Masterman et al., 2024):

Policy/LLM Core ($\pi$): An LLM configured via prompt engineering and/or finetuning, serving as the locus of reasoning, plan generation, and tool call proposals.
Memory Modules ($M$): Subsystems for storing and retrieving episodic (trace of prior steps), semantic (retrieved facts), or procedural knowledge (skill templates).
Planners: Both reactive (map current context to action) and hierarchical (decompose goals into subgoals/tasks).
Tool Router ($T$): Typed tool schemas that let the agent select and invoke APIs, code execution, or environment manipulation.
Critics/Verifiers: Modules that validate outputs before external side effects (schema checks, policy gates, safety verifiers).
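To make the tool router and verifier roles concrete, the following sketch registers typed tool schemas and runs a schema check before any tool is executed. It is an illustration only: `ToolSpec`, `ToolRouter`, and the `add` tool are hypothetical names, and real systems typically use richer schemas (for example JSON Schema) than the plain Python types assumed here.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class ToolSpec:
    name: str
    params: Dict[str, type]      # parameter name -> expected Python type
    fn: Callable[..., Any]


class ToolRouter:
    def __init__(self) -> None:
        self._tools: Dict[str, ToolSpec] = {}

    def register(self, spec: ToolSpec) -> None:
        self._tools[spec.name] = spec

    def verify(self, name: str, args: Dict[str, Any]) -> None:
        """Verifier step: schema check that raises before any side effect happens."""
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        spec = self._tools[name]
        if set(args) != set(spec.params):
            raise ValueError(f"expected params {sorted(spec.params)}, got {sorted(args)}")
        for key, expected in spec.params.items():
            if not isinstance(args[key], expected):
                raise TypeError(f"{key} must be {expected.__name__}")

    def call(self, name: str, args: Dict[str, Any]) -> Any:
        self.verify(name, args)              # gate first, execute second
        return self._tools[name].fn(**args)


router = ToolRouter()
router.register(ToolSpec("add", {"a": int, "b": int}, lambda a, b: a + b))
result = router.call("add", {"a": 2, "b": 3})   # passes the schema check
print(result)  # 5
```

A malformed proposal such as `router.call("add", {"a": 2, "b": "3"})` raises `TypeError` before the tool function runs, which is the property the verifier module is meant to provide.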

The control logic can be composed from a small set of loop primitives (Rombaut, 3 Apr 2026):

ReAct (reason-act cycles),
generate-test-repair (test-driven reflexion),
plan-execute (phased decomposition),
multi-attempt retry (with selection/critic),
Monte Carlo Tree Search (search and simulation over alternatives).

Empirical studies show that high-performing agents layer multiple primitives rather than relying on a single control structure (Rombaut, 3 Apr 2026).
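As an example of layering primitives, the sketch below combines generate-test-repair with a bounded multi-attempt retry. It assumes a hypothetical `generate` callable that stands in for an LLM call and accepts the previous test feedback; `stub_generate` and `stub_test` are toy stand-ins, not part of any surveyed scaffold.

```python
from typing import Callable, Optional, Tuple, TypeVar

C = TypeVar("C")  # candidate type (e.g. a patch, a program, a plan)


def generate_test_repair(
    generate: Callable[[Optional[str]], C],     # feedback -> new candidate
    test: Callable[[C], Tuple[bool, str]],      # candidate -> (passed, feedback)
    max_attempts: int = 3,
) -> Tuple[Optional[C], int]:
    """Return (passing candidate or None, number of attempts used)."""
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        candidate = generate(feedback)
        passed, feedback = test(candidate)
        if passed:
            return candidate, attempt
    return None, max_attempts


def stub_generate(feedback: Optional[str]) -> Callable[[int], int]:
    # Hypothetical stand-in for an LLM: the first draft is buggy, the repair is correct.
    if feedback is None:
        return lambda x: x * x
    return lambda x: x + x


def stub_test(candidate: Callable[[int], int]) -> Tuple[bool, str]:
    got = candidate(3)
    return got == 6, f"double(3) returned {got}, expected 6"


solution, attempts = generate_test_repair(stub_generate, stub_test)
print(attempts)  # 2
```

The test feedback is the only channel between attempts, so swapping in a different critic or selection rule changes `test` without touching the loop.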

3. Theoretical and Empirical Foundations of Single-Agent Efficiency

Recent work establishes that, under a fixed reasoning-token budget, a single-agent LLM is information-theoretically no worse than any MAS architecture (Tran et al., 2 Apr 2026). For tasks expressible as mappings $X \to Y$, where $X$ is the full context and $Y$ is the output, the Data Processing Inequality guarantees:

$I(Y; \tilde{X}) \le I(Y; X)$

where $\tilde{X}$ is a compressed, inter-agent message derived from $X$. Any MAS architecture operating on summaries of $X$ cannot attain lower conditional entropy or lower error probability than a single agent with access to the entire context, assuming perfect utilization.
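A small numeric check illustrates the inequality. The setup is a toy assumption, not taken from the cited work: the context takes four equally likely values, the output is the parity of the context, and the inter-agent message keeps only the high bit, so the message discards exactly the information the task needs.

```python
from collections import Counter
from math import log2
from typing import Hashable, List, Tuple


def mutual_information(pairs: List[Tuple[Hashable, Hashable]]) -> float:
    """I(A;B) in bits, treating each (a, b) pair in `pairs` as equally likely."""
    n = len(pairs)
    joint = Counter(pairs)
    pa = Counter(a for a, _ in pairs)
    pb = Counter(b for _, b in pairs)
    return sum(
        (c / n) * log2((c / n) / ((pa[a] / n) * (pb[b] / n)))
        for (a, b), c in joint.items()
    )


contexts = [0, 1, 2, 3]            # full context, uniform over four values


def target(x: int) -> int:         # task output: parity of the context
    return x % 2


def compress(x: int) -> int:       # lossy inter-agent message: high bit only
    return x // 2


i_full = mutual_information([(target(x), x) for x in contexts])
i_msg = mutual_information([(target(x), compress(x)) for x in contexts])
print(i_full, i_msg)  # 1.0 0.0
```

Here the agent that sees the full context has one full bit of information about the output, while an agent that sees only the message has none. A better-chosen message could close the gap, but no message can exceed the full-context value.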

Empirical results confirm that for multi-hop reasoning and many complex tasks, single-agent systems match or outperform MAS when reasoning tokens are held constant. Single-agent performance plateaus, or is exceeded by MAS, only when the single agent's context utilization is severely degraded by sequence length, noise, or context-window artifacts (Tran et al., 2 Apr 2026). In practice, single-agent execution also offers significant computational and cost efficiency because it eliminates inter-agent communication and can reuse the KV cache (Xu et al., 18 Jan 2026).

4. Instantiations Across Applied Domains

Software Engineering and Automated Coding

Architectures such as ESAA formalize intent/effect separation via event sourcing, contract validation, and append-only audit logs (Filho, 26 Feb 2026). Single-agent coding assistants, as surveyed via scaffold taxonomy, vary in control-loop design, tool interface, state management, context-compaction, and multi-model routing, but converge on LLM-driven reasoning, staged tool calls, replayability, and context-persistent memory (Rombaut, 3 Apr 2026).
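The intent/effect separation described above can be illustrated with a small event-sourced log. This is a generic sketch of the pattern, not ESAA's actual interface: the `Event` type, the `src/` path contract, and the function names are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    kind: str       # "intent", "effect", or "rejected"
    path: str
    content: str


ALLOWED_PREFIX = "src/"   # hypothetical contract: the agent may only write under src/


def validate(intent: Event) -> bool:
    """Contract check applied to every proposed write."""
    return intent.path.startswith(ALLOWED_PREFIX) and ".." not in intent.path


def apply_intent(log: List[Event], path: str, content: str) -> None:
    """Record the agent's intent, then record either the effect or a rejection."""
    intent = Event("intent", path, content)
    log.append(intent)                                   # append-only: nothing is rewritten
    outcome = "effect" if validate(intent) else "rejected"
    log.append(Event(outcome, path, content))


def replay(log: List[Event]) -> Dict[str, str]:
    """Rebuild the workspace state from validated effects only."""
    files: Dict[str, str] = {}
    for event in log:
        if event.kind == "effect":
            files[event.path] = event.content
    return files


log: List[Event] = []
apply_intent(log, "src/app.py", "print('hi')")
apply_intent(log, "/etc/passwd", "x")                    # violates the contract
state = replay(log)
print(state)  # {'src/app.py': "print('hi')"}
```

Because the rejected intent stays in the log, the audit trail shows what the agent tried to do as well as what actually happened, and the state can be reconstructed at any time by replaying the log.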

Molecular Optimization and Scientific Workflows

In molecular optimization, single-agent pipelines internalize all tool invocations (data retrieval, structure generation, scoring, ranking) within the LLM, with provenance tracked at each step (Ünlü et al., 5 Aug 2025). In scientific automation, single-agent frameworks use typed execution graphs, object-graph mapping, and persistent knowledge graphs to manage context and orchestration—eliminating fragile prompt-based state passing and supporting parallel, auditable computation (Bai et al., 19 Feb 2026).

Robotics

Single-agent principles are formalized in AEROS and FSAR (Qin et al., 8 Apr 2026, Qin et al., 13 Apr 2026), where one persistent agent per robot manages all skills (capabilities) via installable modules, and fleet-scale coordination is handled by explicit federation layers rather than intra-robot multi-agent decomposition. Architectural features include policy-separated runtimes, capability governance, hot-swapping of modules, closed-loop recovery, and strict authority/policy checks.

Machine Learning Engineering

Operand Quant demonstrates that a single-agent, context-persistent, non-blocking design embedded within an IDE can outperform multi-agent AutoML frameworks on comprehensive benchmarks by unifying exploration, modeling, experimentation, and deployment in a linear control loop with asynchronous execution and state persistence (Sahney et al., 13 Oct 2025).

5. Skill Library Scalability and Limitations

Single-agent architectures often employ skill-based dispatch, with the LLM selecting specialized behaviors ("skills") for subproblems (Li, 8 Jan 2026). This approach offers the modularity of a MAS without inter-agent handoff costs, but is subject to cognitive-load-like limits: selection accuracy remains stable up to a "critical library size" (e.g., up to about 100 skills for current LLMs), then collapses via a phase transition. Semantic confusability among skills degrades performance further, beyond the effect of raw count. Hierarchical skill routing and thoughtful descriptor design can partially mitigate these effects, but for highly modular or specialized workflows, MAS architectures may still be preferred.
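Hierarchical routing can be sketched as a two-stage selection: pick a category first, then pick a skill within it, so each decision is made over a small candidate set. In the example below a word-overlap score is a hypothetical stand-in for the LLM's selection step, and the skill library and descriptors are invented for illustration.

```python
from typing import Dict, Tuple

# Hypothetical two-level skill library: category descriptor plus per-skill descriptors.
SKILLS: Dict[str, Dict] = {
    "git": {
        "desc": "git repository version control commit branch",
        "skills": {
            "git_commit": "commit staged changes with a message",
            "git_branch": "create or switch a branch",
        },
    },
    "testing": {
        "desc": "tests unit test run failing pytest",
        "skills": {
            "run_tests": "run the unit test suite",
            "fix_test": "repair a failing test",
        },
    },
}


def overlap(query: str, descriptor: str) -> int:
    """Number of distinct lowercase words shared by the query and the descriptor."""
    return len(set(query.lower().split()) & set(descriptor.lower().split()))


def route(query: str) -> Tuple[str, str]:
    """Stage 1 picks a category; stage 2 picks a skill inside that category only."""
    category = max(SKILLS, key=lambda c: overlap(query, SKILLS[c]["desc"]))
    skills = SKILLS[category]["skills"]
    skill = max(skills, key=lambda s: overlap(query, skills[s]))
    return category, skill


print(route("create a new git branch"))      # ('git', 'git_branch')
print(route("run the failing unit test"))    # ('testing', 'run_tests')
```

Descriptor wording matters in this scheme: two skills whose descriptors share most of their words become hard to separate, which is the confusability effect noted above.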

6. Evaluation Protocols and Benchmarking

Robust evaluation of single-agent systems spans:

End-to-end task performance (success rate, reward, time-to-completion),
Cost and efficiency (token usage, inference cost, latency),
Tool-use correctness (tool selection accuracy, schema validation),
Planning/trajectory quality,
Robustness and reliability under perturbations or adversarial input,
Safety/compliance (violations, interventions),
Human utility/preference (A/B studies) (Xu, 5 Jan 2026, Masterman et al., 2024, Tran et al., 2 Apr 2026, Filho, 26 Feb 2026).
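Several of these metrics can be aggregated from per-episode records. The sketch below assumes a hypothetical `Episode` record with the fields shown; the field names and the sample values are illustrative and do not come from any cited benchmark.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Episode:
    success: bool
    tokens: int
    latency_s: float
    tool_calls: int
    correct_tool_calls: int
    safety_violations: int


def summarize(episodes: List[Episode]) -> Dict[str, float]:
    """Aggregate task, cost, tool-use, and safety metrics over a non-empty episode list."""
    total_calls = sum(e.tool_calls for e in episodes)
    correct_calls = sum(e.correct_tool_calls for e in episodes)
    return {
        "success_rate": mean(1.0 if e.success else 0.0 for e in episodes),
        "mean_tokens": mean(e.tokens for e in episodes),
        "mean_latency_s": mean(e.latency_s for e in episodes),
        # Micro-averaged over all tool calls; 0.0 if no tool was called at all.
        "tool_accuracy": correct_calls / total_calls if total_calls else 0.0,
        "violations_per_episode": mean(e.safety_violations for e in episodes),
    }


episodes = [
    Episode(success=True, tokens=1200, latency_s=4.0, tool_calls=4,
            correct_tool_calls=4, safety_violations=0),
    Episode(success=False, tokens=800, latency_s=6.0, tool_calls=6,
            correct_tool_calls=5, safety_violations=1),
]
report = summarize(episodes)
print(report["success_rate"], report["tool_accuracy"])  # 0.5 0.9
```

Reporting cost and safety alongside success rate keeps a cheaper or safer agent from being ranked below one that merely succeeds slightly more often.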

Replayability, full provenance logs, and cryptographically verifiable transcripts are increasingly seen as essential for auditability and reproducibility (Filho, 26 Feb 2026, Bai et al., 19 Feb 2026, Sahney et al., 13 Oct 2025).
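One common way to make a transcript tamper-evident is a hash chain, in which each entry commits to the hash of the previous one. The sketch below uses SHA-256 from the Python standard library; the entry layout and function names are assumptions, and the cited systems may use different mechanisms. A hash chain alone detects modification only if the final hash is stored somewhere trusted; third-party verifiability would additionally need signatures.

```python
import copy
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64   # placeholder "previous hash" for the first entry


def entry_hash(prev: str, payload: Dict) -> str:
    body = json.dumps({"prev": prev, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_entry(log: List[Dict], payload: Dict) -> None:
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev, "payload": payload, "hash": entry_hash(prev, payload)})


def verify(log: List[Dict]) -> bool:
    """Recompute every hash in order; any edited, reordered, or dropped entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry_hash(prev, entry["payload"]) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True


transcript: List[Dict] = []
append_entry(transcript, {"step": 1, "tool": "read_file", "args": {"path": "src/app.py"}})
append_entry(transcript, {"step": 2, "tool": "run_tests", "args": {}})
ok_before = verify(transcript)

tampered = copy.deepcopy(transcript)
tampered[0]["payload"]["args"]["path"] = "secrets.txt"   # edit history after the fact
ok_after = verify(tampered)
print(ok_before, ok_after)  # True False
```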

7. Future Directions and Open Challenges

Open research directions for single-agent architectures include:

Improved error recovery and failure handling,
Meta-cognitive capabilities (progress diagnosis, loop detection, dynamic budget adaptation),
Scaling skill libraries via hierarchy and cognitive-inspired routing,
Secure policy enforcement and mitigation of prompt injection,
Efficient memory organizations for long-horizon tasks,
Federation protocols for scaling to multi-robot or distributed scientific teams,
Formal theory of cost-aware single-agent planning in partially observable and dynamic environments (V et al., 18 Jan 2026, Li, 8 Jan 2026, Qin et al., 13 Apr 2026, Qin et al., 8 Apr 2026, Bai et al., 19 Feb 2026).
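As one illustration of the meta-cognitive direction, loop detection can be approximated by counting repeated (action, observation) pairs in a sliding window. This is a deliberately simple heuristic with hypothetical names and thresholds, not a method from the cited works; it catches exact repetition only and would miss loops that vary superficially.

```python
from collections import deque
from typing import Deque, Tuple


class LoopDetector:
    def __init__(self, window: int = 6, max_repeats: int = 3) -> None:
        # Only the most recent `window` steps are kept.
        self.recent: Deque[Tuple[str, str]] = deque(maxlen=window)
        self.max_repeats = max_repeats

    def observe(self, action: str, observation: str) -> bool:
        """Record a step; return True if this exact step occurs max_repeats times in the window."""
        step = (action, observation)
        self.recent.append(step)
        return self.recent.count(step) >= self.max_repeats


detector = LoopDetector()
flags = [detector.observe("run_tests", "2 failed") for _ in range(3)]
print(flags)  # [False, False, True]
```

An agent could respond to a raised flag by changing strategy, escalating to a different loop primitive, or stopping to conserve its token budget.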

System designers are advised to use single-agent architectures by default, exploiting their information efficiency and cost advantages, but should monitor context utilization and skill-library scaling to determine when the complexity or specialization of the task warrants a hybrid or MAS approach (Tran et al., 2 Apr 2026, Xu et al., 18 Jan 2026, Li, 8 Jan 2026).