ReAct-based Agent Architecture

Updated 15 November 2025
  • ReAct-based agent architecture is a framework that interleaves explicit reasoning and action steps to enable dynamic planning in LLMs.
  • Its modular workflow integrates thought generation, action instructions, and observation feedback to improve data efficiency and reduce hallucinations.
  • The architecture is applied in both single-agent and decentralized multi-agent systems, offering human-interpretable logs and adaptive tool usage.

A ReAct-based agent architecture unifies reasoning and action via interleaved loops, enabling LLMs to iteratively emit explicit reasoning traces, invoke external tools, and update their internal plans based on observed results. This architecture has become central to a broad spectrum of agentic AI systems, from single-agent reasoning pipelines to decentralized multi-agent frameworks, owing to its transparency, data efficiency, and extensibility. The ReAct paradigm is foundational in the planning taxonomy for LLM-based agents and is frequently contrasted with decoupled, plan-then-act approaches.

1. Foundational Principles and Workflow

At its core, a ReAct (Reason + Act) agent executes the following recurrent process at each step:

  1. Emit Reasoning Trace: The agent (an LLM) generates a short explanation of its current reasoning state (prefixed with “Thought:”).
  2. Emit Action Instruction: The agent proposes a concrete action or tool invocation (prefixed with “Action:”).
  3. Execute Action: A system wrapper or orchestrator executes the proposed action, which may involve calling APIs, querying databases, invoking search engines, or running code.
  4. Observe and Integrate Result: The observation from the action is fed back into the agent’s state and prompt.
  5. Update Plan: The agent incorporates the new information, potentially revising its strategy or issuing further actions.

This interleaving continues until task completion or a designated stop condition is reached. The agent is not required to strictly separate reasoning from execution; the two co-evolve within each cycle, allowing reflective and adaptive planning.

A canonical workflow is:

```
Step 1: “Thought: ...”
Step 2: “Action: ...”
Step 3: Execute action
Step 4: Observe result
Step 5: Repeat until complete
```

Unlike plan-then-act methods, the current state, consisting of prior Thoughts, Actions, and Observations, forms the entire context, acting as both memory and dynamic plan. ReAct’s modularity allows it to be embedded in both single-agent and multi-agent paradigms (Aratchige et al., 13 Mar 2025).
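
For concreteness, the following hypothetical trace (illustrative only; the `search`/`finish` action vocabulary mirrors the tools used in the original ReAct experiments) shows one pass through such a loop:

```
Question: In which country was the host city of Expo '70 located?
Thought: I need to find the host city of Expo '70.
Action: search[Expo '70]
Observation: Expo '70 was a world's fair held in Suita, Osaka Prefecture, Japan.
Thought: The host city was Suita, which is located in Japan. I can answer.
Action: finish[Japan]
```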

2. Architectural Components

A generic ReAct-based agent comprises several conceptual modules:

| Component        | Functionality                                         | Implementation Example   |
|------------------|-------------------------------------------------------|--------------------------|
| Perception/Input | Receives task prompts and observations                | User query + API results |
| Memory/State     | Maintains evolving log of Thought/Action/Observation  | System prompt history    |
| Reasoning Module | Generates step-wise “Thought:” reasoning              | LLM/chain-of-thought     |
| Action Module    | Issues “Action:” instructions to tools/APIs           | Tool call executor       |
| Plan Updater     | Revises strategy upon new observation                 | LLM + context injection  |

All observed feedback is promptly incorporated for future steps, which is critical for mitigating hallucinations and increasing interpretability. When invoked in multi-agent systems, each agent may maintain its own loop, but the architecture, as surveyed in (Aratchige et al., 13 Mar 2025), is largely specified at a single-agent conceptual level. No formal multi-agent coordination protocols or role-division mechanisms specific to ReAct itself are given in the foundational survey.
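
To make the Action Module concrete, the sketch below parses an “Action:” line and dispatches it to a registered tool. It is an illustrative implementation, not one prescribed by the survey; the tool registry and the `name[input]` regex format are assumptions:

```python
import re
from typing import Callable, Dict

# Hypothetical tool registry mapping action names to callables.
TOOLS: Dict[str, Callable[[str], str]] = {
    "search": lambda q: f"<results for {q!r}>",  # stub; replace with a real API call
    "finish": lambda a: a,
}

ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*)\]")

def dispatch(llm_output: str) -> str:
    """Extract the 'Action: name[input]' line and run the matching tool."""
    match = ACTION_RE.search(llm_output)
    if match is None:
        return "Error: no parsable action found."
    name, argument = match.group(1), match.group(2)
    tool = TOOLS.get(name)
    if tool is None:
        return f"Error: unknown tool '{name}'."
    return tool(argument)  # the returned string becomes the next Observation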

3. Formalization and Algorithmic View

The survey (Aratchige et al., 13 Mar 2025) does not present formal objective functions, policy equations, or pseudocode for ReAct. However, from the conceptual description, an abstract algorithmic skeleton can be stated:

  • Let $s_t$ be the agent’s internal state at step $t$ (including history and observations).
  • At each step $t$:
    • The reasoning module computes $\text{Thought}_t$ given $s_t$.
    • The agent generates $\text{Action}_t$ based on $\text{Thought}_t$.
    • The environment or tool system executes $\text{Action}_t$, yielding $\text{Observation}_t$.
    • The state is updated: $s_{t+1} = s_t \cup \{\text{Thought}_t, \text{Action}_t, \text{Observation}_t\}$.

This loop iterates until a termination criterion is met, signaled either explicitly in the reasoning trace or by a terminal action signature.
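
A minimal executable rendering of this skeleton is given below. It is a sketch under assumed interfaces: `llm_complete` stands in for any LLM completion call, `dispatch` for a tool executor such as the one sketched in Section 2, and the `finish[...]` stop signature is a convention, not a requirement of the architecture:

```python
def react_loop(task: str, llm_complete, dispatch, max_steps: int = 10) -> str:
    """Iterate Thought -> Action -> Observation, accumulating state s_t in the prompt."""
    state = f"Task: {task}\n"              # s_0: the initial task prompt
    for _ in range(max_steps):
        # Reasoning + action modules: the LLM continues the transcript,
        # emitting Thought_t and an "Action: ..." line.
        step = llm_complete(state + "Thought:")
        state += "Thought:" + step + "\n"
        if "Action: finish[" in step:      # assumed terminal action signature
            return step.split("Action: finish[", 1)[1].split("]", 1)[0]
        # Environment/tool layer executes Action_t, yielding Observation_t.
        observation = dispatch(step)
        # State update: s_{t+1} = s_t ∪ {Thought_t, Action_t, Observation_t}.
        state += f"Observation: {observation}\n"
    return "Stopped: step budget exhausted."
```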

The absence of a formal policy or objective in the survey is noteworthy; for a rigorous treatment of optimization or theoretical properties, readers must consult the primary sources.
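
For orientation, the formulation in the original ReAct paper (Yao et al.), on which the survey builds, can be sketched as follows; this is reconstructed from the primary source, not from the survey:

```latex
% Interaction context at step t (doubles as memory and plan):
%   c_t = (o_1, a_1, \dots, o_{t-1}, a_{t-1}, o_t)
% ReAct augments the environment action space A with the language
% space L of thoughts; a thought \hat{a}_t \in L updates only the
% context and never affects the external environment.
\[
  \hat{\mathcal{A}} = \mathcal{A} \cup \mathcal{L},
  \qquad a_t \sim \pi(a_t \mid c_t),
  \qquad c_{t+1} = (c_t, a_t, o_{t+1}).
\]
```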

4. Implementation Patterns and Limitations

The survey (Aratchige et al., 13 Mar 2025) delineates the following implementation considerations:

  • Simplicity: The agent structure is intentionally lightweight, with all persistence and plan tracking delegated to the prompt/context history.
  • Scalability: No specific scaling or parallelization strategies are prescribed for ReAct agents. There is no discussion of latency, real-time constraints, or specialized memory management.
  • Multi-agent Coordination: Explicit protocols for agent-to-agent communication, centralized aggregation of reasoning traces, or decentralized plan consensus are not covered for ReAct in the survey. Coordination mechanisms are left to the domain integrator.
  • Action Space: The ReAct pattern, as originally validated, was applied to medium-scale action spaces (e.g., multi-hop question answering and fact verification with tools such as the Wikipedia API). The survey describes the architecture’s applicability to large, unstructured, or highly complex action distributions as untested.
  • Performance: Empirical strengths are highlighted (notably performance gains and reduced hallucinations on tasks such as HotpotQA and FEVER), but no quantitative benchmarks or formal analysis appear in (Aratchige et al., 13 Mar 2025).
  • Recommended Extensions: The original ReAct proposal (referenced in the survey) suggests extending agent performance with fine-tuning on human-annotated demonstrations, multi-task pre-training, or reinforcement learning—especially for more intricate, dynamic environments.

5. Strengths, Limitations, and Practical Considerations

Strengths

  • Transparency: Interleaved reasoning and action expose intermediate cognitive states, aiding in debugging and interpretability.
  • Low Hallucination Rate: Folding observations into direct reasoning steps mitigates unsupported inferences, as noted in empirical retrospectives.
  • Flexibility: ReAct’s light scaffolding enables straightforward tool integration and prompt engineering.
  • Human-Interpretable Logs: Step-wise traces align with standard audit and analysis practices.

Limitations

  • Restricted to Medium-Scale Action Spaces: The base paradigm is unproven for complex or combinatorial environments.
  • No Built-in Parallelism: Out-of-the-box, ReAct is designed for iterative, not massively parallel, deployments.
  • No Canonical Engineering Skeleton: The survey does not supply code templates, SDK recommendations, or best practices for robust, production-grade systems.
  • Lack of Formal Multi-agent Algorithms: There is no protocol, message-passing mechanism, or coordinated plan updating specified for teams of ReAct agents.

Practical Recommendations

  • For applications demanding reliability and scalability in enterprise or safety-critical domains, further extensions—such as reinforcement learning for action selection, multi-agent role allocation protocols, or formal plan-tracking mechanisms—are necessary.
  • Areas where large or dynamic action spaces are encountered may exceed the straightforward applicability of ReAct; hierarchical planning, action-library pruning, or externalized memory should be considered.

6. Context and Trajectory within Multi-Agent LLM Systems

In the system taxonomy of planning approaches (Aratchige et al., 13 Mar 2025), ReAct occupies a central position alongside methods such as AdaPlanner and ChatCoT. The dominant innovation is the fusion of reasoning and acting into a single loop—contrasting with detached “first-reason/then-act” planners. This conceptual unification serves as the foundation for numerous later agentic architectures which require tight interleaving of observation, cognition, and action selection.

However, for full multi-agent system realization—especially with role assignment, fault tolerance, and distributed planning—the survey indicates that ReAct provides only the atomic agent substrate, whereas orchestration, coordination, and resilience mechanisms remain an open area for research and systems engineering.

7. Summary Table: Core Surveyed Features

| Aspect                    | Formalization in Survey    | Implementation Detail                |
|---------------------------|----------------------------|--------------------------------------|
| Interleaved Reason/Action | ✓ Conceptual only          | No code/pseudocode given             |
| State/Memory Management   | Implied via prompt/history | No external state/memory protocols   |
| Multi-Agent Coordination  | Not described              | Lacks protocol/specification         |
| Empirical Results         | Medium-scale tasks only    | Gains cited without precise metrics  |
| Extensibility             | Advised via RL/fine-tuning | Not instantiated                     |

The current state of the literature, as reflected in (Aratchige et al., 13 Mar 2025), establishes ReAct as a minimal, interpretable agentic planning cycle that combines reasoning and action for LLM-based agents. All formal, large-scale, or distributed extensions are designated as future work.
