RP-ReAct: Multi-Agent Enterprise Automation
- RP-ReAct is a multi-agent architecture that decouples high-level planning (RPA) from low-level execution (PEA) to achieve reliable enterprise task automation.
- It utilizes a context-saving mechanism that limits LLM token usage by offloading extensive tool outputs, thereby preventing context overflow.
- The design incorporates continuous dynamic replanning and empirical metrics to ensure robust performance and stability in complex, large-scale environments.
RP-ReAct is a multi-agent architecture designed for complex enterprise task automation that disentangles high-level reasoning from low-level tool execution, enabling reliable coordination of multiple tools and effective management of limited LLM context windows. The design fundamentally addresses trajectory instability and context overflow in traditional monolithic plan-execute loops, supporting robust, generalizable autonomous agents suitable for demanding enterprise environments (Molinari et al., 3 Dec 2025).
1. System Structure and Agent Roles
The RP-ReAct framework consists of two primary agentic components:
- Reasoner-Planner Agent (RPA): Responsible for all high-level planning, decomposition of the user goal into sub-steps, and ongoing strategic reasoning, driven by a large reasoning model (LRM).
- Proxy-Execution Agent (PEA): Executes sub-steps handed off from the RPA by interacting with external tools and APIs through an iterative ReAct loop. PEAs employ a context-saving mechanism to strictly bound LLM context consumption by offloading large tool outputs to external storage, providing only token-limited previews in LLM memory.
Interaction proceeds in a dialogue-like handshake: at each step, the RPA emits a sub-question to the PEA, which returns a (possibly truncated) result. This exchange is repeated, allowing for replanning, correction, and context refresh, until the goal is achieved or the step budget is exhausted. The architecture supports both single-agent and multi-agent deployments, though the canonical instantiation focuses on a single RPA supervising one or more PEAs [(Molinari et al., 3 Dec 2025), § Methodology Fig. 1].
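The handshake described above can be sketched as a small loop. This is a minimal illustration, not the authors' implementation: `PlannerState`, `run_handshake`, `plan_next`, `execute`, and the toy planner and executor are all hypothetical names, and the LLM-backed agents are replaced by plain functions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class PlannerState:
    """Hypothetical container for what the RPA keeps in its context."""
    goal: str
    # Each entry is a (sub-question, result) pair.
    history: List[Tuple[str, str]] = field(default_factory=list)


def run_handshake(
    goal: str,
    plan_next: Callable[[PlannerState], Optional[str]],
    execute: Callable[[str], str],
    max_steps: int = 5,
) -> PlannerState:
    """Alternate between planner and executor until done or out of budget.

    plan_next returns the next sub-question, or None once the goal is met.
    execute stands in for the PEA and returns a (possibly truncated) result.
    """
    state = PlannerState(goal=goal)
    for _ in range(max_steps):  # step budget bounds the loop
        sub_question = plan_next(state)
        if sub_question is None:  # planner judges the goal achieved
            break
        result = execute(sub_question)
        # Only the sub-question and its result enter the planner's history.
        state.history.append((sub_question, result))
    return state


# Toy stand-ins for the LLM-backed agents (illustration only).
def toy_planner(state: PlannerState) -> Optional[str]:
    steps = ["find the flight table", "count delayed flights"]
    done = len(state.history)
    return steps[done] if done < len(steps) else None


def toy_executor(sub_question: str) -> str:
    return f"done: {sub_question}"


final_state = run_handshake("How many flights were delayed?", toy_planner, toy_executor)
print(final_state.history)
```

Because the planner sees only `(sub-question, result)` pairs, its context stays small regardless of how much work the executor does per step.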
2. Formal Process and Control Flow
The formal operational process is as follows:
- RPA Planning: At step $t$ with history $H_t$, the RPA emits a sub-question $q_t$. Upon receiving the result $r_t$, the RPA applies a binary success test. If the test fails, the RPA triggers a replanning procedure that selects the next sub-question using a function that estimates task-progress utility.
- PEA ReAct Loop: For each sub-question, the PEA runs a ReAct cycle for up to a fixed number of steps: it iteratively “thinks” (generates internal reasoning), “acts” (selects a tool or application), “executes” (runs the tool and obtains its output), applies context-saving if the output exceeds the token threshold, and updates the conversational context. The result is either a truncated tool output or a reference to externally stored data, which guards against context-window overflow [(Molinari et al., 3 Dec 2025), Algorithmic Pseudocode §3].
This division means RPA’s context remains concise and focused strictly on sub-questions and results, while execution complexity and noisy tool outputs are abstracted away in the PEA.
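A bounded ReAct loop of the kind the PEA runs might look like the sketch below. It is an assumption-laden illustration: `pea_react_loop`, `think_and_act`, the `tools` mapping, and the toy policy are hypothetical, the LLM is replaced by a plain function, and characters stand in for tokens when truncating observations.

```python
from typing import Callable, Dict, List, Optional, Tuple

# An action is (tool_name, tool_argument); None means "stop and answer".
Action = Optional[Tuple[str, str]]


def pea_react_loop(
    sub_question: str,
    think_and_act: Callable[[str, List[str]], Action],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 4,
    preview_chars: int = 40,
) -> str:
    """Run think/act/execute cycles under a hard step budget."""
    context: List[str] = []
    for _ in range(max_steps):
        action = think_and_act(sub_question, context)  # think + act
        if action is None:
            break
        tool_name, argument = action
        try:
            observation = tools[tool_name](argument)  # execute
        except Exception as exc:
            # Tool failures stay inside the executor as observations.
            observation = f"tool error: {exc}"
        # Keep only a bounded preview (characters used as a token proxy).
        context.append(observation[:preview_chars])
    return context[-1] if context else "no result"


# Toy policy: call the lookup tool once, then stop (illustration only).
def toy_policy(sub_question: str, context: List[str]) -> Action:
    return ("lookup", sub_question) if not context else None


toy_tools = {"lookup": lambda query: "row " * 50}  # 200-character output
answer = pea_react_loop("list rows", toy_policy, toy_tools)
print(len(answer))  # 40
```

The planner only ever receives the returned string, so verbose or failing tool calls never reach its context directly.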
3. Context-Saving Mechanism
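A critical innovation is the context-saving strategy within the PEA. For a tool output exceeding a threshold of $\tau$ tokens, the PEA stores the full output externally (database, file, or object store) and returns only a preview of $p$ tokens, along with a retrieval variable, to the LLM context. Using $|o_i|$ to denote the token length of the $i$-th of $N$ tool outputs, and ignoring the small constant cost of each retrieval variable:
- Context cost without saving: $C_{\text{full}} = \sum_{i=1}^{N} |o_i|$
- With context-saving: $C_{\text{saved}} = \sum_{i=1}^{N} c_i$, where $c_i = |o_i|$ if $|o_i| \le \tau$ and $c_i = p$ otherwise, so that $C_{\text{saved}} \le N \cdot \max(\tau, p)$
With this mechanism, context cost grows linearly with the number of tool calls (on the order of $N$ times the preview budget) rather than with the total size of the raw outputs. Empirical savings are substantial for large table/text outputs, ensuring operational viability for agents employing narrow-window open-weight LLMs [(Molinari et al., 3 Dec 2025), §2.3, §4.2].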
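To make the offloading idea concrete, here is a minimal sketch under stated assumptions: `ContextSaver`, `SavedOutput`, the threshold and preview sizes, and the `var_N` naming scheme are hypothetical, a dictionary stands in for the external store, and whitespace-separated words stand in for tokens.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SavedOutput:
    preview: str        # text placed in the LLM context
    variable: str = ""  # retrieval handle; empty if nothing was offloaded


class ContextSaver:
    """Offload large tool outputs and keep only a short preview."""

    def __init__(self, threshold_tokens: int = 50, preview_tokens: int = 10) -> None:
        self.threshold_tokens = threshold_tokens
        self.preview_tokens = preview_tokens
        # In-memory stand-in for a database, file, or object store.
        self.store: Dict[str, str] = {}

    def save(self, tool_output: str) -> SavedOutput:
        tokens = tool_output.split()  # crude token proxy (assumption)
        if len(tokens) <= self.threshold_tokens:
            return SavedOutput(preview=tool_output)  # small enough to keep
        variable = f"var_{len(self.store)}"
        self.store[variable] = tool_output  # full output kept externally
        preview = " ".join(tokens[: self.preview_tokens])
        return SavedOutput(preview=preview, variable=variable)

    def load(self, variable: str) -> str:
        """Fetch the full output back when a later tool call needs it."""
        return self.store[variable]


saver = ContextSaver()
outputs = [" ".join(["cell"] * n) for n in (5, 400, 1200)]
saved = [saver.save(output) for output in outputs]

cost_without = sum(len(output.split()) for output in outputs)
cost_with = sum(len(item.preview.split()) for item in saved)
print(cost_without, cost_with)  # 1605 25
```

In this toy run the two large outputs each contribute only their 10-word preview, while the full data remains retrievable through `saver.load(...)`.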
4. Dynamic Replanning and Trajectory Stability
RP-ReAct employs continuous dynamic replanning: after each execution result, the binary success test lets the RPA adjust its strategy or re-issue sub-steps as needed. Empirical stability is measured from trajectory accuracy per model and agent, summarized by a Coverage Product Score.
Empirical studies report lower trajectory standard deviation and a higher Coverage Product Score versus baselines, particularly on hard ToolQA tasks, supporting robustness across LLM and tool variations [(Molinari et al., 3 Dec 2025), Table 3].
5. Architectural Design Decisions
The RP-ReAct architecture results from several explicit decisions:
- Full Decoupling: Separation of high-level planning (reasoning, intent) from low-level execution (tool interaction, API error handling). The RPA is insulated from LLM context pollution due to noisy, verbose, or malformed tool responses (cf. “context-drift”).
- Budgeted Steps: RPA and PEA each use a capped number of planning/execution steps, guaranteeing upper bounds on latency and preventing unbounded loop execution [(Molinari et al., 3 Dec 2025), Experimental Setup].
- External Storage Abstraction: Supports tool APIs returning arbitrarily large data objects, with only pointer variables ever carried in LLM memory.
- Multi-Agent Compatibility: Architecture is readily extensible to multiple PEAs acting under a shared RPA—this suggests applicability to scenarios with high tool diversity or parallelizable subtasks.
These design elements collectively yield a system that is modular, robust to model scaling, and empirically generalizable across diverse task domains.
6. Empirical Evaluation and Application Domains
RP-ReAct was evaluated on the multi-domain ToolQA benchmark, using six open-weight reasoning models. Results demonstrate:
- Superior overall and task-specific performance versus various monolithic and tool-integrated baselines.
- Improved generalization to unseen domains, attributed to context isolation and replanning.
- Robustness and stability across LLM scales—lower variance in task trajectories.
The agentic paradigm is directly applicable in enterprise environments characterized by strict privacy requirements (local LLMs), heterogeneous toolchains (DB, spreadsheet, API, Python), and frequent large-output scenarios. The architecture supports deployments requiring strong modularity and compositional reasoning over complex workflows [(Molinari et al., 3 Dec 2025), § Results].
7. Summary Table of RP-ReAct Key Components
| Component | Function | Notable Feature |
|---|---|---|
| Reasoner-Planner | High-level planning, step-wise goal decomposition | Maintains clean subgoal context |
| Proxy-Execution | Sub-step execution, ReAct loop | Context-saving, external storage |
| Dynamic Replanning | Trajectory correction after feedback | Ensures goal-achievement robustness |
| Context Management | Manages tool output in LLM memory | Enables narrow-window deployment |
References
- "Reason-Plan-ReAct: A Reasoner-Planner Supervising a ReAct Executor for Complex Enterprise Tasks" (Molinari et al., 3 Dec 2025)