
RP-ReAct: Multi-Agent Enterprise Automation

Updated 3 February 2026
  • RP-ReAct is a multi-agent architecture that decouples high-level planning (RPA) from low-level execution (PEA) to achieve reliable enterprise task automation.
  • It utilizes a context-saving mechanism that limits LLM token usage by offloading extensive tool outputs, thereby preventing context overflow.
  • The design incorporates continuous dynamic replanning and empirical metrics to ensure robust performance and stability in complex, large-scale environments.

RP-ReAct is a multi-agent architecture designed for complex enterprise task automation that disentangles high-level reasoning from low-level tool execution, enabling reliable coordination of multiple tools and effective management of limited LLM context windows. The design fundamentally addresses trajectory instability and context overflow in traditional monolithic plan-execute loops, supporting robust, generalizable autonomous agents suitable for demanding enterprise environments (Molinari et al., 3 Dec 2025).

1. System Structure and Agent Roles

The RP-ReAct framework consists of two primary agentic components:

1. Reasoner-Planner Agent (RPA): Responsible for all high-level planning, decomposition of the user goal $G$ into sub-steps, and ongoing strategic reasoning driven by a large reasoning model (LRM).

2. Proxy-Execution Agent (PEA): Executes sub-steps handed off from the RPA by interacting with external tools and APIs through an iterative ReAct loop. PEAs employ a context-saving mechanism to strictly bound LLM context consumption by offloading large tool outputs to external storage, providing only token-limited previews in LLM memory.

Interaction proceeds in a dialogue-like handshake: at each step $t$, the RPA emits a sub-question $s_t$ to the PEA, which returns a (possibly truncated) result $r_t$. This exchange is repeated, allowing for replanning, correction, and context refresh, until the goal is achieved or the step budget is exhausted. The architecture supports both single and multi-agent deployments, though the canonical instantiation focuses on a single RPA supervising one or more PEAs [(Molinari et al., 3 Dec 2025), § Methodology Fig. 1].
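The handshake described above can be sketched as a short loop. This is a minimal illustration under stated assumptions, not the authors' implementation: `run_rp_react`, `plan_next`, `execute`, and the toy stand-ins are hypothetical names, and the planner signals "goal achieved" by returning `None`, which is a convention chosen here for simplicity.

```python
from typing import Callable, List, Optional, Tuple

History = List[Tuple[str, str]]  # (sub-question s_i, result r_i) pairs


def run_rp_react(
    goal: str,
    plan_next: Callable[[History, str], Optional[str]],  # hypothetical stand-in for the RPA
    execute: Callable[[str], str],                       # hypothetical stand-in for the PEA
    max_steps: int = 10,                                 # step budget
) -> History:
    """Alternate RPA planning and PEA execution until done or out of budget."""
    history: History = []
    for _ in range(max_steps):
        sub_question = plan_next(history, goal)
        if sub_question is None:  # assumed convention: the RPA judges the goal achieved
            break
        result = execute(sub_question)
        history.append((sub_question, result))
    return history


# Toy stand-ins so the sketch runs without an LLM or real tools.
def toy_planner(history: History, goal: str) -> Optional[str]:
    steps = ["find the customer id", "sum that customer's invoices"]
    return steps[len(history)] if len(history) < len(steps) else None


def toy_executor(sub_question: str) -> str:
    return f"result for: {sub_question}"


trace = run_rp_react("total invoiced for ACME", toy_planner, toy_executor)
```

Note that the RPA side only ever sees the `(sub-question, result)` pairs in `history`; whatever the executor did internally to produce each result stays out of the planner's view.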

2. Formal Process and Control Flow

The formal operational process is as follows:

  • RPA Planning: At step $t$ with history $H_t = \{(s_i, r_i)\}_{i<t}$, the RPA emits $s_t = \pi_R(H_t, G)$. Upon receiving $r_t$, the RPA applies a success test $\delta(r_t) \in \{0,1\}$. If $\delta(r_t) = 0$, the RPA triggers a replanning procedure, computing the next sub-question as $s_{t+1} = \text{argmax}_s U(s \mid H_t, G)$, where $U(\cdot)$ estimates task progress utility.
  • PEA ReAct Loop: For each sub-question, the PEA engages in a ReAct cycle of up to $N$ steps: iteratively “think” (generate internal reasoning), “act” (select tool/application), “execute” (run the tool, obtain $o_k$), apply context-saving if $|o_k| > T$, and update the conversational context. The result $r_t$ is either a truncated tool output or a reference to externally stored data, minimizing context window overflow [(Molinari et al., 3 Dec 2025), Algorithmic Pseudocode §3].

This division means RPA’s context remains concise and focused strictly on sub-questions and results, while execution complexity and noisy tool outputs are abstracted away in the PEA.
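The PEA side of this control flow might look roughly like the following. It is a sketch under several assumptions rather than the paper's pseudocode: `pea_react`, `policy`, `tools`, and `store` are hypothetical, tokens are approximated by whitespace splitting, a plain dictionary stands in for external storage, and the policy finishes by returning `None` as the tool name.

```python
from typing import Callable, Dict, List, Optional, Tuple

# One policy step: (thought, tool_name, tool_input). A tool_name of None means
# "finish", with tool_input carrying the final answer (assumed convention).
Step = Tuple[str, Optional[str], str]


def pea_react(
    sub_question: str,
    policy: Callable[[List[str]], Step],     # hypothetical stand-in for the PEA's LLM
    tools: Dict[str, Callable[[str], str]],  # hypothetical tool registry
    store: Dict[str, str],                   # stand-in for external storage
    max_steps: int = 10,                     # N
    threshold: int = 50,                     # T, in whitespace tokens here
) -> str:
    context: List[str] = [f"question: {sub_question}"]
    for k in range(max_steps):
        thought, tool_name, tool_input = policy(context)  # think + act
        context.append(f"thought: {thought}")
        if tool_name is None:
            return tool_input                             # result r_t
        output = tools[tool_name](tool_input)             # execute, obtain o_k
        tokens = output.split()
        if len(tokens) > threshold:                       # context-saving branch
            key = f"var_{k}"
            store[key] = output                           # full output kept outside the context
            output = " ".join(tokens[:threshold]) + f" ... [full output in {key}]"
        context.append(f"observation: {output}")
    return "step budget exhausted"


# Toy policy: call one tool, then answer with the last observation.
def toy_policy(context: List[str]) -> Step:
    prefix = "observation: "
    observations = [c for c in context if c.startswith(prefix)]
    if not observations:
        return ("I need the table", "dump_table", "orders")
    return ("I have what I need", None, observations[-1][len(prefix):])


store: Dict[str, str] = {}
tools = {"dump_table": lambda name: " ".join(f"row{i}" for i in range(200))}
answer = pea_react("list orders", toy_policy, tools, store, threshold=5)
```

Here the 200-token tool output never enters `context` in full; only a 5-token preview and the variable name `var_0` do, while the complete text remains retrievable from `store`.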

3. Context-Saving Mechanism

A critical innovation is the context-saving strategy within the PEA. For a tool output $o_k$ exceeding a threshold of $T$ tokens, the PEA stores the full output externally (database, file, or object store) and returns only a preview $p_k = o_k[1:T]$ along with a retrieval variable $v_k$ to the LLM context. Formally:

  • Context cost without saving: $C_{\text{nos}} = \sum |o_i|$
  • With context-saving: $C_{\text{sav}} = \sum \min(|o_i|, T) + \text{overhead}(n_{\text{overflows}})$

This mechanism bounds context cost by roughly $K \cdot T$ (number of tool calls $\times$ preview tokens), so it grows linearly in the number of calls, rather than by up to $K \cdot \max |o_i|$. Empirical savings $S = C_{\text{nos}} - C_{\text{sav}} = \sum \text{overflow}(o_i)$, less the overhead term, are substantial for large table/text outputs, ensuring operational viability for agents employing narrow-window open-weight LLMs [(Molinari et al., 3 Dec 2025), §2.3, §4.2].
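A small worked example may make the two cost formulas concrete. The output sizes, the threshold, and the assumption that overhead is a fixed number of tokens per overflowing output are all illustrative choices made here; the source only states that overhead depends on the number of overflows.

```python
from typing import List, Tuple


def context_costs(
    sizes: List[int],            # |o_i| for each tool output, in tokens
    T: int,                      # preview threshold
    overhead_per_overflow: int,  # assumed: constant token cost per stored output
) -> Tuple[int, int, int]:
    """Return (C_nos, C_sav, S) for a sequence of tool-output sizes."""
    c_nos = sum(sizes)
    n_overflows = sum(1 for s in sizes if s > T)
    c_sav = sum(min(s, T) for s in sizes) + overhead_per_overflow * n_overflows
    return c_nos, c_sav, c_nos - c_sav


# Illustrative sizes only, not taken from the paper.
sizes = [120, 5000, 800, 20000]
c_nos, c_sav, saved = context_costs(sizes, T=1000, overhead_per_overflow=20)
print(c_nos, c_sav, saved)  # 25920 2960 22960
```

Two of the four outputs exceed $T$, so the truncated tokens total $4000 + 19000 = 23000$, and subtracting the assumed $2 \times 20$ tokens of overhead gives the net saving of 22960.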

4. Dynamic Replanning and Trajectory Stability

RP-ReAct employs continuous dynamic replanning: after each execution result, the binary success function $\delta(r_t)$ ensures the RPA can adjust strategy or re-issue sub-steps as needed. Metrics for empirical stability are defined using trajectory accuracy $\text{Acc}(m,a)$ for model $m$ and agent $a$:

  • $\text{MaxAcc}_a = \max_m \text{Acc}(m, a)$
  • $\text{AvgAcc}_a = \frac{1}{|M|} \sum_m \text{Acc}(m, a)$
  • $\text{Sat}_a = 1 - (\text{MaxAcc}_a - \text{AvgAcc}_a)$
  • Coverage Product Score: $\text{CPS}_a = \text{Sat}_a \cdot \text{MaxAcc}_a$

Empirical studies report lower trajectory standard deviation and higher $\text{CPS}$ versus baselines, particularly on hard ToolQA tasks, supporting robustness across LLM and tool variations [(Molinari et al., 3 Dec 2025), Table 3].
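The four metrics defined above reduce to a few lines of arithmetic. The sketch below computes them for a single agent from a mapping of model name to accuracy; the function name `stability_metrics`, the model names, and the accuracy values are placeholders chosen for illustration and are not results from the paper.

```python
from typing import Dict


def stability_metrics(acc_by_model: Dict[str, float]) -> Dict[str, float]:
    """Compute MaxAcc, AvgAcc, Sat and CPS for one agent across models."""
    values = list(acc_by_model.values())
    max_acc = max(values)
    avg_acc = sum(values) / len(values)
    sat = 1.0 - (max_acc - avg_acc)  # closer to 1 when models perform alike
    return {"MaxAcc": max_acc, "AvgAcc": avg_acc, "Sat": sat, "CPS": sat * max_acc}


# Placeholder accuracies, not figures reported in the source.
metrics = stability_metrics({"model_a": 0.6, "model_b": 0.5, "model_c": 0.4})
```

With these placeholder values, $\text{MaxAcc} = 0.6$, $\text{AvgAcc} = 0.5$, $\text{Sat} = 0.9$, and $\text{CPS} = 0.54$ (up to floating-point rounding). An agent whose accuracy is identical across all models has $\text{Sat} = 1$, so its CPS equals its MaxAcc.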

5. Architectural Design Decisions

The RP-ReAct architecture results from several explicit decisions:

  • Full Decoupling: Separation of high-level planning (reasoning, intent) from low-level execution (tool interaction, API error handling). The RPA is insulated from LLM context pollution due to noisy, verbose, or malformed tool responses (cf. “context-drift”).
  • Budgeted Steps: RPA and PEA each use a capped number of planning/execution steps (e.g., $N=10$), guaranteeing upper bounds on latency and preventing unbounded loop execution [(Molinari et al., 3 Dec 2025), Experimental Setup].
  • External Storage Abstraction: Supports tool APIs returning arbitrarily large data objects, with only pointer variables ever carried in LLM memory.
  • Multi-Agent Compatibility: The architecture is readily extensible to multiple PEAs acting under a shared RPA, which suggests applicability to scenarios with high tool diversity or parallelizable subtasks.

These design elements collectively yield a system that is modular, robust to model scaling, and empirically generalizable across diverse task domains.

6. Empirical Evaluation and Application Domains

RP-ReAct was evaluated on the multi-domain ToolQA benchmark, using six open-weight reasoning models. Results demonstrate:

  • Superior overall and task-specific performance versus various monolithic and tool-integrated baselines.
  • Improved generalization to unseen domains, attributed to context isolation and replanning.
  • Robustness and stability across LLM scales—lower variance in task trajectories.

The agentic paradigm is directly applicable in enterprise environments characterized by strict privacy requirements (local LLMs), heterogeneous toolchains (DB, spreadsheet, API, Python), and frequent large-output scenarios. The architecture supports deployments requiring strong modularity and compositional reasoning over complex workflows [(Molinari et al., 3 Dec 2025), § Results].

7. Summary Table of RP-ReAct Key Components

| Component | Function | Notable Feature |
| --- | --- | --- |
| Reasoner-Planner | High-level planning, step-wise | Maintains clean subgoal context |
| Proxy-Execution | Sub-step execution, ReAct loop | Context-saving, external storage |
| Dynamic Replanning | Trajectory correction after feedback | Ensures goal-achievement robustness |
| Context Management | Manages tool output in LLM memory | Enables narrow-window deployment |
