
Plan–Act Dual Agent Architectures

Updated 8 January 2026
  • Plan–Act dual agent architectures are frameworks that separate high-level planning from low-level action execution, enabling adaptive and efficient decision-making.
  • They employ dynamic switching, hierarchical memory, and clear communication protocols to balance strategic planning with responsive control.
  • Empirical studies report enhanced success rates and reduced token use in applications spanning robotics, web navigation, and multi-agent cooperation.

Plan–Act Dual Agent Architectures are systems that bifurcate intelligent decision-making into separate components for high-level planning and low-level action execution. This decomposition reflects principles from both cognitive science (dual-process theory) and classic artificial intelligence (hierarchical control), and has recently become foundational in LLM agent research, robotics, web navigation, multimodal systems, and multi-agent cooperation. The essential feature of these frameworks is the explicit separation or dynamic interplay between a “Planner”—producing strategic, long-horizon decompositions—and an “Actor” or “Executor”—specialized for fast, environment-specific processing and action execution. These agents communicate through well-defined protocols, often mediated by memory, planning context, and feedback mechanisms, enabling robust, efficient, and adaptive behavior across diverse problem domains.

1. Architectural Principles and Canonical Variants

Plan–Act dual agent architectures can be categorized along several dimensions: agent role separation, switching policy, memory integration, and multi-core composition.

  • Role Separation: The Planner typically formulates macro-level structure, decomposing global tasks into subgoals or steps (e.g., structured plans for web navigation, video generation, or multi-agent transport (Erdogan et al., 12 Mar 2025, Liang et al., 11 Nov 2025, Liu et al., 2024)). The Actor translates these high-level instructions into concrete actions (tool invocation, controller commands, web API calls), handling environmental feedback and exceptions (Sasabuchi et al., 1 Apr 2025, Hou et al., 2024).
  • Switching and Interleaving: Switching between planning and acting may be periodic, state-triggered, or driven by gating modules using learned or heuristic rules. Systems often implement adaptive switching policies based on signals such as task complexity, error frequency, or progress metrics (Liu et al., 7 Aug 2025, Paglieri et al., 3 Sep 2025, Zhang et al., 9 Dec 2025).
  • Memory Architecture: Many Plan–Act systems incorporate hierarchical memory (global, task-specific, user-context) to maintain long-horizon coherence, store intermediate artifacts, and enable reflective adaptation (Liang et al., 11 Nov 2025); a minimal sketch of such memory layering appears after this list.
  • Multi-Core Agent Models: Extensions such as LLM-Agent-UMF (Hassouna et al., 2024) provide formal definitions of “active” (with planning, memory, profile, action, security modules) and “passive” (action/security only) core agents, supporting architectures with various combinations for scalability, modularity, and security.
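
As a minimal sketch of the hierarchical memory layering described in the Memory Architecture bullet above (the class, tier names, and retrieval rule are illustrative assumptions, not drawn from a specific cited system):

    from dataclasses import dataclass, field

    @dataclass
    class HierarchicalMemory:
        # Three illustrative tiers: global, task-specific, user-context.
        global_store: dict = field(default_factory=dict)  # long-horizon facts shared across tasks
        task_store: dict = field(default_factory=dict)    # intermediate artifacts for the current task
        user_store: dict = field(default_factory=dict)    # user preferences for personalization

        def write(self, tier: str, key: str, value) -> None:
            getattr(self, f"{tier}_store")[key] = value

        def planner_context(self, task_keys: list[str]) -> dict:
            # The planner sees global memory plus only the requested task artifacts,
            # keeping prompts compact while preserving long-horizon coherence.
            relevant = {k: self.task_store[k] for k in task_keys if k in self.task_store}
            return {**self.global_store, **relevant}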

Typical instantiations include classic hierarchical control (e.g., POMDP-BDI hybrids (Rens et al., 2016)), dual-process cognitive frameworks (System 1/2), co-adaptive dual-strategy agents (holistic plan + local policy (Zhang et al., 9 Dec 2025)), and demonstration-free architectures with internal reflection (Yoo et al., 2023).

2. Formal Algorithms and Data Flows

The Plan–Act paradigm is operationalized through modular algorithms and explicit communication flows:

  • Planner Function: Mathematically, a planner is defined as $P: S \rightarrow A$, mapping structured problem state (goals, environment, prior actions) to symbolic next actions or structured plans (Sasabuchi et al., 1 Apr 2025, Erdogan et al., 12 Mar 2025).
  • Actor Function: The actor executes low-level commands derived from the planner, often by invoking environment APIs, robot controllers, or code-action tools. Execution continues until a termination or replanning condition is met, as defined by a timing or event-based function $T: S \times A \rightarrow \{\mathrm{call\_LLM}, \mathrm{continue}\}$ (Sasabuchi et al., 1 Apr 2025); a typed sketch of these interfaces appears at the end of this list.
  • Interaction Protocols: Systems arrange Planner–Actor cycles through structured pseudocode loops, e.g.:
    while not goal_reached(s):
        a = Planner(s)               # query the planner for the next symbolic action
        start_execution(a)           # hand off to the actor / low-level controller
        while not action_complete(a):
            s = observe()            # refresh state from the environment (helper assumed)
            if TriggerReplan(s, a):  # event/timing function T decides when to replan
                stop_execution(a)
                break
    Feedback mechanisms such as action text (a summary of the current or last action) are crucial for shifting between passive and active modes (Sasabuchi et al., 1 Apr 2025).
  • Meta-Planning and Adaptation: Multi-agent variants incorporate meta-plan generation by designer–evaluator negotiation and synchronize execution phases through progress-based triggers (Liu et al., 2024).
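
The planner, actor, and timing functions above can be written as typed interfaces. The sketch below is one assumed formulation of $P: S \rightarrow A$ and $T: S \times A \rightarrow \{\mathrm{call\_LLM}, \mathrm{continue}\}$, not an API from the cited papers; the trigger conditions are placeholders.

    from typing import Literal, Protocol

    State = dict   # structured problem state: goals, environment, prior actions
    Action = dict  # symbolic next action or structured plan step

    class Planner(Protocol):
        def __call__(self, s: State) -> Action: ...  # P: S -> A

    class Actor(Protocol):
        def start(self, a: Action) -> None: ...      # begin low-level execution
        def stop(self, a: Action) -> None: ...       # abort when a replan triggers
        def complete(self, a: Action) -> bool: ...   # termination check

    def timing(s: State, a: Action) -> Literal["call_LLM", "continue"]:
        # T: S x A -> {call_LLM, continue}; real systems trigger on error
        # counts, stalled progress, or environment events (placeholder rule).
        return "call_LLM" if s.get("error") or s.get("stalled") else "continue"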

3. Switching Policies and Dynamic Control

Adaptive switching is central to resource-efficient and robust performance.

  • Gating Mechanisms: Agents utilize gating variables (e.g., $\lambda_t$) that arbitrate between fast, reactive policies (System 1) and compute-intensive planners (System 2). Switching can be learned (neural gates, preference models) or rule-based (error count, task complexity threshold), operationalized as:

$$p_\theta(a_t \mid s_{1:t}, a_{1:t-1}, g) = \lambda_t \, \pi_1(a_t \mid s_t, g) + (1 - \lambda_t) \, \pi_2(a_t \mid s_t, h_t, g)$$

with entropy-based or trace-driven context (Liu et al., 7 Aug 2025, Zhang et al., 17 Feb 2025).
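
A hard, rule-based instance of this gate ($\lambda_t \in \{0, 1\}$, a special case of the soft mixture above) can be sketched as follows; the entropy threshold and the policy stubs standing in for $\pi_1$ and $\pi_2$ are illustrative assumptions:

    import math

    def gate(fast_probs: dict[str, float], entropy_threshold: float = 1.0) -> float:
        # Return lambda_t: weight on the fast System-1 policy. Rule-based
        # variant: defer to the System-2 planner when the fast policy's
        # action distribution is high-entropy (threshold is illustrative).
        h = -sum(p * math.log(p) for p in fast_probs.values() if p > 0)
        return 1.0 if h < entropy_threshold else 0.0

    def act(state, history, goal, pi1, pi2):
        # pi1: cheap reactive policy; pi2: compute-intensive planner (stubs).
        probs = pi1(state, goal)              # distribution over candidate actions
        if gate(probs) == 1.0:
            return max(probs, key=probs.get)  # System 1: greedy reactive action
        return pi2(state, history, goal)      # System 2: deliberate plan step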

  • Reflection and Fitness Scoring: Single-model variants such as DuSAR (Zhang et al., 9 Dec 2025) implement internal fitness scoring ($s_t \in [0, 100]$) to decide when to revise the global plan (stuck/error), refine it (milestone reached), or maintain the current strategy:

$$H_t = \begin{cases} \mathrm{HolisticReflect}(I, E_{<t}, H_{t-1}) & \text{if } s_{t-1} = 0 \ \text{or} \ 50 \leq s_{t-1} \leq 99 \\ H_{t-1} & \text{if } 1 \leq s_{t-1} \leq 49 \\ \mathrm{Terminate} & \text{if } s_{t-1} = 100 \end{cases}$$

This enables co-adaptive reasoning with minimal compute overhead.
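
The case rule above translates directly into control flow. A sketch, with the fitness scorer and $\mathrm{HolisticReflect}$ left as stubs (assumptions, not DuSAR's actual prompts):

    def update_strategy(score: int, H_prev, I, E_past, holistic_reflect):
        # Apply the H_t update rule above; score is s_{t-1} in [0, 100], and
        # holistic_reflect stubs the LLM call that revises the global plan.
        if score == 100:
            return None                            # Terminate: task solved
        if score == 0 or 50 <= score <= 99:
            # Stuck/error (0) or milestone reached (50-99): revise the plan
            return holistic_reflect(I, E_past, H_prev)
        return H_prev                              # 1-49: keep the current strategy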

  • Multi-Agent Coordination: Cooperative frameworks select plan designers and evaluators, aggregate plan proposals, and trigger consensus-based adaptation in response to environment events (Liu et al., 2024).
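
As a sketch of such designer–evaluator negotiation (the agent roles, scoring scale, and consensus threshold below are illustrative assumptions, not CaPo's exact protocol):

    def negotiate_meta_plan(designer, evaluators, state, max_rounds: int = 3):
        # Designer proposes a meta-plan; evaluators critique and score it;
        # iterate until consensus or the round budget is spent (agents are stubs).
        plan = designer.propose(state)
        for _ in range(max_rounds):
            reviews = [e.critique(plan, state) for e in evaluators]  # (score, feedback) pairs
            if all(score >= 0.8 for score, _ in reviews):            # consensus threshold (assumed)
                return plan
            plan = designer.revise(plan, [fb for _, fb in reviews])  # fold critiques back in
        return plan  # best effort after the budget is exhausted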

4. Applications Across Domains

Plan–Act dual agent architectures have been deployed in a breadth of domains:

  • Human–Robot Interaction (HRI): Frameworks achieve situational agreement and interaction timing by leveraging LLMs for planning and controllers for real-time execution, incorporating role-adaptive action text and stimuli-driven switching (Sasabuchi et al., 1 Apr 2025).
  • Web Navigation and Information Fusion: Multi-stage architectures employ cognitive duality (System 1/2), co-adaptive reflection, and hierarchical prompting for efficient, complex web tasks. Modular interaction loops and memory architectures enable generalization and dynamic replanning in long-horizon tasks (Liu et al., 7 Aug 2025, Erdogan et al., 12 Mar 2025, Hassouna et al., 2024).
  • Multimodal Video Intelligence: Planners generate structured video processing plans, and executors interact with modular tool servers. Hierarchical memory supports persistent goal tracking and personalization, with traceable inter-agent communication (Liang et al., 11 Nov 2025).
  • Vision-and-Language Navigation: Dual-scale graph transformers combine global (Plan) and local (Act) reasoning, dynamically fusing coarse map-based planning and fine-grained local grounding, resulting in state-of-the-art performance across multiple VLN benchmarks (Chen et al., 2022).
  • Multi-Agent Cooperation and Robotics: Meta-plan generation and progress-adaptive execution phases optimize task allocation and reduce redundant actions in collaborative environments, with dynamic feedback protocols and plan libraries for reuse (Liu et al., 2024, Rens et al., 2016).

5. Performance, Evaluation, and Empirical Insights

Plan–Act architectures deliver significant empirical gains in both success rate and efficiency, with careful trade-offs in compute, latency, and adaptivity.

  • Quantitative Results:
    • HRI: 90% success in situational engagement, with action text and second-stage timing questions critical for avoiding infinite waits (Sasabuchi et al., 1 Apr 2025).
    • Web Navigation/WebArena: CogniWeb achieves 43.96% SR with 75% token reduction over pure planner; CoAct yields 40% relative SR lift over baselines (Liu et al., 7 Aug 2025, Hou et al., 2024).
    • Multi-Agent Cooperation: CaPo boosts transport rate to 84.5% vs. 72.5% baseline; full plan plus progress adaptation ablation shows superior efficiency (Liu et al., 2024).
    • LegalAgentBench: PoAct improves success rate by +25 pp over ReAct, with ~45× reduction in token use (Yuan et al., 13 Jan 2025).
    • Reflective Dual-Strategy: DuSAR more than doubles previous SOTA success rates (ALFWorld: 13.0% → 37.1%) and achieves 3–9× token efficiency (Zhang et al., 9 Dec 2025).
  • Modularity and Extendability: LLM-Agent-UMF supports multi-core setups with formal risk analysis and modular interfaces for planning, memory, action, and security (Hassouna et al., 2024).
  • Limitations and Open Directions: Planning quality remains coupled to LLM strength; latency from repeated calls may be high; handling non-stationarity and persistent memory for complex strategies is still being explored (Liu et al., 2024, Hou et al., 2024).

6. Advanced Variants and Hybrid Systems

Several extensions and advanced formats are recognized:

  • Integrated Operational Models: Architectures unify planner and actor models (hierarchical methods, task refinements) to avoid inconsistency and support Monte Carlo Tree Search within operational context (Patra et al., 2020).
  • Distilled Dual Policy Networks: Training a separate distilled policy for planning yields faster, more stable, and exploratory planning with improved handling of non-simulatable reflexes, reducing variance and improving generalization (Yoo et al., 2023).
  • Dynamic Compute Allocation: RL agents trained to learn when to allocate planning compute (rather than always/never planning) outperform fixed strategies in sample efficiency and final achievement rate (Paglieri et al., 3 Sep 2025).
  • Scalable Synthetic Data Generation: Plan–Act agents enhance plan generation via synthetic trajectory annotation and plan expansion, resulting in substantial improvement on long-horizon text or web tasks (Erdogan et al., 12 Mar 2025).

7. Contextual Significance and Future Directions

The enduring significance of Plan–Act dual agent architectures lies in their ability to balance deliberation, reactivity, and resource efficiency. The explicit separation and dynamic fusion of planning and acting, informed by domain-specific metrics, hierarchical memory, and robust feedback, underpin scalable solutions for multimodal reasoning, autonomous control, and collaborative execution. Ongoing research focuses on tighter integration of reflective mechanisms, more sophisticated switching policies, and the unification of planning–acting models for complex, uncertain, or adversarial environments.

As planning models, interaction protocols, and multi-core agent standards evolve, Plan–Act dual architectures are poised to remain central in the design of next-generation intelligent agents for diverse real-world tasks—spanning robotics, web intelligence, video understanding, and multi-agent cooperation.
