Proactive Hierarchical Planning in AI

Updated 16 March 2026
  • Proactive Hierarchical Planning is an advanced AI framework that interleaves high-level policy decomposition with continuous plan refinement for robust long-horizon task execution.
  • It employs modular components such as global guidance, Manager and Worker policies, and latent strategy discovery to adapt to dynamic environmental changes.
  • Empirical evaluations in GUI automation, embodied agents, and dialogue systems show significant improvements in success rates and adaptive behavior compared to reactive planners.

Proactive Hierarchical Planning (PHP) is a planning paradigm for AI agents tasked with solving complex, multi-step problems in partially observable environments. PHP distinguishes itself by interleaving high-level policy decomposition with continuous mid-execution plan refinement, which improves robustness to environmental change, supports long-horizon reasoning, and copes with ambiguous observational feedback. Unlike reactive or static hierarchical planners, PHP proactively updates its plan in response to every significant environmental transition, whether or not a subgoal has failed, enabling adaptive behavior across a broad spectrum of domains ranging from GUI automation to proactive dialogue systems (Li et al., 26 Aug 2025; Agashe et al., 1 Apr 2025; He et al., 2024).

1. Problem Formulation and Motivation

PHP emerges as a solution to the limitations of standard hierarchical agents in settings where both long-range reasoning and closed-loop adaptability are required. In a typical POMDP formalism $\mathcal{M} = (\mathcal{S}, \mathcal{O}, \mathcal{A}, \mathcal{T}, \mathcal{R})$, where $\mathcal{S}$ is the (hidden) state space, $\mathcal{O}$ the observable outputs, $\mathcal{A}$ the agent actions, $\mathcal{T}$ the state/action transition model, and $\mathcal{R}$ the reward function, traditional planners often fix a top-level plan up front and only replan in response to subgoal failure. This reactive structure limits adaptability in digital environments with unpredictable UI changes, ambiguous signals, or strategic, multi-turn interactions.
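For concreteness, the tuple above can be sketched as a small container; the field names and toy instance are illustrative rather than taken from the cited papers:

```python
from typing import Any, Callable, NamedTuple

class POMDP(NamedTuple):
    """Minimal container for M = (S, O, A, T, R); names are illustrative."""
    states: Any            # S: hidden world states
    observations: Any      # O: observable outputs
    actions: Any           # A: agent actions
    transition: Callable   # T: state/action transition model
    reward: Callable       # R: reward function

# A toy instance: a single hidden state with a constant reward.
toy = POMDP(
    states={"s0"},
    observations={"o0"},
    actions={"a0"},
    transition=lambda s, a: s,
    reward=lambda s, a: 1.0,
)
```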

Proactive Hierarchical Planning, by contrast, reconceptualizes the agent's planning state to include not only the instruction and environment observation but the complete execution history or trajectory prefix. This results in a dynamic process where high-level strategies and low-level actions are continually refined in anticipation of, and in response to, evolving context (Li et al., 26 Aug 2025, Agashe et al., 1 Apr 2025, He et al., 2024).

2. Structural Components and Algorithms

Distinct PHP frameworks decompose planning into multiple interconnected modules, typically realized as a hierarchy of global and local guidance, or of manager and executor agents.

Global Guidance and Milestone Decomposition

In HiPlan (Li et al., 26 Aug 2025), global guidance is formalized as a function

$$g_{\mathrm{global}}: \mathcal{T} \longrightarrow \mathcal{G}_\tau = \{m_1, m_2, \dots, m_K\}$$

which takes a natural-language task $\tau$ and outputs an ordered sequence of $K$ milestones. Milestone generation uses embedding-based retrieval over an expert-annotated offline library, followed by LLM adaptation to the specifics of $\tau$.
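A minimal sketch of this retrieve-then-adapt step, using a toy character-count embedding and a hypothetical two-entry library (a real system would use a learned encoder, and an LLM would adapt the retrieved milestones to the task):

```python
import math

# Hypothetical offline library: task description -> expert-annotated milestones.
MILESTONE_LIBRARY = {
    "book a flight": ["open travel site", "enter dates", "select flight", "pay"],
    "send an email": ["open mail client", "compose message", "add recipient", "send"],
}

def embed(text):
    """Toy bag-of-letters embedding; stands in for a learned encoder."""
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def global_guidance(task):
    """g_global sketch: retrieve the nearest library task's milestone sequence."""
    q = embed(task)
    best = max(MILESTONE_LIBRARY, key=lambda k: cosine(q, embed(k)))
    return MILESTONE_LIBRARY[best]
```

For example, `global_guidance("book a cheap flight")` retrieves the flight-booking milestones, which an LLM would then specialize to the concrete task.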

Hierarchical Policies in POMDPs

Agent S2 (Agashe et al., 1 Apr 2025) operationalizes PHP within a two-level LLM-driven hierarchy:

  • The Manager policy $\pi_M$ consumes the instruction $I$, the latest observation $o_t$, and the historical record $H_t$ to generate a subgoal list $[g_1, \dots, g_N]$.
  • The Worker policy $\pi_W$ executes a given subgoal $g_i$ in the context of the current observation $o$, producing primitive actions and descriptions for grounding via specialist modules (e.g., visual, textual, structural experts).

After every subgoal execution, regardless of success or failure, the Manager is re-invoked with updated context to regenerate the subgoal stack—ensuring plan agility and anticipatory corrections.

Latent Policy Planning for Dialogue

In LDPP (He et al., 2024), PHP is instantiated as a two-level MDP for proactive dialogues:

  • High-level: a policy planner selects a latent vector $z_t$ encoding the conversational strategy.
  • Low-level: a generator produces token sequences conditioned on the dialogue history $h_t$ and strategy $z_t$. Latent strategies are discovered via a VQ-VAE-style model directly from real-world dialogue logs, obviating reliance on hand-crafted policies or simulations.
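The quantization step at the heart of this latent strategy discovery can be sketched as a nearest-codebook lookup (pure Python, Euclidean distance; LDPP's actual model is a trained VQ-VAE with a learned codebook):

```python
import math

def quantize(h, codebook):
    """VQ-VAE-style lookup: map a continuous dialogue-state encoding h to the
    nearest latent strategy vector z_t in the codebook (Euclidean distance)."""
    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    idx = min(range(len(codebook)), key=lambda k: dist(h, codebook[k]))
    return idx, codebook[idx]
```

Given a toy codebook `[[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]`, an encoding near `[1, 1]` snaps to index 1; the discrete index is what the high-level planner learns to select over.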

3. Algorithmic Procedures and Pseudocode

All PHP systems implement a feedback-rich planning loop in which planning, action, observation, and plan revision are tightly coupled. Pseudocode below captures the core PHP cycle as exemplified in Agent S2 (Agashe et al., 1 Apr 2025):

def AgentS2_Run(I, o0):
    H = []                        # execution history
    o = o0                        # current observation
    G = Manager(I, o, H)          # pi_M: initial subgoal list
    while G:
        subgoal = G.pop(0)        # take the next pending subgoal
        outcome, o = Worker_Execute(subgoal, o)   # pi_W plus grounding experts
        H.append((subgoal, outcome))
        G = Manager(I, o, H)      # always re-plan proactively
    return SUCCESS

In HiPlan (Li et al., 26 Aug 2025), the main loop interleaves global milestone tracking with stepwise, context-adapted action selection, based on current observations, retrieved trajectory fragments, and real-time LLM adaptation. In LDPP (He et al., 2024), policy and generator modules are trained sequentially and then jointly via offline hierarchical RL on annotated dialogue trajectories.
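HiPlan's interleaving of global milestone tracking with stepwise adaptation might be sketched as follows; every callable here (`step_fn`, `done_fn`, `adapt_hint`) is a hypothetical stand-in for the paper's retrieval and LLM components:

```python
def hiplan_run(task, milestones, step_fn, done_fn, adapt_hint, max_steps=50):
    """Illustrative HiPlan-style loop (not the paper's implementation):
    follow global milestones in order, but re-derive a step-wise hint from
    the current observation and history before every action."""
    history, obs, steps = [], None, 0
    for m in milestones:
        while not done_fn(m, obs) and steps < max_steps:
            hint = adapt_hint(m, obs, history)  # retrieval + LLM adaptation
            obs = step_fn(hint)                 # act on the hint, observe result
            history.append((m, hint))
            steps += 1
    return history
```

With toy callables where acting on a hint immediately satisfies the milestone, the loop visits each milestone exactly once, illustrating the milestone-by-milestone progression.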

4. Practical Architectures and Guidance Mechanisms

A defining feature of PHP is the integration of retrieved experiential fragments and local context into the planning cycle:

| Framework | High-Level Planner Input | Low-Level Executor Guidance |
| --- | --- | --- |
| HiPlan (Li et al., 26 Aug 2025) | Task, milestone embeddings, history | Step-wise hints via LLM adaptation to current context/milestone |
| Agent S2 (Agashe et al., 1 Apr 2025) | Instruction, screenshot, subgoal history | Grounding experts for atomic actions (visual/text/structural) |
| LDPP (He et al., 2024) | Dialogue history, pseudo-latent labels | Frozen LLM + P-Former, conditioned on high-level latent policy |

HiPlan introduces a dual sequence of guidance: (1) "milestone guides" for macroscopic direction, and (2) dynamic, step-wise hints adapted from the nearest retrieved trajectory segments to address deviation and recovery. Agent S2 leverages a Mixture-of-Grounding mechanism, allowing the Worker to invoke the most appropriate specialized expert at each atomic step, with explicit high-level plan revision at every subgoal boundary.
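A toy version of such expert routing, with keyword scoring standing in for Agent S2's actual (LLM-driven) selection; the expert names and keywords are illustrative:

```python
def mixture_of_grounding(subgoal, experts):
    """Illustrative dispatch: route an atomic step to the grounding expert
    whose keywords best match the subgoal text."""
    def score(expert):
        return sum(1 for kw in expert["keywords"] if kw in subgoal.lower())
    return max(experts, key=score)["name"]

# Hypothetical expert registry mirroring the visual/textual/structural split.
EXPERTS = [
    {"name": "visual",     "keywords": ["click", "icon", "button"]},
    {"name": "textual",    "keywords": ["type", "text", "field"]},
    {"name": "structural", "keywords": ["table", "cell", "row"]},
]
```

For instance, a subgoal like "Click the save icon" would be routed to the visual expert under this scoring.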

5. Empirical Evaluation and Impact

Extensive evaluation across computer use, embodied agent, and dialogue domains demonstrates the effectiveness of PHP.

In Agent S2 (Agashe et al., 1 Apr 2025), ablation studies show that proactive replanning accounts for a 4.6 percentage point increase in success rate at 15 steps and a 6.1 point increase at 50 steps on the OSWorld benchmark compared to reactive planners. With the Mixture-of-Grounding module, the full system achieves 34.85% and 44.59% at 15-step and 50-step, respectively, compared to 27.69% and 33.85% for the baseline. Agent S2 also delivered 52.8% relative improvement over previous methods on WindowsAgentArena and 16.52% on AndroidWorld.

LDPP (He et al., 2024) achieves state-of-the-art results in ExTES (emotional support) dialogues, with a Soft Success Rate (SSR) of 0.723 and Success Rate (SR) of 0.903, outperforming both predefined-policy (PPDPP, SR = 0.558) and no-policy (ChatGPT, SR = 0.810) approaches. LDPP generalizes zero-shot to other datasets and settings such as persuasion-for-good (P4G).

6. Distinctive Design Principles and Extensions

Key insights behind the effectiveness of PHP include:

  • Continuous Re-contextualization: The high-level planner receives the latest environment state and execution history at every replanning event, enabling rapid adaptation to emergent changes (e.g., pop-ups, misnavigation, partial progress).
  • Self-correction and Exploration: By updating the plan not only after failures but after every subgoal, agents can anticipate or rectify errors before catastrophic deviation, manifesting behaviors such as adaptive navigation and backward correction (Agashe et al., 1 Apr 2025).
  • Structural Modularity: Separation between high-level goal decomposition and low-level execution allows for targeted improvements and transparent delegation to specialist modules.
  • Simulation-free Policy Discovery: In dialogue systems, latent strategies are mined directly from real data using VQ-VAE embeddings, removing the need for domain simulation and handcrafted scripting (He et al., 2024).
  • Generalizability and Explainability: PHP architectures can be extended to multi-agent settings and dynamic codebooks, and enhanced with explainable policy-to-text modules.

7. Domain Coverage and Future Directions

PHP is established across diverse proactive domains: GUI task automation, embodied manipulation, and open-domain dialogue. It is especially suited to environments characterized by partial observability, unanticipated state perturbations, long-horizon goals, and the need for coordinated multi-level policy control. Extensions are under investigation in negotiation, recommendation, tutoring, and multi-agent interactive planning, as well as integration with online fine-tuning or user feedback as new data arrives (Agashe et al., 1 Apr 2025; He et al., 2024).

In summary, Proactive Hierarchical Planning provides a robust, modular, and empirically validated framework for leveraging hierarchical decomposition, continuous adaptation, and dynamic, context-sensitive execution to enable AI agents to effectively handle complex, evolving, and long-duration tasks in real-world scenarios.
