Proactive Hierarchical Planning in AI
- Proactive Hierarchical Planning is an advanced AI framework that interleaves high-level policy decomposition with continuous plan refinement for robust long-horizon task execution.
- It employs modular components such as global guidance, Manager and Worker policies, and latent strategy discovery to adapt to dynamic environmental changes.
- Empirical evaluations in GUI automation, embodied agents, and dialogue systems show significant improvements in success rates and adaptive behavior compared to reactive planners.
Proactive Hierarchical Planning (PHP) is an advanced planning paradigm for AI agents tasked with solving complex, multi-step problems in partially observable environments. PHP distinguishes itself by interleaving high-level policy decomposition with continuous mid-execution plan refinement, improving robustness to environment changes, supporting long-horizon reasoning, and tolerating ambiguous observational feedback. Unlike reactive or static hierarchical planners, PHP proactively updates its plan in response to every significant environmental transition, whether or not a subgoal has failed, enabling adaptive behavior across a broad spectrum of domains ranging from GUI automation to proactive dialogue systems (Li et al., 26 Aug 2025, Agashe et al., 1 Apr 2025, He et al., 2024).
1. Problem Formulation and Motivation
PHP emerges as a solution to the limitations of standard hierarchical agents in settings where both long-range reasoning and closed-loop adaptability are required. In a typical POMDP formalism (S, A, O, T, R), where S is the (hidden) world state space, O the observable outputs, A the agent actions, T the state/action transition model, and R the reward function, traditional planners often fix a top-level plan up front and only replan in response to subgoal failure. This reactive structure limits adaptability in digital environments with unpredictable UI changes, ambiguous signals, or strategic, multi-turn interactions.
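For concreteness, the POMDP components above can be sketched as a minimal container; the field names and the toy environment are our own, purely illustrative:

```python
from typing import Callable, NamedTuple

class POMDP(NamedTuple):
    """Illustrative POMDP tuple (S, A, O, T, R); names are our own."""
    states: set            # hidden world states S
    actions: set           # agent actions A
    transition: Callable   # T(s, a) -> next state
    reward: Callable       # R(s, a) -> float
    observe: Callable      # O(s) -> observable output

# Toy two-state GUI environment: the agent never sees the state directly,
# only a rendered observation of it.
toy = POMDP(
    states={"page_open", "popup_shown"},
    actions={"click", "dismiss"},
    transition=lambda s, a: "page_open" if a == "dismiss" else "popup_shown",
    reward=lambda s, a: 1.0 if (s, a) == ("popup_shown", "dismiss") else 0.0,
    observe=lambda s: "screenshot_of_" + s,
)
```

The partial observability is what motivates PHP's use of the full trajectory prefix: a single `observe(s)` output may not disambiguate the underlying state.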
Proactive Hierarchical Planning, by contrast, reconceptualizes the agent's planning state to include not only the instruction and environment observation but the complete execution history or trajectory prefix. This results in a dynamic process where high-level strategies and low-level actions are continually refined in anticipation of, and in response to, evolving context (Li et al., 26 Aug 2025, Agashe et al., 1 Apr 2025, He et al., 2024).
2. Structural Components and Algorithms
Distinct PHP frameworks decompose planning into multiple interconnected modules, typically manifested as a hierarchy of global and local guidance, or of management and execution agents.
Global Guidance and Milestone Decomposition
In HiPlan (Li et al., 26 Aug 2025), global guidance is formalized as a function g: T → (m_1, ..., m_K), which takes a natural-language task T and outputs an ordered sequence of milestones (m_1, ..., m_K). Milestone generation uses embedding-based retrieval over an expert-annotated offline library, followed by LLM adaptation to the specifics of T.
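The retrieval half of this retrieve-then-adapt step can be sketched as nearest-neighbor search over embedded expert tasks. A minimal sketch assuming cosine similarity; the embedding values and library contents below are fabricated placeholders, not HiPlan's actual data:

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def retrieve_milestones(task_emb, library, k=1):
    """Return milestone sequences of the k expert tasks most similar to task_emb."""
    ranked = sorted(library, key=lambda e: cosine(task_emb, e["emb"]), reverse=True)
    return [e["milestones"] for e in ranked[:k]]

# Toy library of expert-annotated trajectories (embeddings fabricated).
library = [
    {"emb": [1.0, 0.0], "milestones": ["open settings", "enable wifi"]},
    {"emb": [0.0, 1.0], "milestones": ["open browser", "search query"]},
]
retrieve_milestones([0.9, 0.1], library)   # nearest neighbor: the wifi trajectory
```

In the full pipeline, the retrieved milestone sequence would then be passed to an LLM for adaptation to the current task's wording and context.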
Hierarchical Policies in POMDPs
Agent S2 (Agashe et al., 1 Apr 2025) operationalizes PHP within a two-level LLM-driven hierarchy:
- The Manager policy π_M consumes the instruction I, latest observation o, and historical record H to generate a subgoal list G = (g_1, ..., g_n).
- The Worker policy π_W executes a given subgoal g_i in the context of o, producing primitive actions/descriptions for grounding via specialist modules (e.g., visual, textual, structural experts).
After every subgoal execution, regardless of success or failure, the Manager is re-invoked with updated context to regenerate the subgoal stack—ensuring plan agility and anticipatory corrections.
Latent Policy Planning for Dialogue
In LDPP (He et al., 2024), PHP is instantiated as a two-level MDP for proactive dialogues:
- High-level: A policy planner selects a latent vector z encoding a conversational strategy.
- Low-level: A generator produces token sequences conditioned on the dialogue history and strategy z. Latent strategies are discovered via a VQ-VAE-style model directly from real-world dialogue logs, obviating reliance on hand-crafted policies or simulations.
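The strategy lookup at the heart of a VQ-VAE-style planner reduces to nearest-neighbor quantization of a continuous encoding against a learned codebook. A schematic sketch; the codebook values and strategy interpretations are fabricated, not LDPP's learned latents:

```python
def quantize(h, codebook):
    """Map a continuous dialogue-state encoding h to the nearest codebook
    vector, i.e. the index of the discrete latent strategy z."""
    def sqdist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    k = min(range(len(codebook)), key=lambda i: sqdist(h, codebook[i]))
    return k, codebook[k]

# Toy codebook of two "strategies" (e.g. empathize vs. suggest-solution).
codebook = [[1.0, 0.0], [0.0, 1.0]]
k, z = quantize([0.8, 0.3], codebook)   # k == 0: closest to the first strategy
```

During training, the codebook entries themselves are learned jointly with the encoder; at planning time, only this lookup is needed to pick a strategy for the generator to condition on.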
3. Algorithmic Procedures and Pseudocode
All PHP systems implement a feedback-rich planning loop in which planning, action, observation, and plan revision are tightly coupled. Pseudocode below captures the core PHP cycle as exemplified in Agent S2 (Agashe et al., 1 Apr 2025):
```python
def AgentS2_Run(I, o0):
    H = []                                       # execution history
    o = o0
    G = Manager(I, o, H)                         # π_M generates initial subgoal list
    while not empty(G):
        subgoal = G.pop_front()
        outcome, o = Worker_Execute(subgoal, o)  # π_W executes in current context
        H.append((subgoal, outcome))
        G = Manager(I, o, H)                     # always re-plan proactively
    return SUCCESS
```
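A runnable toy instantiation of the PHP cycle above, with stub Manager/Worker policies standing in for the LLM calls; the stubs and subgoal names are ours and only demonstrate the proactive re-planning control flow, not Agent S2's actual policies:

```python
from collections import deque

def manager(I, o, H):
    """Stub π_M: plan remaining subgoals, dropping ones already completed."""
    done = {sg for sg, outcome in H if outcome == "ok"}
    plan = ["open_app", "fill_form", "submit"]
    return deque(sg for sg in plan if sg not in done)

def worker_execute(subgoal, o):
    """Stub π_W: pretend every subgoal succeeds and updates the observation."""
    return "ok", f"screen_after_{subgoal}"

def agent_s2_run(I, o0):
    H, o = [], o0
    G = manager(I, o, H)
    while G:
        subgoal = G.popleft()
        outcome, o = worker_execute(subgoal, o)
        H.append((subgoal, outcome))
        G = manager(I, o, H)   # re-plan after every subgoal, not just failures
    return H

history = agent_s2_run("submit the form", "home_screen")
# history records all three subgoals exactly once, in plan order
```

Because the manager is re-invoked with the updated history at every iteration, a real planner could insert, reorder, or drop subgoals at any boundary, which is exactly the proactive behavior the pseudocode formalizes.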
4. Practical Architectures and Guidance Mechanisms
A defining feature of PHP is the integration of retrieved experiential fragments and local context into the planning cycle:
| Framework | High-Level Planner Input | Low-Level Executor Guidance |
|---|---|---|
| HiPlan (Li et al., 26 Aug 2025) | Task, milestone embeddings, history | Step-wise hints via LLM adaptation to current context/milestone |
| Agent S2 (Agashe et al., 1 Apr 2025) | Instruction, screenshot, subgoal history | Grounding experts for atomic actions (visual/text/structural) |
| LDPP (He et al., 2024) | Dialogue history, pseudo-latent labels | Frozen LLM + P-Former, conditioned on high-level latent policy |
HiPlan introduces dual guidance: (1) "milestone guides" for macroscopic direction, and (2) dynamic, step-wise hints adapted from the nearest retrieved trajectory segments to handle deviation and recovery. Agent S2 leverages a Mixture-of-Grounding mechanism, allowing the Worker to invoke the most appropriate specialized expert at each atomic step, with explicit high-level plan revision at every subgoal boundary.
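The Mixture-of-Grounding dispatch can be pictured as routing each atomic action to the expert matched to its target modality. A schematic sketch; the routing rule, expert names, and action schema are our own illustration, not Agent S2's exact implementation:

```python
def ground(action):
    """Route an atomic action to a specialist grounding expert by modality."""
    experts = {
        "visual":     lambda a: f"click at coordinates for '{a['target']}'",
        "textual":    lambda a: f"select text span '{a['target']}'",
        "structural": lambda a: f"query accessibility tree for '{a['target']}'",
    }
    return experts[action["modality"]](action)

ground({"modality": "visual", "target": "Save button"})
# -> "click at coordinates for 'Save button'"
```

Keeping grounding behind a single dispatch point is what lets new experts be added without touching the Manager/Worker planning loop.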
5. Empirical Evaluation and Impact
Extensive evaluation across computer use, embodied agent, and dialogue domains demonstrates the effectiveness of PHP.
In Agent S2 (Agashe et al., 1 Apr 2025), ablation studies show that proactive replanning accounts for a 4.6 percentage point increase in success rate at 15 steps and a 6.1 point increase at 50 steps on the OSWorld benchmark compared to reactive planners. With the Mixture-of-Grounding module, the full system achieves 34.85% and 44.59% at 15-step and 50-step, respectively, compared to 27.69% and 33.85% for the baseline. Agent S2 also delivered 52.8% relative improvement over previous methods on WindowsAgentArena and 16.52% on AndroidWorld.
LDPP (He et al., 2024) achieves state-of-the-art results in ExTES (emotional support) dialogues, with a Soft Success Rate (SSR) of 0.723 and Success Rate (SR) of 0.903, outperforming both predefined-policy (PPDPP, SR = 0.558) and no-policy (ChatGPT, SR = 0.810) approaches. LDPP generalizes zero-shot to other datasets and settings such as persuasion-for-good (P4G).
6. Distinctive Design Principles and Extensions
Key insights behind the effectiveness of PHP include:
- Continuous Re-contextualization: The high-level planner receives the latest environment state and execution history at every replanning event, enabling rapid adaptation to emergent changes (e.g., pop-ups, misnavigation, partial progress).
- Self-correction and Exploration: By updating the plan not only after failures but after every subgoal, agents can anticipate or rectify errors before catastrophic deviation, manifesting behaviors such as adaptive navigation and backward correction (Agashe et al., 1 Apr 2025).
- Structural Modularity: Separation between high-level goal decomposition and low-level execution allows for targeted improvements and transparent delegation to specialist modules.
- Simulation-free Policy Discovery: In dialogue systems, latent strategies are mined directly from real data using VQ-VAE embeddings, removing the need for domain simulation and handcrafted scripting (He et al., 2024).
- Generalizability and Explainability: PHP architectures can be extended to multi-agent cases, dynamic codebooks, and enhanced with explainable policy-to-text modules.
7. Domain Coverage and Future Directions
PHP is established across diverse proactive domains: GUI task automation, embodied manipulation, and open-domain dialogue. It is especially suited to environments characterized by partial observability, unanticipated state perturbations, long horizon goals, and the need for coordinated multi-level policy control. Extensions are under investigation in negotiation, recommendation, tutoring, and settings with multi-agent interactive planning, as well as integration with online fine-tuning or user feedback as new data arrives (Agashe et al., 1 Apr 2025, He et al., 2024).
In summary, Proactive Hierarchical Planning provides a robust, modular, and empirically validated framework for leveraging hierarchical decomposition, continuous adaptation, and dynamic, context-sensitive execution to enable AI agents to effectively handle complex, evolving, and long-duration tasks in real-world scenarios.