Experience-Augmented Hierarchical Planning

Updated 12 May 2026

Experience-Augmented Hierarchical Planning is a framework that decomposes complex tasks into hierarchical layers while integrating past execution data, learned models, and on-the-fly adaptations.
It employs mechanisms such as online acting experience, intrinsic skill discovery, and automatic symbol extraction to convert low-level actions into high-level, reusable abstractions.
Empirical evaluations demonstrate significant planning speedups and improved task success in robotics, multi-agent systems, and hybrid AI architectures.

Experience-Augmented Hierarchical Planning (EAHP) is a family of techniques and algorithms that fuse hierarchical task decomposition with mechanisms for leveraging accumulated experience to optimize planning and execution in complex, often uncertain environments. This paradigm is central to a range of modern AI systems, including robotics, reinforcement learning (RL), neuro-symbolic architectures, and human-in-the-loop planners. It has been instantiated in operational models, planning from learned skills, automatic symbol abstraction, and real-world systems with mixed discrete and continuous action spaces.

1. Formal Foundations and Hierarchical Frameworks

Experience-augmented hierarchical planning architectures are defined by a multi-level decomposition of tasks and the systematic incorporation of experience—such as past executions, learned models, and on-the-fly adaptation—into action selection and plan synthesis.

Hierarchical Operational Models

Several frameworks formalize the planning domain as a tuple $(\Xi, T, M, A)$, where:

$\Xi$: set of world states (each a total assignment to state variables)
$T$: set of tasks/events (ground predicates)
$M$: finite set of hierarchical refinement methods
$A$: set of primitive actions

A refinement method $m = (\mathrm{role}, \mathrm{precond}, \mathrm{body})$ handles a specific task type, guarded by a precondition and executed via a body that admits rich control flow (loops, tests, primitive calls, subtask invocation). The acting system maintains a stack of refinement frames—each corresponding to an active task and its current execution context—thereby allowing deep, recursive task decomposition (Patra et al., 2020).
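The refinement-stack idea can be illustrated with a minimal Python sketch. It is not the acting engine of Patra et al.; the names `Method`, `Frame`, `applicable`, and `run` are hypothetical. It also assumes a method body is a flat list of steps, whereas the bodies described above admit loops and tests, and it always takes the first applicable method instead of searching over alternatives.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, object]  # one world state: a total assignment to state variables


@dataclass
class Method:
    """A refinement method m = (role, precond, body). Hypothetical structure."""
    name: str
    role: str                          # task type this method refines
    precond: Callable[[State], bool]   # guard evaluated on the current state
    body: List[Tuple[str, str]]        # steps: ("action", name) or ("task", name)


@dataclass
class Frame:
    """One refinement frame: an active task, its chosen method, a program counter."""
    task: str
    method: Method
    pc: int = 0


def applicable(methods: List[Method], task: str, state: State) -> List[Method]:
    return [m for m in methods if m.role == task and m.precond(state)]


def run(task: str, state: State, methods: List[Method],
        actions: Dict[str, Callable[[State], None]]) -> Optional[List[str]]:
    """Refine `task` depth-first with an explicit stack of frames.

    Returns the executed primitive actions, or None if some (sub)task has no
    applicable method. Assumption: the first applicable method is always chosen.
    """
    candidates = applicable(methods, task, state)
    if not candidates:
        return None
    stack = [Frame(task, candidates[0])]
    trace: List[str] = []
    while stack:
        frame = stack[-1]
        if frame.pc == len(frame.method.body):
            stack.pop()                # body finished: return to the parent frame
            continue
        kind, name = frame.method.body[frame.pc]
        frame.pc += 1
        if kind == "action":
            actions[name](state)       # primitive action mutates the state
            trace.append(name)
        else:
            candidates = applicable(methods, name, state)
            if not candidates:
                return None
            stack.append(Frame(name, candidates[0]))  # recurse into the subtask
    return trace
```

A planner in this family would replace the "first applicable method" choice with a learned or searched choice; the stack discipline itself stays the same.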

Symbolic, Skill-Based, and Option-Based Hierarchies

Other research grounds the hierarchy in:

Symbolic abstractions (e.g., action schemas, fluents, PDDL), with experience feeding back as learned costs, preconditions, or effects (Yang et al., 2018)
Option-based hierarchies, where temporally-extended skills are discovered and refined from experience, forming a basis for planning operators and automated abstraction (Oddi et al., 2019, Morere et al., 2019)
Natural-programming hierarchies, in which user-provided or system-discovered decompositions are recursively extended and stored as experience for subsequent planning (Cano et al., 2023)

These frameworks establish the structural and representational prerequisites for hierarchical planning with experience augmentation.

2. Experience Acquisition and Abstraction Mechanisms

The core of EAHP is the acquisition and integration of experience to improve future planning, through mechanisms including:

Online and Simulated Acting Experience

Operational models are augmented by collecting traces of acting and planning. Monte Carlo Tree Search (MCTS)-like planners (notably UPOM) run rollouts using the operational models, then assign “credit” (utility or efficiency) to method choices. This data is stored as supervised pairs (context, method) or (context, utility), and used to train policies or heuristic predictors (Patra et al., 2020).
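As a rough illustration of how rollout records might be turned into the two kinds of supervised pairs, the sketch below aggregates logged `(context, method, utility)` triples. The function name `build_training_pairs`, the record layout, and the use of the mean utility as the training label are assumptions made for this example and are not taken from the cited work.

```python
from collections import defaultdict
from statistics import mean


def build_training_pairs(rollouts):
    """Aggregate rollout records into supervised training data.

    rollouts: iterable of (context, method, utility), where context is hashable
    and orderable (for example a tuple of strings).
    Returns:
      policy_pairs  - list of (context, best_method), one per context
      utility_pairs - dict mapping (context, method) to its mean observed utility
    """
    by_choice = defaultdict(list)
    for context, method, utility in rollouts:
        by_choice[(context, method)].append(utility)

    utility_pairs = {choice: mean(us) for choice, us in by_choice.items()}

    best = {}
    for (context, method), u in utility_pairs.items():
        if context not in best or u > best[context][1]:
            best[context] = (method, u)   # keep the highest-utility method

    policy_pairs = [(c, m) for c, (m, _) in sorted(best.items())]
    return policy_pairs, utility_pairs
```

The `policy_pairs` would feed a classifier over methods, while `utility_pairs` would feed a regressor used as a heuristic.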

Intrinsically Motivated Skill Discovery

Systems such as M-GRAIL autonomously discover and train options for environmental events, leveraging intrinsic measures such as competence improvement or novelty. Acquired options—each encapsulating a goal predicate, initiation set, policy, and termination condition—serve as the building blocks for symbolic action abstraction (Oddi et al., 2019).
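The four components of an option, and one possible competence-improvement signal, can be sketched as follows. This is an illustrative simplification, not M-GRAIL's actual mechanism: `Option`, `competence_improvement`, `pick_goal_to_train`, and the fixed comparison window are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, float]


@dataclass
class Option:
    """A temporally extended skill with the four components named in the text."""
    goal: Callable[[State], bool]         # goal predicate
    initiation: Callable[[State], bool]   # states in which the option may start
    policy: Callable[[State], str]        # maps a state to a primitive action
    termination: Callable[[State], bool]  # when the option stops


def competence_improvement(outcomes: List[int], window: int = 4) -> float:
    """Recent success rate minus the success rate of the preceding window.

    outcomes: 0/1 results of past attempts at one goal, oldest first.
    Returns 0.0 when there is not yet enough history to compare two windows.
    """
    if len(outcomes) < 2 * window:
        return 0.0
    recent = outcomes[-window:]
    earlier = outcomes[-2 * window:-window]
    return sum(recent) / window - sum(earlier) / window


def pick_goal_to_train(history: Dict[str, List[int]]) -> str:
    """Select the goal whose competence is currently improving fastest."""
    return max(history, key=lambda g: competence_improvement(history[g]))
```

Under this signal, a goal that is already mastered (or hopeless) yields no improvement and is deprioritized in favor of goals where practice still pays off.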

Automatic Symbol and Operator Extraction

From executed options and their observed effects, experience is transformed into high-level symbolic representations suitable for classical planners. This is achieved via classifiers that learn the initiation and effect sets, factorization of the state space into submanifolds, and operator generation by aligning symbolic predicates with option transitions. The choice of classifier (for example, Intersection+Mask versus decision trees) strongly affects abstraction quality and downstream planning completeness (Oddi et al., 2019).
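To make the operator-generation step concrete, the following sketch derives a STRIPS-style operator from observed option transitions over already-symbolic states. It uses plain set intersection and is a deliberately simplified stand-in: it is not the Intersection+Mask classifier or the pipeline of the cited work, and `extract_operator` and `to_pddl` are hypothetical names.

```python
from typing import Dict, Iterable, Set, Tuple


def extract_operator(name: str,
                     transitions: Iterable[Tuple[Set[str], Set[str]]]) -> Dict:
    """Build an operator from (before, after) proposition sets of one option.

    precond: propositions true before every observed execution
    add:     propositions true after every execution that are not preconditions
    delete:  preconditions that never hold after any execution
    Requires at least one transition.
    """
    transitions = list(transitions)
    befores = [set(b) for b, _ in transitions]
    afters = [set(a) for _, a in transitions]
    precond = set.intersection(*befores)
    always_after = set.intersection(*afters)
    ever_after = set.union(*afters)
    return {
        "name": name,
        "precond": precond,
        "add": always_after - precond,
        "delete": precond - ever_after,
    }


def to_pddl(op: Dict) -> str:
    """Render the operator as a PDDL-like action with propositional atoms."""
    pre = " ".join(f"({p})" for p in sorted(op["precond"]))
    eff = [f"({p})" for p in sorted(op["add"])]
    eff += [f"(not ({p}))" for p in sorted(op["delete"])]
    return (f"(:action {op['name']}\n"
            f"  :precondition (and {pre})\n"
            f"  :effect (and {' '.join(eff)}))")
```

With few samples, intersection tends to produce over-specific preconditions; this is one way in which the learner used for initiation and effect sets influences what the downstream planner can solve.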

Curriculum-Based Skill Abstraction

Hierarchically compositional skills are learned sequentially through curriculum goals of increasing complexity. Each successful trajectory is partitioned into effects, preconditions, and policies, resulting in new skills that are recursively reused for deeper problems. Condition detectors are learned via statistical models (e.g., GMMs) over observed success states (Morere et al., 2019).
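A condition detector of this kind can be approximated, for illustration only, by a single axis-aligned Gaussian fitted to success states; this is a one-component, diagonal-covariance simplification of a GMM. The function names, the standard-deviation floor `min_std`, and the acceptance threshold `k` are assumptions of this sketch.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def fit_condition(success_states: Sequence[Sequence[float]],
                  min_std: float = 1e-3) -> List[Tuple[float, float]]:
    """Fit a per-feature (mean, std) model to states in which a skill succeeded.

    min_std floors the standard deviation so that features with no observed
    variation do not produce a zero-width acceptance band.
    """
    columns = list(zip(*success_states))
    return [(mean(col), max(pstdev(col), min_std)) for col in columns]


def holds(condition: List[Tuple[float, float]],
          state: Sequence[float], k: float = 3.0) -> bool:
    """Accept a state if every feature lies within k standard deviations."""
    return all(abs(x - mu) <= k * sd for x, (mu, sd) in zip(state, condition))
```

The same detector shape can serve for both preconditions (fitted on start states of successful trajectories) and effects (fitted on their end states).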

Human-Driven Experience Libraries

Natural programming approaches build an experience library wherein each entry is a tuple (goal, linguistic hint, decomposition into subgoals). These libraries are expanded via recursive planning searches and updated as successful decompositions are found and executed, supporting continual and cross-contextual learning (Cano et al., 2023).
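The library structure and its recursive use can be sketched as below. The entry layout follows the (goal, linguistic hint, decomposition) tuple described above, while `Entry`, `teach`, `plan`, the depth limit, and the crafting-style goal names are hypothetical and not taken from Cano et al.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Set, Tuple


@dataclass(frozen=True)
class Entry:
    goal: str
    hint: str                  # linguistic hint supplied with the decomposition
    subgoals: Tuple[str, ...]  # ordered decomposition into subgoals


def teach(library: Dict[str, Entry], goal: str, hint: str,
          subgoals: Sequence[str]) -> None:
    """Store (or overwrite) a decomposition as reusable experience."""
    library[goal] = Entry(goal, hint, tuple(subgoals))


def plan(goal: str, library: Dict[str, Entry], primitives: Set[str],
         depth: int = 10) -> Optional[List[str]]:
    """Recursively expand a goal into primitive steps using stored entries.

    Returns None when a needed decomposition is missing or the depth limit is
    reached (which also guards against cyclic entries).
    """
    if goal in primitives:
        return [goal]
    entry = library.get(goal)
    if entry is None or depth == 0:
        return None
    steps: List[str] = []
    for sub in entry.subgoals:
        sub_plan = plan(sub, library, primitives, depth - 1)
        if sub_plan is None:
            return None
        steps.extend(sub_plan)
    return steps
```

A `None` result marks the point where such a system would ask the user for a new hint or decomposition, which is then stored with `teach` and reused in later sessions.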

3. Planning Algorithms and Execution Strategies

Experience-augmented hierarchical planners combine deliberative search, experience-guided policies, and anytime algorithms:

Anytime UCT-Based Search Over Operational Models

UPOM (UCT for Plans Over Models) performs MCTS in the refinement tree of possible decompositions, using utility estimates acquired from past rollouts to balance exploration and exploitation. The planner is parameterized by rollout budget and depth, trading off planning effort for solution quality (Patra et al., 2020).
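The exploration/exploitation balance in UCT-style search is typically expressed with the UCB1 rule, which scores a choice by its mean utility plus a bonus $c\sqrt{\ln N / n}$ that shrinks as the choice is tried more often. The sketch below applies this standard rule to method selection at a single node; the statistics layout and function names are assumptions, and UPOM's exact utility definition and update may differ.

```python
import math
from typing import Dict, Tuple

Stats = Dict[str, Tuple[int, float]]  # method -> (visit count, summed utility)


def uct_select(stats: Stats, c: float = math.sqrt(2)) -> str:
    """Pick a refinement method at one search node using UCB1.

    Untried methods are selected first; otherwise the score is
    mean utility + c * sqrt(ln(total visits) / visits of this method).
    """
    for method, (visits, _) in stats.items():
        if visits == 0:
            return method
    total = sum(visits for visits, _ in stats.values())

    def score(method: str) -> float:
        visits, utility_sum = stats[method]
        return utility_sum / visits + c * math.sqrt(math.log(total) / visits)

    return max(stats, key=score)


def update(stats: Stats, method: str, utility: float) -> None:
    """Back up the utility observed in one rollout through this node."""
    visits, utility_sum = stats[method]
    stats[method] = (visits + 1, utility_sum + utility)
```

Setting `c = 0` yields purely greedy selection on past utility, while larger values spend more of the rollout budget on rarely tried methods.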

Policy and Heuristic Learning

Learned policies for method selection ($\mathrm{Learn}_\pi$) and heuristics for node evaluation ($\mathrm{Learn}_H$) are trained as multi-layer perceptrons or classifiers on logged experience. These components allow the planner to default to a fast policy in low-latency scenarios, or to run guided search with learned heuristics when time permits, forming an "anytime" spectrum between reactive acting and full deliberative search (Patra et al., 2020).
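One way to picture the anytime spectrum is a selector that ranks candidate methods with a cheap learned policy and, when a budget allows, re-evaluates only the top-ranked candidates with a more expensive search-based estimate. This is a hypothetical sketch: `choose_method`, `policy_score`, and `search_value` are illustrative names, and the cited system interleaves policy, heuristic, and search differently.

```python
from typing import Callable, Hashable, Sequence


def choose_method(context: Hashable,
                  candidates: Sequence[str],
                  budget: int,
                  policy_score: Callable[[Hashable, str], float],
                  search_value: Callable[[Hashable, str], float]) -> str:
    """Select a method under a deliberation budget.

    budget <= 0: purely reactive, return the policy's top-ranked method.
    budget  > 0: evaluate the `budget` best-ranked methods with the (costly)
                 search-based estimate and return the best of those.
    """
    ranked = sorted(candidates,
                    key=lambda m: policy_score(context, m),
                    reverse=True)
    if budget <= 0:
        return ranked[0]
    shortlist = ranked[:budget]
    return max(shortlist, key=lambda m: search_value(context, m))
```

With a budget covering all candidates the learned policy only orders the evaluation, so the result matches full deliberation; with a zero budget it matches reactive acting.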

Symbolic–RL Integration and Cost Annealing

Hybrid systems such as PEORL alternate between symbolic planning (which uses learned experience as costs or constraints) and hierarchical RL (which learns intra-option and option-level value functions). After each run, experience is fed back to the planner as updated cost annotations, biasing subsequent plan synthesis towards higher reward and robustness (Yang et al., 2018).
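The alternation between planning and learning can be illustrated with a toy loop in which per-option value estimates, updated from observed rewards, determine which candidate plan is selected next. This sketch replaces PEORL's symbolic planner with a choice among enumerated plans and its R-learning updates with a simple running average; all names and the learning rate are assumptions.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Plan = Tuple[str, ...]


def best_plan(plans: Sequence[Plan], value: Dict[str, float]) -> Plan:
    """Planner stand-in: pick the plan with the highest summed option value.

    Options never tried default to 0.0; ties go to the earliest plan.
    """
    return max(plans, key=lambda p: sum(value.get(o, 0.0) for o in p))


def feed_back(value: Dict[str, float], plan: Plan,
              rewards: Sequence[float], lr: float = 0.5) -> None:
    """Move each executed option's value estimate toward its observed reward."""
    for option, reward in zip(plan, rewards):
        old = value.get(option, 0.0)
        value[option] = old + lr * (reward - old)


def plan_act_learn(plans: Sequence[Plan],
                   reward_fn: Callable[[str], float],
                   episodes: int = 5) -> Tuple[Plan, Dict[str, float]]:
    """Alternate planning, execution, and feedback for a fixed number of episodes."""
    value: Dict[str, float] = {}
    chosen: Plan = plans[0]
    for _ in range(episodes):
        chosen = best_plan(plans, value)
        rewards: List[float] = [reward_fn(o) for o in chosen]  # "execute" the plan
        feed_back(value, chosen, rewards)
    return chosen, value
```

In this toy setting the zero default for untried options acts as an optimistic estimate whenever rewards are negative, which is what drives the planner to try the alternative plan.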

Experience Informed Search in Continuous and Discrete Spaces

In robotic motion planning, abstraction layers generated from offline experience (e.g., critical region prediction via deep networks) structure the search space into high-level waypoints and transitions. Experience-augmented search maintains and updates transition costs based on success in local refinements, dynamically biasing the planner toward feasible and efficient paths (Shah et al., 2022, Asselmeier et al., 2023).
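The cost-update idea can be sketched on a small abstract graph whose nodes stand for high-level regions: a shortest-path search proposes a region sequence, and the outcome of refining each transition raises or lowers that transition's cost. The multiplicative penalty, the cost floor, and the function names are assumptions of this sketch and do not reproduce the update rules of the cited planners.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str]


def shortest_path(costs: Dict[Edge, float], start: str,
                  goal: str) -> Optional[List[str]]:
    """Dijkstra over abstract regions; returns the cheapest region sequence."""
    adjacency: Dict[str, List[Tuple[str, float]]] = {}
    for (u, v), cost in costs.items():
        adjacency.setdefault(u, []).append((v, cost))
    frontier: List[Tuple[float, str, List[str]]] = [(0.0, start, [start])]
    visited = set()
    while frontier:
        dist, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for nxt, cost in adjacency.get(node, []):
            if nxt not in visited:
                heapq.heappush(frontier, (dist + cost, nxt, path + [nxt]))
    return None


def record_refinement(costs: Dict[Edge, float], edge: Edge, succeeded: bool,
                      penalty: float = 2.0, floor: float = 1.0) -> None:
    """Update a transition cost from the outcome of a low-level refinement.

    Failure multiplies the cost by `penalty`; success divides it, but never
    below `floor`.
    """
    if succeeded:
        costs[edge] = max(floor, costs[edge] / penalty)
    else:
        costs[edge] = costs[edge] * penalty
```

After a failed refinement of one transition, the next high-level search is biased toward an alternative route without discarding the failed transition entirely.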

4. Empirical Evaluations and Theoretical Guarantees

EAHP frameworks have been evaluated empirically in multi-robot, manipulation, navigation, and synthetic grid domains. Representative reported results are summarized below.

System	Key Setting	Success Metric	Planning Speedup/Benefit
RAE+UPOM	Multi-robot tasks	Success ratio ↑	10× faster planning vs. unguided
GRAIL+PDDL-Gen	Humanoid 6-bulb	Completeness ↑	>95% state-coverage with IntM
HARP	Robot navigation	Task success	≈10× faster than uniform sampling
PEORL	Taxi, GridWorld	Cumulative reward	Optimal plans, 70% fewer re-plans
Curriculum RL	Crafting, Baking	Plan time, length	2× shorter plans, milliseconds per plan
Natural Prog	CraftLite	Items per session	6.2× items/generation growth slope

Theoretical assertions include:

UPOM is asymptotically optimal as rollout depth/width → ∞ in static domains (Patra et al., 2020).
HARP is sound (holonomic) and probabilistically complete if abstraction refinement maintains support (Shah et al., 2022).
PEORL inherits convergence guarantees from R-learning at the primitive and option level, under full observability (Yang et al., 2018).
Skill- and condition-learning schemes converge under sparse, feature-disentangled state assumptions (Morere et al., 2019).

5. Integrations and Variants in Modern AI Architectures

Recent work extends EAHP to multitask, neuro-symbolic, and LLM-augmented domains:

Diffusion-based planners employ hierarchical diffusion models to capture temporal abstraction, using high-level “jumpy” planners guided by learned subgoal distributions, with empirical evidence for improved sample efficiency and compositional generalization (Chen et al., 2024).
Multi-agent recommender systems employ distilled “thought patterns” derived from agent and human experiences to structure and adapt planning for natural language queries, with each new experience prompting pattern distillation and insertion (Yu et al., 2025).
LLM-augmented hierarchical agents leverage in-context LLMs as priors over high-level skill selection, acting in a semi-Markov framework and dramatically accelerating policy learning in long-horizon tasks (Prakash et al., 2023).
Hierarchical planners in robotics bootstrap abstractions directly from prior low-level experiences, replacing expert-crafted state-action abstractions and supporting rapid multi-source, bi-directional search (Shah et al., 2022).
Formal symbolic validators paired with knowledge graphs and LLM-based macro- and micro-planners provide neuro-symbolic pipelines with up-to-date (retrieved) world knowledge and explicit plan verification for robust task execution (Cornelio et al., 2025).

6. Limitations, Open Challenges, and Directions

EAHP approaches exhibit several trade-offs and open questions:

Robustness to non-stationarity and context transfer: Experience libraries and abstraction layers must adapt to changing dynamics or tasks; methods such as curriculum building and context transfer have shown promising results (Cano et al., 2023).
Abstraction failures: Classifier choice critically affects the quality and completeness of symbol abstractions; compact yet complete models increase planning reliability (Oddi et al., 2019).
Real-time constraints: The efficacy of anytime/experience-guided planners hinges on the scheduling between fast, learned policies and deliberative planning modules (Patra et al., 2020).
Theoretical convergence: While flat RL components carry standard guarantees, convergence properties of hierarchical RL with online, experience-augmented operator generation remain incompletely characterized (Yang et al., 2018).
Human-in-the-loop systems: Curriculum design, hint quality, and decomposition granularity impact long-term performance and generalization for systems learning from human-provided experience (Cano et al., 2023, Consul et al., 2021).
Scalability: Supporting lifelong, scalable abstraction learning—especially with evolving goals, skill forgetting, and continuous parameter spaces—remains an active challenge (Morere et al., 2019).

7. Impact and Research Significance

Experience-augmented hierarchical planning has led to notable advances across domains:

Substantial speedups and increases in solution quality for robotic manipulation, motion planning, and multi-agent coordination.
Demonstrated transfer of skills and planners from simulation to hardware without further adaptation.
Robust handling of complex, high-dimensional planning problems where brute-force or non-hierarchical approaches are infeasible.
Empirical evidence of superhuman and transferable human decision support via principled metareasoning architectures (Consul et al., 2021).
Foundations for neuro-symbolic and LLM-augmented AI where experience is distilled across abstraction layers, yielding benefits in open-ended reasoning, planning, and explainability.

Experience-augmented hierarchical planning is thus a foundational framework for integrating structure and learning, supporting both principled analysis and practical deployment across the spectrum of advanced AI systems. Key exemplars of the paradigm include UPOM-guided operational models (Patra et al., 2020), experience-driven symbolic planners (Yang et al., 2018), robust abstractor pipelines (Oddi et al., 2019), curriculum skill bootstrapping (Morere et al., 2019), natural programming (Cano et al., 2023), and LLM-augmented agents (Prakash et al., 2023).