Hierarchical Procedural Memory

Updated 29 December 2025
  • Hierarchical procedural memory is a multi-level organization of procedural knowledge that encodes how-to skills, plans, and sub-actions for efficient retrieval.
  • It employs dynamic trees, bipartite graphs, and multi-tiered architectures to segment tasks into granular and abstract components, enhancing both precision and generalization.
  • These structures improve long-term skill retention and transfer in advanced agent systems by facilitating rapid, interpretable retrieval and sample-efficient adaptation.

Hierarchical procedural memory refers to a structured external or neural mechanism designed to encode, organize, retrieve, and evolve procedural knowledge—“how-to” skills, plans, and operator sequences—across multiple levels of abstraction. In advanced agent systems, especially those based on LLMs, such memory is essential for supporting efficient long-term skill retention, rapid retrieval, sample-efficient generalization, and interpretable reasoning. Contemporary approaches formalize this memory as dynamic trees, bipartite graphs, composite key-value architectures, or multi-tiered graphs, with fine-grained units at the leaves (atomic steps, action trajectories) and progressively abstracted schemas, scripts, or insights at higher levels. These systems are distinct from monolithic or flat vector memories, allowing agents or agent collectives to flexibly leverage both specific and generalized procedural knowledge.

1. Formal Definitions and Architectures

Hierarchical procedural memory structures are characterized by layered organization, where procedural information is encoded at fine-to-coarse levels of abstraction. Prominent instantiations include:

  • Dynamic Memory Trees (MemTree): Every node $v$ in the tree $T=(V,E)$ encapsulates $(c_v, e_v, p_v, \mathcal{C}_v, d_v)$, i.e., aggregated content, semantic embedding, parent pointer, set of children, and abstraction depth. Leaves store the smallest procedural units; internal nodes aggregate and abstract their children. The root represents the highest-level summary. As one ascends the tree, stored content transitions from granular steps to procedural schemas (Rezazadeh et al., 17 Oct 2024). A minimal sketch of these record types follows this list.
  • Bipartite or Two-Level Architectures: Systems such as Memᵖ and H$^2$R use a bipartite or modular hierarchy: a low-level memory $M^\ell$ containing concrete trajectories or sub-trajectories, and a high-level memory $M^h$ (scripts, plans, or task decompositions) distilled from $M^\ell$. Each fine-grained episode links to an abstraction, supporting both example-based and schema-based transfer (Fang et al., 8 Aug 2025, Ye et al., 16 Sep 2025).
  • Meta-Procedures and Control Policies: MACLA further defines atomic procedures as 4-tuples $\langle$goal signature, preconditions, action sketch, postconditions$\rangle$ and meta-procedures as compositions with explicit control mappings, supporting conditional and compositional abstraction (Forouzandeh et al., 22 Dec 2025).
  • Multi-tiered Graphs: In multi-agent systems, G-Memory encodes collaboration experience via a three-tier graph—interaction graphs (utterance-level procedural logs), query graphs (episodic index), and insight graphs (distilled high-level strategies) interconnected through query nodes (Zhang et al., 9 Jun 2025).
  • Binary Tree Memory (HAM): Hierarchical Attentive Memory (HAM) arranges memory as a binary tree; each internal node summarizes its children, while the LSTM controller navigates $\mathcal{O}(\log n)$ tree paths for efficient procedural manipulation and retrieval (Andrychowicz et al., 2016).
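
To ground these definitions, the following minimal Python sketch shows a MemTree-style node carrying $(c_v, e_v, p_v, \mathcal{C}_v, d_v)$ and a MACLA-style atomic procedure 4-tuple. Field names are illustrative assumptions, not identifiers from the cited papers.

```python
from dataclasses import dataclass, field
from typing import Optional

import numpy as np


@dataclass
class MemTreeNode:
    content: str                        # c_v: aggregated textual content
    embedding: np.ndarray               # e_v: semantic embedding of the content
    parent: Optional["MemTreeNode"]     # p_v: parent pointer (None at the root)
    children: list["MemTreeNode"] = field(default_factory=list)  # C_v
    depth: int = 0                      # d_v: abstraction depth


@dataclass
class AtomicProcedure:
    goal_signature: str        # what the procedure achieves
    preconditions: list[str]   # symbolic conditions required to apply it
    action_sketch: list[str]   # ordered operator/action sequence
    postconditions: list[str]  # expected state changes on success
```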

2. Mechanisms for Build, Update, and Synchronization

All hierarchical procedural memory systems share pipelines for construction, incremental update, and maintenance of the memory structure:

  • Build: Observed successful (and failed) trajectories are segmented into atomic actions/subgoals via semantic abstraction or LLM-based segmentation, then summarized, abstracted, and recorded at the corresponding memory levels. For Memᵖ and H$^2$R, this involves extracting high-level scripts and partitioned sub-trajectories per task (Fang et al., 8 Aug 2025, Ye et al., 16 Sep 2025). MemTree inserts new nodes via a similarity-guided recursive procedure, merging or expanding as dictated by semantic proximity (Rezazadeh et al., 17 Oct 2024). A build sketch follows this list.
  • Update: Incorporating new data involves reflexion to revise, deprecate, or merge existing entries based on recent performance. Update strategies include strict validation (add only when success), reflexive correction after observed failure, or batch-based merging; hierarchical maintenance ensures that changes in fine-grained trajectories propagate to script-level summaries and vice versa (Fang et al., 8 Aug 2025). In G-Memory, execution of new queries appends new graphs, augments semantic relations, and distills new insight nodes (Zhang et al., 9 Jun 2025).
  • Synchronization: Entries are linked and maintained per strict linkage rules—e.g., whenever a script is revised, all associated trajectories inherit the update. Bipartite maintenance is enforced so that hierarchy remains internally consistent as contents evolve (Fang et al., 8 Aug 2025).
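
As a concrete illustration of the build step, here is a minimal sketch of a bipartite build pipeline in the spirit of Memᵖ, with strict validation (record only on success). The `segment` and `distill_script` helpers are stand-ins for LLM calls and are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class BipartiteMemory:
    high: list = field(default_factory=list)   # M^h: distilled scripts
    low: list = field(default_factory=list)    # M^l: concrete sub-trajectories
    links: dict = field(default_factory=dict)  # low-level index -> high-level index


def segment(trajectory):
    # Placeholder for LLM-based segmentation; here, each action is one step.
    return [[action] for action in trajectory]


def distill_script(steps):
    # Placeholder for LLM distillation; here, a crude textual outline.
    return "script: " + " -> ".join(step[0] for step in steps)


def build(memory, trajectory, succeeded):
    """Strict validation: record a trajectory only when it succeeded."""
    if not succeeded:
        return
    steps = segment(trajectory)
    memory.high.append(distill_script(steps))
    script_idx = len(memory.high) - 1
    for step in steps:
        memory.low.append(step)
        memory.links[len(memory.low) - 1] = script_idx  # bipartite link
```

For example, `build(mem, ["open drawer 1", "take key"], succeeded=True)` adds one script and two linked sub-trajectories, preserving the bipartite linkage that synchronization relies on.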

3. Retrieval, Scoring, and Utilization

Hierarchical structure enables agents to retrieve both specific and generalized procedural knowledge:

  • Cosine Similarity-Based Retrieval: Across nearly all works, retrieval is organized via embedding-based similarity: the query or context is embedded, and the top-$k$ keys (high-level or low-level) with highest similarity are selected (Rezazadeh et al., 17 Oct 2024, Fang et al., 8 Aug 2025, Ye et al., 16 Sep 2025, Forouzandeh et al., 22 Dec 2025). Some systems use FAISS or other fast indexers to accelerate this step. A combined retrieval sketch follows this list.
  • Hierarchical or Modular Retrieval Paths: For a novel problem, hierarchical procedural memory allows coarse-to-fine retrieval—first, a suitable script or meta-procedure is fetched (e.g., a broad solution to a task); then, corresponding fine-grained exemplars or action sequences are retrieved as conditionals or instantiations. The decoupling of high-level planning and low-level execution enables modular transfer and fine-grained matching (Fang et al., 8 Aug 2025, Ye et al., 16 Sep 2025, Forouzandeh et al., 22 Dec 2025).
  • Scoring and Selection: Systems like MACLA augment vanilla retrieval with Bayesian reliability tracking—each procedure has posterior success probability tracked using Beta distributions, and selection is framed as maximizing expected utility (incorporating relevance, risk, and information gain) (Forouzandeh et al., 22 Dec 2025).
  • Bi-directional and Role-Specific Retrieval: G-Memory supports upward traversal (extracting relevant schemas or insights across trials) and downward traversal (fetching task-matching procedural fragments), with agent-role filtering for multi-agent specialization (Zhang et al., 9 Jun 2025).
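
The following sketch combines three of the retrieval ingredients above: cosine top-$k$ selection, coarse-to-fine descent from scripts to linked exemplars, and a MACLA-style Beta-posterior reliability weight. The expected-utility formula here (relevance times posterior success mean) is a simplified proxy, not the papers' exact objective.

```python
import numpy as np


def cosine_top_k(query, keys, k=3):
    """Indices of the k keys most similar to the query, plus all similarities."""
    keys = np.asarray(keys, dtype=float)
    sims = keys @ query / (np.linalg.norm(keys, axis=1) * np.linalg.norm(query) + 1e-8)
    return np.argsort(-sims)[:k], sims


def expected_utility(sim, alpha, beta):
    """Relevance weighted by the Beta posterior's mean success probability."""
    return sim * (alpha / (alpha + beta))


def retrieve(query, script_keys, script_stats, exemplar_keys, links, k=3):
    """Coarse-to-fine: pick the most useful script, then its best linked exemplars.

    script_stats[i] holds (alpha, beta) success/failure counts for script i;
    links maps each exemplar index to its parent script index (assumed non-empty).
    """
    idx, sims = cosine_top_k(query, script_keys, k=len(script_keys))
    best = max(idx, key=lambda i: expected_utility(sims[i], *script_stats[i]))
    exemplars = [j for j, s in links.items() if s == best]
    sub_idx, _ = cosine_top_k(query, [exemplar_keys[j] for j in exemplars], k)
    return best, [exemplars[j] for j in sub_idx]
```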

4. Abstraction: Schema Formation and Refinement

Hierarchical procedural memory supports the abstraction and continual refinement of procedural knowledge:

  • Aggregation and Summarization: Internal nodes (in MemTree, HAM) or high-level scripts (in Memᵖ, H$^2$R) are continually updated to reflect higher-level schemas—summarizing their children either by aggregating textual content or by averaging embeddings (Rezazadeh et al., 17 Oct 2024, Andrychowicz et al., 2016). An aggregation sketch follows this list.
  • Meta-procedures and Composition: MACLA demonstrates explicit modeling of meta-procedures, which encapsulate not just static plans but dynamically conditioned compositions of atomic procedures, governed by symbolic preconditions and control flows (Forouzandeh et al., 22 Dec 2025).
  • Contrastive and Reflexive Refinement: Procedures are refined by contrasting contexts of success and failure—enabling the expansion of context-specific preconditions, the correction of common failure paths, and the distillation of abstract insights (Forouzandeh et al., 22 Dec 2025, Ye et al., 16 Sep 2025). In H$^2$R, LLM-based prompts guide the reflective curation of both high- and low-level insights.
  • Multi-Agent Schema Distillation: In collective memory architectures, high-level insight graphs distill strategies that generalize across many episodic trials, enabling systems to abstract transferable knowledge from collaborative procedural logs (Zhang et al., 9 Jun 2025).
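
Here is a minimal sketch of the aggregation step, assuming nodes shaped like the `MemTreeNode` above: a parent's embedding is refreshed as the mean of its children's embeddings and its content re-summarized, with changes propagated toward the root. `summarize` is a placeholder for an LLM summarizer.

```python
import numpy as np


def summarize(texts):
    # Placeholder for an LLM summarizer; here, a plain concatenation.
    return " | ".join(texts)


def refresh_internal_node(node):
    """Re-aggregate a parent from its children and propagate toward the root."""
    node.embedding = np.mean([c.embedding for c in node.children], axis=0)
    node.content = summarize([c.content for c in node.children])
    if node.parent is not None:
        refresh_internal_node(node.parent)
```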

5. Computational and Empirical Properties

Hierarchical procedural memory exhibits distinct computational and generalization benefits:

| System | Retrieval Complexity | Insertion/Update | Generalization/Transfer | Key Empirical Result |
|---|---|---|---|---|
| MemTree (Rezazadeh et al., 17 Oct 2024) | $\mathcal{O}(N)$ (collapsed list) | $\mathcal{O}(\log N)$ if balanced | Improved long-term multi-session chat, QA | +2% accuracy over flat memory; QuALITY: 56.5% vs 41.0% |
| MACLA (Forouzandeh et al., 22 Dec 2025) | $\approx \mathcal{O}(N)$ (ANN index) | $\mathcal{O}(1)$ per procedure | 90.3% ALFWorld unseen; 15:1 compression | 2,851 traj. → 187 procedures in 56 s; +3.1% generalization |
| H$^2$R (Ye et al., 16 Sep 2025) | $\mathcal{O}(k)$ per tier | LLM-driven symbolic + embedding | Decoupled planning/execution improves transfer | ALFWorld: 75.9% vs ExpeL 72.4%; PDDLGame: 80.5% vs 72.2% |
| Memᵖ (Fang et al., 8 Aug 2025) | Two-stage, key-based | Build/reflexion, hierarchy sync | Reusable across LLMs, sample-efficient | ALFWorld: 77.86% vs 42.14% (no memory); transfer: +5% to Qwen2.5-14B |
| HAM (Andrychowicz et al., 2016) | $\mathcal{O}(\log n)$ | $\mathcal{O}(\log n)$ per op. | Efficient algorithm learning | Sorts $n$ numbers in $\mathcal{O}(n \log n)$ time; perfect extrapolation |
| G-Memory (Zhang et al., 9 Jun 2025) | Graph traversal, $k$-hop expansion | Graph update + LLM reflection | Role-aware, cross-agent schema transfer | ALFWorld: +20.89% absolute; QA: +10.12% accuracy with graph memory |

Hierarchical memory allows compression (2,851 → 187 procedures in MACLA), scalable retrieval (via log-time tree descent or fast key indexing), and significantly increased sample efficiency (e.g., MACLA is 2,800× faster than parameter-training baselines). Hierarchical abstractions support robust transfer—procedural memory produced by a stronger model can be reused by a weaker one, with absolute gains of 5–35% across settings (Fang et al., 8 Aug 2025). The greedy-descent sketch below illustrates the log-time retrieval case.
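
The log-time claim can be made concrete: on a balanced tree, following the most similar child at each level visits only $\mathcal{O}(\log N)$ nodes rather than scanning all $N$ leaves. This is an illustrative simplification of tree-based retrieval, not any paper's exact procedure.

```python
import numpy as np


def tree_descend(node, query):
    """Greedy best-child descent: O(depth) similarity checks on a balanced
    tree, versus O(N) for a flat scan over all leaves."""
    while node.children:
        node = max(
            node.children,
            key=lambda c: float(c.embedding @ query)
            / (np.linalg.norm(c.embedding) * np.linalg.norm(query) + 1e-8),
        )
    return node  # leaf whose root-to-leaf path best matched the query
```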

6. Applications, Multi-Agent Design, and Experimental Impact

Hierarchical procedural memory is deployed across a range of single-agent and multi-agent language-based reasoning systems:

  • LLM Agents: Full decoupling of learned procedures from LLM weights allows sample-efficient adaptation, rapid insertion, and continual refinement without model retraining (Forouzandeh et al., 22 Dec 2025).
  • Multi-Task and Novel Task Transfer: Separation of high-level planning and low-level execution enables agents to efficiently transfer abstract schemas to novel environments, minimizing interference (Ye et al., 16 Sep 2025).
  • Multi-Agent Systems (MAS): G-Memory extends the paradigm to multi-agent collaboration, maintaining individualized, role-focused procedural/insight memory and tracing concrete interaction graphs across agents (Zhang et al., 9 Jun 2025).
  • Classic Algorithm Learning: HAM demonstrates that hierarchical attention enables neural systems to perform classic algorithmic procedures (sort, merge, search) in optimal time, generalizing to larger scales than encountered during training (Andrychowicz et al., 2016).
  • Practical Impact: Across domains (ALFWorld, WebShop, PDDL planning, document QA), hierarchical procedural memory consistently yields superior generalization, solutions in fewer steps, higher success rates, and improved context/token efficiency over flat episodic or non-hierarchical baselines (Rezazadeh et al., 17 Oct 2024, Fang et al., 8 Aug 2025, Ye et al., 16 Sep 2025, Zhang et al., 9 Jun 2025).

7. Limitations and Future Directions

Despite their empirical benefits, several limitations are noted:

  • All current systems rely on LLM-centric segmentation and summarization for procedural abstraction, and LLMs may induce biases or errors in reflection, abstraction, and update steps.
  • The complexity of memory updates and retrievals scales with the branching factor and size of stored memory, necessitating efficient key management and summarization strategies (Rezazadeh et al., 17 Oct 2024).
  • Optimal capacity and saturation points exist, beyond which additional procedural memory yields diminishing returns due to redundancy or spurious generalization (Forouzandeh et al., 22 Dec 2025).
  • Current architectures often assume clear subgoal boundaries and explicit success/failure feedback, which may not generalize to all open-world agent settings.

A plausible implication is that future hierarchical procedural memory designs will focus on: tighter integration of symbolic and distributional representations, principled capacity management, automated schema discovery, continual agent-team evolution, and further formalization of abstraction/refinement mechanisms to close the gap between artificial and human procedural competence.
