MemTree: Hierarchical Memory Structures
- MemTree is a family of hierarchical memory architectures that structure data in tree forms to enable efficient storage, retrieval, and abstraction.
- It includes implementations such as dynamic trees for LLMs, prefix tries for CTMC state storage, and Eigen Memory Trees for online sequential learning.
- Empirical evaluations show MemTree improves scalability, reduces memory overhead, and speeds up retrieval compared to traditional flat or hash-based methods.
MemTree encompasses a set of algorithmic paradigms that use hierarchical or tree-structured memory representations to enable efficient storage, retrieval, and abstraction in settings ranging from LLMs to stochastic system exploration and online sequential learning. The MemTree concept has emerged independently across multiple research domains, each leveraging tree architectures to provide scalability, structured abstraction, or efficient access compared to traditional flat or hash-based data structures.
1. Formal Definitions and Core Data Structures
Three primary MemTree instantiations are established in the literature:
A. Hierarchical Dynamic Tree Memory for LLMs
MemTree (Rezazadeh et al., 2024) is defined as a rooted, directed tree where each node v holds a tuple (c_v, e_v, p_v, C_v, d_v):
- c_v: aggregated textual content (string)
- e_v: semantic embedding of c_v
- p_v: parent node (the root has no parent)
- C_v: set of child nodes
- d_v: tree depth
Hierarchical abstraction ties depth to specificity: shallow nodes summarize coarse topics; deeper nodes encode fine-grained detail. The root is purely structural, holding no content or embedding.
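A minimal Python sketch of this node structure (field names are illustrative, not taken from the paper; the root carries no content or embedding):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class MemTreeNode:
    content: str = ""                        # c_v: aggregated textual content (empty at root)
    embedding: Optional[List[float]] = None  # e_v: semantic embedding of the content (None at root)
    parent: Optional["MemTreeNode"] = None   # p_v: parent pointer (None at root)
    children: List["MemTreeNode"] = field(default_factory=list)  # C_v: child nodes
    depth: int = 0                           # d_v: distance from the root

# The root is purely structural; descendants hold concrete content.
root = MemTreeNode()
leaf = MemTreeNode(content="user prefers green tea",
                   embedding=[0.1, 0.9], parent=root, depth=1)
root.children.append(leaf)
```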
B. Prefix Tree for State Storage in CTMCs
In explicit state storage for large-scale continuous-time Markov chains (CTMCs), MemTree is a prefix tree or trie (Taylor et al., 19 Dec 2025), T = (V, r, E), with:
- V: set of nodes
- r: root node, at depth 0
- E: edges labeled with state-variable values
- a node at depth k corresponds to a partial assignment (x_1, ..., x_k)
- terminal nodes represent complete states (x_1, ..., x_n)
Each node’s children are indexed by the next state variable’s value; traversal from root to terminal uniquely identifies a state.
C. Eigen Memory Tree (EMT) for Online Learning
EMT (Rucker et al., 2022) is a full binary tree where:
- Internal nodes u store a router w_u (an approximate top principal component) and a split threshold θ_u (the median projection value at split time).
- Leaves store a memory buffer of up to capacity c examples.
- A global scorer learns a parametric dissimilarity d(x, m) for retrieval.
Traversal decisions at internal nodes are based on the projections w_u · x, and the tree structure is constructed dynamically online.
2. Insertion, Update, and Retrieval Algorithms
A. Dynamic Memory Update in LLM MemTree
Insertion of new content involves:
- Embedding computation: e_new = Embed(x_new) for the incoming content x_new.
- Recursive “InsertNode” starting at the root:
- At each node, compute cosine similarities between e_new and the children's embeddings e_c.
- If the best similarity exceeds the depth-adaptive threshold τ(d), aggregate content via LLM-prompted summarization and recurse into that child.
- Otherwise, create a new leaf for x_new. The threshold τ(d) increases with depth, with its constants set empirically.
Retrieval is performed by a collapsed-tree scan: for a query embedding e_q, compute the cosine similarity between e_q and e_v for every node v, discard results below a similarity cutoff, and return the top-k nodes.
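A runnable sketch of the recursive insertion and collapsed-tree retrieval steps above; `embed_fn` and `summarize_fn` stand in for the embedding model and LLM summarizer, the threshold function is a stand-in for the paper's depth-adaptive schedule, and parent pointers are omitted for brevity:

```python
import math

class Node:
    def __init__(self, content="", embedding=None, depth=0):
        self.content, self.embedding, self.depth = content, embedding, depth
        self.children = []

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def insert(node, content, emb, embed_fn, summarize_fn, tau):
    """Recursive InsertNode: descend while some child is similar enough,
    re-summarizing along the path; otherwise attach a new leaf."""
    best, best_sim = None, -1.0
    for child in node.children:
        sim = cosine(emb, child.embedding)
        if sim > best_sim:
            best, best_sim = child, sim
    if best is not None and best_sim >= tau(best.depth):
        best.content = summarize_fn(best.content, content)  # LLM-prompted aggregation
        best.embedding = embed_fn(best.content)             # refresh the node embedding
        insert(best, content, emb, embed_fn, summarize_fn, tau)
    else:
        node.children.append(Node(content, emb, node.depth + 1))

def retrieve(root, q_emb, k, min_sim=0.0):
    """Collapsed-tree scan: score every node against the query embedding."""
    scored, stack = [], list(root.children)
    while stack:
        v = stack.pop()
        scored.append((cosine(q_emb, v.embedding), v))
        stack.extend(v.children)
    scored = [(s, v) for s, v in scored if s >= min_sim]
    return [v for s, v in sorted(scored, key=lambda t: -t[0])[:k]]
```

With near-orthogonal embeddings, unrelated items become siblings under the root; a sufficiently similar item instead triggers summarization into an existing node and descends one level.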
B. Prefix Tree State Operations
Insertion for a state x = (x_1, ..., x_n):
- At each level k, follow or create the edge labeled x_k, progressing from the root to a terminal node (marked as terminal). Membership lookup and state extraction are likewise realized by deterministic tree traversal, yielding O(n) time per operation, where n is the number of state variables.
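A minimal Python sketch of these trie operations (class and method names are illustrative):

```python
class TrieNode:
    __slots__ = ("children", "terminal")
    def __init__(self):
        self.children = {}      # next state-variable value -> child node
        self.terminal = False   # marks a complete state (x1, ..., xn)

class StateTrie:
    """Prefix-tree storage for states (x1, ..., xn): one trie level per variable."""
    def __init__(self):
        self.root = TrieNode()

    def insert(self, state):
        node = self.root
        for value in state:     # O(n) in the number of state variables
            node = node.children.setdefault(value, TrieNode())
        node.terminal = True

    def __contains__(self, state):
        node = self.root
        for value in state:
            node = node.children.get(value)
            if node is None:
                return False
        return node.terminal
```

States sharing a prefix share the corresponding path: inserting (1, 2, 3) and (1, 2, 4) stores the (1, 2) prefix only once.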
C. Eigen Memory Tree Routing and Learning
Routing for a feature vector x:
- At each internal node u, compute the projection w_u · x; route left if w_u · x ≤ θ_u, else right.
- At the leaf, select a memory m via the scorer by minimizing the learned dissimilarity d(x, m). Insertion appends x to the reached leaf; when the capacity is exceeded, a PCA-based split is triggered using incremental Oja's method.
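A toy sketch of EMT routing and leaf splitting (the learned scorer is omitted; the learning rate, iteration count, and capacity are illustrative, and Oja's rule appears here in its plain incremental form):

```python
import random

def oja_step(w, x, lr=0.01):
    """One step of Oja's rule: w <- w + lr * y * (x - y * w), with y = w . x.
    The iterate drifts toward the leading principal direction of the data."""
    y = sum(wi * xi for wi, xi in zip(w, x))
    return [wi + lr * y * (xi - y * wi) for wi, xi in zip(w, x)]

class EMTNode:
    def __init__(self, capacity=4):
        self.router = None            # w_u: approximate top principal component
        self.threshold = 0.0          # theta_u: median projection at split time
        self.left = self.right = None
        self.capacity = capacity
        self.memories = []            # (x, value) pairs, held only at leaves

    def insert(self, x, value):
        node = self
        while node.router is not None:                         # route internally
            proj = sum(w * xi for w, xi in zip(node.router, x))
            node = node.left if proj <= node.threshold else node.right
        node.memories.append((x, value))
        if len(node.memories) > node.capacity:
            node._split()

    def _split(self):
        # Estimate the leaf's top principal direction with Oja's rule,
        # then split the buffer at the median projection.
        dim = len(self.memories[0][0])
        w = [random.gauss(0.0, 1.0) for _ in range(dim)]
        for _ in range(20):
            for x, _ in self.memories:
                w = oja_step(w, x)
        projs = sorted(sum(wi * xi for wi, xi in zip(w, x)) for x, _ in self.memories)
        self.router, self.threshold = w, projs[len(projs) // 2]
        self.left, self.right = EMTNode(self.capacity), EMTNode(self.capacity)
        for x, v in self.memories:
            proj = sum(wi * xi for wi, xi in zip(w, x))
            (self.left if proj <= self.threshold else self.right).memories.append((x, v))
        self.memories = []
```

Retrieval would route a query to a leaf the same way and then pick the stored memory minimizing the learned dissimilarity; in this stripped-down sketch the leaf buffer itself is the result.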
3. Complexity, Space Usage, and Optimization
| MemTree Variant | Insertion/Lookup | Retrieval | Space | Key Optimizations |
|---|---|---|---|---|
| LLM MemTree (Rezazadeh et al., 2024) | O(log N) insertion (avg.) | O(N) flat scan | hierarchical; depends on tree shape | depth-adaptive threshold τ(d); parallelizable aggregation |
| Prefix trie (Taylor et al., 19 Dec 2025) | O(n) worst case (n state variables) | O(n) per state | often far below hashing with high prefix sharing | BMC-based variable order for compactness |
| EMT (Rucker et al., 2022) | O(log N) | O(log N) | linear in stored memories | median splits; Oja's PCA; learned scorer |
- In LLM MemTree, aggregation and embedding updates along the traversal path are parallelizable. Collapsed retrieval costs O(N) over all N nodes, but tree traversal can theoretically achieve O(log N).
- The prefix tree's memory advantage arises when large state sets share long common prefixes; BMC-based variable ordering further tightens the memory footprint by maximizing early sharing. Empirical savings are 45–70% vs. a hash map for large biochemical CTMCs, with O(n) time per operation in the number of state variables (Taylor et al., 19 Dec 2025).
- EMT's binary tree yields O(log N) access for both reads and writes; splits and router updates are amortized over the leaf capacity.
4. Evaluation Metrics and Empirical Results
A. LLM MemTree (Rezazadeh et al., 2024)
Performance was evaluated on:
- Multi-Session Chat (MSC, 15 turns) and MSC-Extended (200 turns)
- QuALITY (5000-token QA, easy/hard distinctions)
- MultiHop RAG (609 news, 2556 multi-hop queries)
Key metrics: binary accuracy (by GPT-4 judge) and ROUGE-L recall.
Select outcomes:
- On MSC: MemoryStream 84.4% acc/79.1 R, MemTree 84.8% acc/79.9 R.
- On MSC-E: Full history 78.0%, MemoryStream 80.7%, MemTree 82.5%.
- On QuALITY: RAPTOR 59.0%, MemoryStream 43.8%, MemTree 59.8%.
- On MultiHop: RAPTOR 81.0%, MemoryStream 74.7%, MemTree 80.5% (best on temporal queries).
Insertion overhead for the full MultiHop dataset is roughly 10 s for MemTree, versus hours for RAPTOR/GraphRAG.
B. Prefix Trie for CTMCs (Taylor et al., 19 Dec 2025)
Empirical memory savings at scale (on very large state spaces):
- Up to 68% less memory vs. hash maps.
- CPU overhead is modest given the scale, with total runtimes ranging from seconds to minutes.
C. EMT (Rucker et al., 2022)
On 206 OpenML contextual bandit datasets:
- EMT outperforms CMT on 177/206 datasets.
- EMT+Parametric hybrid (PEMT) beats pure parametric on 110 datasets, losing on only 8.
- Under tightly bounded memory budgets, PEMT maintains mean reward close to that of the unbounded variant.
5. Comparative Strengths and Domain Limitations
A. LLM MemTree
- Supports fully online updates at logarithmic cost, enabling incremental context management in extended dialogue/document scenarios.
- Hierarchical schema-like abstraction aligns with human topical structure, enabling high-level and granular retrieval.
- Outperforms flat memory approaches on long context and complex QA, approaching offline RAG systems for performance.
- Limitations: relies on extra LLM summarization calls at insertion (an average of 3.27 per insertion on MultiHop), may retrieve overly verbose or only partially relevant passages, and is sensitive to the adaptive thresholding and summarization prompting.
B. Prefix Trie for CTMCs
- Substantially reduces memory for explicit state storage, especially with high concurrency and shared prefixes.
- Preprocessing (BMC) for variable order yields further savings but adds setup cost.
- Trade-off: deterministic O(n) access versus O(1) average-case hashing, but n (the number of state variables) is moderate in practice.
C. EMT
- Efficient, self-consistent online memory with provable O(log N) access. Principal-component splits yield effective routing for many real-world tabular and bandit tasks.
- Hybridization with parametric models yields “no-downside” performance gains.
- Sensitivity to the data regime: fixed routers may underperform under drifting distributions; Oja's approximation requires well-behaved covariances; and high-cardinality or sparse categorical features can challenge retrieval.
6. Cross-Domain Significance and Applicability
MemTree approaches unify a spectrum of requirements encountered in scalable memory augmentation:
- In LLMs, they enable structured, schema-aligned context for conversational agents, avoiding the redundancy and inefficiency of flat memory repacking (Rezazadeh et al., 2024).
- In model checking and systems biology, prefix trees make tractable the explicit storage and examination of massive discrete state spaces, previously bottlenecked by memory constraints of hash table methods (Taylor et al., 19 Dec 2025).
- In online sequential learning, tree-structured memory supports both efficient lookup and learning-based generalization, providing a viable alternative to -NN and streaming algorithms (Rucker et al., 2022).
The decisive empirical and theoretical properties across these domains are:
- Structurally induced efficiency (via prefix/redundancy sharing),
- Dynamic/online construction capability (no offline retraining),
- Alignment with underlying semantic or state-structure of the problem,
- Quantitative, domain-specific performance gains.
7. Open Problems and Future Directions
- Further optimization of tree construction—especially adaptive variable order in prefix tries and threshold/summarization in LLM MemTree—may yield enhanced performance in novel or high-dimensional domains (Rezazadeh et al., 2024, Taylor et al., 19 Dec 2025).
- Addressing the challenge of retrieval specificity versus verbosity in hierarchical memories remains pertinent, as does the integration of learned (differentiable) aggregation for abstracting node content.
- In online learning contexts, development of incremental eviction/rebalancing strategies for nonstationary data streams presents an open problem (Rucker et al., 2022).
- Broader generalization to hybrid symbolic-neural workflows and end-to-end differentiable controllers represents a plausible avenue for future extension.
MemTree thus constitutes a family of principled tree-based memory architectures, each providing domain-adaptive advances in memory scaling, abstraction, and retrieval compared to legacy methods, with substantial empirical validation across natural language, stochastic, and sequential learning contexts (Rezazadeh et al., 2024, Taylor et al., 19 Dec 2025, Rucker et al., 2022).