MemTree: Hierarchical Memory Structures
- MemTree is a family of hierarchical memory architectures that structure data in tree forms to enable efficient storage, retrieval, and abstraction.
- It includes implementations such as dynamic trees for LLMs, prefix tries for CTMC state storage, and Eigen Memory Trees for online sequential learning.
- Empirical evaluations show MemTree improves scalability, reduces memory overhead, and speeds up retrieval compared to traditional flat or hash-based methods.
MemTree encompasses a set of algorithmic paradigms that use hierarchical or tree-structured memory representations to enable efficient storage, retrieval, and abstraction in settings ranging from LLMs to stochastic system exploration and online sequential learning. The MemTree concept has emerged independently across multiple research domains, each leveraging tree architectures to provide scalability, structured abstraction, or efficient access compared to traditional flat or hash-based data structures.
1. Formal Definitions and Core Data Structures
Three primary MemTree instantiations are established in the literature:
A. Hierarchical Dynamic Tree Memory for LLMs
MemTree (Rezazadeh et al., 2024) is defined as a rooted, directed tree where each node v holds a tuple (c_v, e_v, p_v, C_v, d_v):
- c_v: aggregated textual content (string)
- e_v: semantic embedding of c_v
- p_v: parent node (the root has no parent)
- C_v: set of child nodes
- d_v: tree depth
Hierarchical abstraction ties depth to specificity: shallow nodes summarize coarse topics; deeper nodes encode fine-grained detail. The root is purely structural, holding no content or embedding.
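A minimal Python sketch of this node structure (field names are illustrative, not taken from the paper; the root carries no content or embedding):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class MemTreeNode:
    content: str = ""                        # c_v: aggregated textual content (empty at root)
    embedding: Optional[List[float]] = None  # e_v: semantic embedding of the content (None at root)
    parent: Optional["MemTreeNode"] = None   # p_v: parent pointer (None at root)
    children: List["MemTreeNode"] = field(default_factory=list)  # C_v: child nodes
    depth: int = 0                           # d_v: distance from the root

# The root is purely structural; descendants hold concrete content.
root = MemTreeNode()
leaf = MemTreeNode(content="user prefers green tea",
                   embedding=[0.1, 0.9], parent=root, depth=1)
root.children.append(leaf)
```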
B. Prefix Tree for State Storage in CTMCs
In explicit state storage for large-scale continuous-time Markov chains (CTMCs), MemTree is a prefix tree or trie (Taylor et al., 19 Dec 2025), T = (V, r, E), with:
- V: set of nodes
- r: root node, at depth 0
- E: edges labeled with state-variable values
- a node at depth k corresponds to a partial assignment (x_1, ..., x_k)
- terminal nodes represent complete states (x_1, ..., x_n)
Each node’s children are indexed by the next state variable’s value; traversal from root to terminal uniquely identifies a state.
C. Eigen Memory Tree (EMT) for Online Learning
EMT (Rucker et al., 2022) is a full binary tree where:
- Internal nodes u store a router w_u (an approximate top principal component) and a split threshold θ_u (the median projection value at split time).
- Leaves store a memory buffer of up to capacity c examples.
- A global scorer learns a parametric dissimilarity d(x, m) for retrieval.
Traversal decisions at internal nodes are based on the projections w_u · x, and the tree structure is constructed dynamically online.
2. Insertion, Update, and Retrieval Algorithms
A. Dynamic Memory Update in LLM MemTree
Insertion of new content involves:
- Embedding computation: e_new = Embed(x_new) for the incoming content x_new.
- Recursive “InsertNode” starting at the root:
- At each node, compute cosine similarities between e_new and the children's embeddings e_c.
- If the best similarity exceeds the depth-adaptive threshold τ(d), aggregate content via LLM-prompted summarization and recurse into that child.
- Otherwise, create a new leaf for x_new. The threshold τ(d) increases with depth, with its constants set empirically.
Retrieval is performed by a collapsed-tree scan: for a query embedding e_q, compute the cosine similarity between e_q and e_v for every node v, discard results below a similarity cutoff, and return the top-k nodes.
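A runnable sketch of the recursive insertion and collapsed-tree retrieval steps above; `embed_fn` and `summarize_fn` stand in for the embedding model and LLM summarizer, the threshold function is a stand-in for the paper's depth-adaptive schedule, and parent pointers are omitted for brevity:

```python
import math

class Node:
    def __init__(self, content="", embedding=None, depth=0):
        self.content, self.embedding, self.depth = content, embedding, depth
        self.children = []

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def insert(node, content, emb, embed_fn, summarize_fn, tau):
    """Recursive InsertNode: descend while some child is similar enough,
    re-summarizing along the path; otherwise attach a new leaf."""
    best, best_sim = None, -1.0
    for child in node.children:
        sim = cosine(emb, child.embedding)
        if sim > best_sim:
            best, best_sim = child, sim
    if best is not None and best_sim >= tau(best.depth):
        best.content = summarize_fn(best.content, content)  # LLM-prompted aggregation
        best.embedding = embed_fn(best.content)             # refresh the node embedding
        insert(best, content, emb, embed_fn, summarize_fn, tau)
    else:
        node.children.append(Node(content, emb, node.depth + 1))

def retrieve(root, q_emb, k, min_sim=0.0):
    """Collapsed-tree scan: score every node against the query embedding."""
    scored, stack = [], list(root.children)
    while stack:
        v = stack.pop()
        scored.append((cosine(q_emb, v.embedding), v))
        stack.extend(v.children)
    scored = [(s, v) for s, v in scored if s >= min_sim]
    return [v for s, v in sorted(scored, key=lambda t: -t[0])[:k]]
```

With near-orthogonal embeddings, unrelated items become siblings under the root; a sufficiently similar item instead triggers summarization into an existing node and descends one level.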
B. Prefix Tree State Operations
Insertion for a state x = (x_1, ..., x_n):
- At each level k, follow or create the edge labeled x_k, progressing from the root to a terminal node (marked as terminal). Membership lookup and state extraction are likewise realized by deterministic tree traversal, yielding O(n) time per operation, where n is the number of state variables.
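A minimal Python sketch of these trie operations (class and method names are illustrative):

```python
class TrieNode:
    __slots__ = ("children", "terminal")
    def __init__(self):
        self.children = {}      # next state-variable value -> child node
        self.terminal = False   # marks a complete state (x1, ..., xn)

class StateTrie:
    """Prefix-tree storage for states (x1, ..., xn): one trie level per variable."""
    def __init__(self):
        self.root = TrieNode()

    def insert(self, state):
        node = self.root
        for value in state:     # O(n) in the number of state variables
            node = node.children.setdefault(value, TrieNode())
        node.terminal = True

    def __contains__(self, state):
        node = self.root
        for value in state:
            node = node.children.get(value)
            if node is None:
                return False
        return node.terminal
```

States sharing a prefix share the corresponding path: inserting (1, 2, 3) and (1, 2, 4) stores the (1, 2) prefix only once.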
C. Eigen Memory Tree Routing and Learning
Routing for a feature vector x:
- At each internal node u, compute the projection w_u · x; route left if w_u · x ≤ θ_u, else right.
- At the leaf, select a memory m via the scorer by minimizing the learned dissimilarity d(x, m). Insertion appends x to the reached leaf; when the capacity is exceeded, a PCA-based split is triggered using incremental Oja's method.
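A toy sketch of EMT routing and leaf splitting (the learned scorer is omitted; the learning rate, iteration count, and capacity are illustrative, and Oja's rule appears here in its plain incremental form):

```python
import random

def oja_step(w, x, lr=0.01):
    """One step of Oja's rule: w <- w + lr * y * (x - y * w), with y = w . x.
    The iterate drifts toward the leading principal direction of the data."""
    y = sum(wi * xi for wi, xi in zip(w, x))
    return [wi + lr * y * (xi - y * wi) for wi, xi in zip(w, x)]

class EMTNode:
    def __init__(self, capacity=4):
        self.router = None            # w_u: approximate top principal component
        self.threshold = 0.0          # theta_u: median projection at split time
        self.left = self.right = None
        self.capacity = capacity
        self.memories = []            # (x, value) pairs, held only at leaves

    def insert(self, x, value):
        node = self
        while node.router is not None:                         # route internally
            proj = sum(w * xi for w, xi in zip(node.router, x))
            node = node.left if proj <= node.threshold else node.right
        node.memories.append((x, value))
        if len(node.memories) > node.capacity:
            node._split()

    def _split(self):
        # Estimate the leaf's top principal direction with Oja's rule,
        # then split the buffer at the median projection.
        dim = len(self.memories[0][0])
        w = [random.gauss(0.0, 1.0) for _ in range(dim)]
        for _ in range(20):
            for x, _ in self.memories:
                w = oja_step(w, x)
        projs = sorted(sum(wi * xi for wi, xi in zip(w, x)) for x, _ in self.memories)
        self.router, self.threshold = w, projs[len(projs) // 2]
        self.left, self.right = EMTNode(self.capacity), EMTNode(self.capacity)
        for x, v in self.memories:
            proj = sum(wi * xi for wi, xi in zip(w, x))
            (self.left if proj <= self.threshold else self.right).memories.append((x, v))
        self.memories = []
```

Retrieval would route a query to a leaf the same way and then pick the stored memory minimizing the learned dissimilarity; in this stripped-down sketch the leaf buffer itself is the result.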
3. Complexity, Space Usage, and Optimization
| MemTree Variant | Insertion/Lookup | Retrieval | Space | Key Optimizations |
|---|---|---|---|---|
| LLM MemTree (Rezazadeh et al., 2024) | O(log N) insertion (avg.) | O(N) flat scan | hierarchical; depends on tree shape | depth-adaptive threshold τ(d); parallelizable aggregation |
| Prefix trie (Taylor et al., 19 Dec 2025) | O(n) worst case (n state variables) | O(n) per state | often far below hashing with high prefix sharing | BMC-based variable order for compactness |
| EMT (Rucker et al., 2022) | O(log N) | O(log N) | linear in stored memories | median splits; Oja's PCA; learned scorer |
- In LLM MemTree, aggregation and embedding updates along the traversal path are parallelizable. Collapsed retrieval costs O(N) over all N nodes, but tree traversal can theoretically achieve O(log N).
- The prefix tree's memory advantage arises when large state sets share long common prefixes; BMC-based variable ordering further tightens the memory footprint by maximizing early sharing. Empirical savings are 45–70% vs. a hash map for large biochemical CTMCs, with O(n) time per operation in the number of state variables (Taylor et al., 19 Dec 2025).
- EMT's binary tree yields O(log N) access for both reads and writes; splits and router updates are amortized over the leaf capacity.
4. Evaluation Metrics and Empirical Results
A. LLM MemTree (Rezazadeh et al., 2024)
Performance was evaluated on:
- Multi-Session Chat (MSC, 15 turns) and MSC-Extended (200 turns)
- QuALITY (5000-token QA, easy/hard distinctions)
- MultiHop RAG (609 news, 2556 multi-hop queries)
Key metrics: binary accuracy (by GPT-4 judge) and ROUGE-L recall.
Select outcomes:
- On MSC: MemoryStream 84.4% acc/79.1 R, MemTree 84.8% acc/79.9 R.
- On MSC-E: Full history 78.0%, MemoryStream 80.7%, MemTree 82.5%.
- On QuALITY: RAPTOR 59.0%, MemoryStream 43.8%, MemTree 59.8%.
- On MultiHop: RAPTOR 81.0%, MemoryStream 74.7%, MemTree 80.5% (best on temporal queries).
Insertion overhead for the full MultiHop dataset is roughly 10 s for MemTree, versus hours for RAPTOR/GraphRAG.
B. Prefix Trie for CTMCs (Taylor et al., 19 Dec 2025)
Empirical memory savings at scale (on very large state spaces):
- Up to 68% less memory vs. hash maps.
- CPU overhead is modest given the scale, with total runtimes ranging from seconds to minutes.
C. EMT (Rucker et al., 2022)
On 206 OpenML contextual bandit datasets:
- EMT outperforms CMT on 177/206 datasets.
- EMT+Parametric hybrid (PEMT) beats pure parametric on 110 datasets, losing on only 8.
- Under tightly bounded memory budgets, PEMT maintains mean reward close to that of the unbounded variant.
5. Comparative Strengths and Domain Limitations
A. LLM MemTree
- Supports fully online updates at logarithmic cost, enabling incremental context management in extended dialogue/document scenarios.
- Hierarchical schema-like abstraction aligns with human topical structure, enabling high-level and granular retrieval.
- Outperforms flat memory approaches on long context and complex QA, approaching offline RAG systems for performance.
- Limitations: relies on extra LLM summarization calls at insertion (an average of 3.27 per insertion on MultiHop), may retrieve overly verbose or only partially relevant passages, and is sensitive to the adaptive thresholding and summarization prompting.
B. Prefix Trie for CTMCs
- Substantially reduces memory for explicit state storage, especially with high concurrency and shared prefixes.
- Preprocessing (BMC) for variable order yields further savings but adds setup cost.
- Trade-off: deterministic O(n) access versus O(1) average-case hashing, but n (the number of state variables) is moderate in practice.
C. EMT
- Efficient, self-consistent online memory with provable O(log N) access. Principal-component splits yield effective routing for many real-world tabular and bandit tasks.
- Hybridization with parametric models yields “no-downside” performance gains.
- Sensitivity to the data regime: fixed routers may underperform under drifting distributions; Oja's approximation requires well-behaved covariances; and high-cardinality or sparse categorical features can challenge retrieval.
6. Cross-Domain Significance and Applicability
MemTree approaches unify a spectrum of requirements encountered in scalable memory augmentation:
- In LLMs, they enable structured, schema-aligned context for conversational agents, avoiding the redundancy and inefficiency of flat memory repacking (Rezazadeh et al., 2024).
- In model checking and systems biology, prefix trees make tractable the explicit storage and examination of massive discrete state spaces, previously bottlenecked by memory constraints of hash table methods (Taylor et al., 19 Dec 2025).
- In online sequential learning, tree-structured memory supports both efficient lookup and learning-based generalization, providing a viable alternative to -NN and streaming algorithms (Rucker et al., 2022).
The decisive empirical and theoretical properties across these domains are:
- Structurally induced efficiency (via prefix/redundancy sharing),
- Dynamic/online construction capability (no offline retraining),
- Alignment with underlying semantic or state-structure of the problem,
- Quantitative, domain-specific performance gains.
7. Open Problems and Future Directions
- Further optimization of tree construction—especially adaptive variable order in prefix tries and threshold/summarization in LLM MemTree—may yield enhanced performance in novel or high-dimensional domains (Rezazadeh et al., 2024, Taylor et al., 19 Dec 2025).
- Addressing the challenge of retrieval specificity versus verbosity in hierarchical memories remains pertinent, as does the integration of learned (differentiable) aggregation for abstracting node content.
- In online learning contexts, development of incremental eviction/rebalancing strategies for nonstationary data streams presents an open problem (Rucker et al., 2022).
- Broader generalization to hybrid symbolic-neural workflows and end-to-end differentiable controllers represents a plausible avenue for future extension.
MemTree thus constitutes a family of principled tree-based memory architectures, each providing domain-adaptive advances in memory scaling, abstraction, and retrieval compared to legacy methods, with substantial empirical validation across natural language, stochastic, and sequential learning contexts (Rezazadeh et al., 2024, Taylor et al., 19 Dec 2025, Rucker et al., 2022).