
MemTree: Hierarchical Memory Structures

Updated 23 March 2026
  • MemTree is a family of hierarchical memory architectures that structure data in tree forms to enable efficient storage, retrieval, and abstraction.
  • It includes implementations such as dynamic trees for LLMs, prefix tries for CTMC state storage, and Eigen Memory Trees for online sequential learning.
  • Empirical evaluations show MemTree improves scalability, reduces memory overhead, and speeds up retrieval compared to traditional flat or hash-based methods.

MemTree encompasses a set of algorithmic paradigms that use hierarchical or tree-structured memory representations to enable efficient storage, retrieval, and abstraction in settings ranging from LLMs to stochastic system exploration and online sequential learning. The MemTree concept has emerged independently across multiple research domains, each leveraging tree architectures to provide scalability, structured abstraction, or efficient access compared to traditional flat or hash-based data structures.

1. Formal Definitions and Core Data Structures

Three primary MemTree instantiations are established in the literature:

A. Hierarchical Dynamic Tree Memory for LLMs

MemTree (Rezazadeh et al., 2024) is defined as a rooted, directed tree T = (V, E) where each node v ∈ V holds a tuple [c_v, e_v, p_v, C_v, d_v]:

  • c_v: aggregated textual content (string)
  • e_v ∈ R^d: semantic embedding of c_v
  • p_v: parent node (the root has p_{v_0} = null)
  • C_v: set of child nodes
  • d_v ∈ N: tree depth

Hierarchical abstraction ties depth d_v to specificity: shallow nodes summarize coarse topics; deeper nodes encode fine-grained detail. The root is purely structural, holding no content or embedding.
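The node tuple above can be sketched as a small data structure. This is an illustrative reconstruction, not the authors' implementation; all names (`MemTreeNode`, field names) are hypothetical.

```python
# Hypothetical sketch of the MemTree node tuple [c_v, e_v, p_v, C_v, d_v].
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class MemTreeNode:
    content: str                       # c_v: aggregated textual content
    embedding: Optional[List[float]]   # e_v: semantic embedding of c_v (None at root)
    parent: Optional["MemTreeNode"]    # p_v: parent node (None at root)
    children: List["MemTreeNode"] = field(default_factory=list)  # C_v
    depth: int = 0                     # d_v: tree depth

# The root is purely structural: no content, no embedding.
root = MemTreeNode(content="", embedding=None, parent=None)
```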

B. Prefix Tree for State Storage in CTMCs

In explicit state storage for large-scale continuous-time Markov chains (CTMCs), MemTree is a prefix tree or trie (Taylor et al., 19 Dec 2025): T = (V, E, root, ℓ, Terminal) with:

  • V: set of nodes
  • root: the unique node at depth 0
  • E ⊆ V × V: edges with labels ℓ(u → v) = c ∈ N_0 (state variable values)
  • depth(v) = i: a node at depth i corresponds to the partial assignment [x_1 = c_1, …, x_i = c_i]
  • Terminal ⊆ V: nodes representing complete states

Each node’s children are indexed by the next state variable’s value; traversal from root to terminal uniquely identifies a state.

C. Eigen Memory Tree (EMT) for Online Learning

EMT (Rucker et al., 2022) is a full binary tree where:

  • Internal nodes n store a router (an approximate principal component u ∈ R^d) and a split threshold (the median projection value).
  • Leaves store a memory buffer M = {(x_i, y_i)} up to capacity c.
  • A global scorer w ∈ R^d learns a parametric dissimilarity for retrieval.

Traversal decisions at internal nodes are based on projections ⟨u, x⟩, and the tree structure is constructed dynamically online.

2. Insertion, Update, and Retrieval Algorithms

A. Dynamic Memory Update in LLM MemTree

Insertion of new content c_new involves:

  1. Embedding computation: e_new ← f_emb(c_new).
  2. Recursive “InsertNode” starting at the root:
    • At each node, compute cosine similarities s_i between e_new and the children’s embeddings e_i.
    • If s_max ≥ θ(d) (the depth-adaptive threshold), aggregate content via LLM-prompted summarization and proceed to v_best.
    • Otherwise, create a new leaf for c_new. The threshold is θ(d) = θ_0 exp(λd) (with θ_0 = 0.4, λ = 0.5 in practice).

Retrieval is performed by a collapsed-tree scan: for query embedding e_q, compute sim(e_q, e_v) for all v ∈ V, discarding results below θ_retrieve and returning the top-k nodes.
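The insertion and retrieval procedures above can be sketched as follows. This is a minimal illustration, not the authors' code: the LLM summarization step is stubbed out as string concatenation, nodes are plain dicts, and the θ_0/λ values follow the text.

```python
# Sketch of depth-adaptive insertion and collapsed-tree retrieval.
# LLM summarization is replaced by naive concatenation (a stand-in).
import math

THETA_0, LAM = 0.4, 0.5

def theta(d):
    # Depth-adaptive threshold theta(d) = theta_0 * exp(lambda * d).
    return THETA_0 * math.exp(LAM * d)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def insert(node, content, emb, depth=0):
    # Find the most similar child; descend if it clears theta(depth).
    best, s_max = None, -1.0
    for child in node["children"]:
        s = cosine(emb, child["emb"])
        if s > s_max:
            best, s_max = child, s
    if best is not None and s_max >= theta(depth):
        best["content"] += " | " + content   # stand-in for LLM summarization
        insert(best, content, emb, depth + 1)
    else:
        node["children"].append({"content": content, "emb": emb, "children": []})

def retrieve(root, q_emb, k=2, theta_r=0.0):
    # Collapsed-tree scan: score every node, filter by theta_r, return top-k.
    out, stack = [], list(root["children"])
    while stack:
        v = stack.pop()
        s = cosine(q_emb, v["emb"])
        if s >= theta_r:
            out.append((s, v["content"]))
        stack.extend(v["children"])
    return sorted(out, reverse=True)[:k]
```

Note that with θ_0 = 0.4 and λ = 0.5, θ(d) exceeds 1 by depth 2, so descent beyond shallow levels requires near-identical embeddings in this sketch.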

B. Prefix Tree State Operations

Insertion for state s = [s_1, …, s_d]:

  • At each level i, follow or create the edge labeled s_i, progressing from root to a terminal node u (marked as terminal). Membership lookup and state extraction are likewise realized by deterministic tree traversal, yielding O(d) time per operation.
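A minimal trie sketch of these operations, assuming dict-based child maps (the paper's implementation may differ): each level branches on one state variable's value, and a terminal flag marks complete states.

```python
# Minimal prefix-trie sketch for explicit state storage: level i branches
# on the value of state variable x_i; shared prefixes are stored once.
class Trie:
    def __init__(self):
        self.root = {"children": {}, "terminal": False}

    def insert(self, state):
        node = self.root
        for value in state:   # O(d): one level per state variable
            node = node["children"].setdefault(
                value, {"children": {}, "terminal": False})
        node["terminal"] = True   # mark a complete state

    def contains(self, state):
        node = self.root
        for value in state:
            if value not in node["children"]:
                return False
            node = node["children"][value]
        return node["terminal"]
```

States (1, 2, 3) and (1, 2, 4) share the prefix path 1 → 2, so only the final branch is duplicated; this is the source of the memory savings reported below.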

C. Eigen Memory Tree Routing and Learning

Routing for feature vector x:

  • At each internal node n, compute the projection v = ⟨n.router, x⟩; route left if v ≤ n.boundary, else right.
  • At a leaf, select the memory x′ that minimizes the scorer s_w(x, x′) = max(0, ⟨w, |x − x′|⟩). Insertion appends (x, y) to the reached leaf; when capacity c is exceeded, a PCA-based split is triggered using incremental Oja’s method.
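The routing and splitting logic above can be sketched as follows. Two simplifications are assumed: the router is the exact leading principal component from power iteration on the leaf's centered data (the paper uses incremental Oja updates), and the learned scorer w is replaced by plain Euclidean distance.

```python
# Sketch of EMT routing, median splits, and leaf-buffer insertion.
import statistics

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

class EMTNode:
    def __init__(self):
        self.router = None        # principal direction u (internal nodes)
        self.boundary = None      # median projection (internal nodes)
        self.left = self.right = None
        self.memories = []        # [(x, y)] buffer (leaves)

def principal_direction(xs):
    # Power iteration on centered data; stand-in for incremental Oja updates.
    d = len(xs[0])
    mean = [sum(col) / len(xs) for col in zip(*xs)]
    centered = [[xi - mi for xi, mi in zip(x, mean)] for x in xs]
    u = [1.0] * d
    for _ in range(50):
        u = [sum(dot(u, c) * c[j] for c in centered) for j in range(d)]
        norm = sum(v * v for v in u) ** 0.5 or 1.0
        u = [v / norm for v in u]
    return u

def route(node, x):
    while node.router is not None:
        node = node.left if dot(node.router, x) <= node.boundary else node.right
    return node

def insert(root, x, y, capacity=4):
    leaf = route(root, x)
    leaf.memories.append((x, y))
    if len(leaf.memories) > capacity:          # trigger a median split
        leaf.router = principal_direction([m[0] for m in leaf.memories])
        projs = [dot(leaf.router, m[0]) for m in leaf.memories]
        leaf.boundary = statistics.median(projs)
        leaf.left, leaf.right = EMTNode(), EMTNode()
        for m, p in zip(leaf.memories, projs):
            (leaf.left if p <= leaf.boundary else leaf.right).memories.append(m)
        leaf.memories = []

def query(root, x):
    # Nearest memory in the routed leaf (Euclidean stand-in for the scorer).
    leaf = route(root, x)
    return min(leaf.memories,
               key=lambda m: sum((a - b) ** 2 for a, b in zip(m[0], x)))
```

Splitting on the median projection keeps the tree balanced, which is what underwrites the O(log N + c) access bounds in the table below.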

3. Complexity, Space Usage, and Optimization

| MemTree Variant | Insertion/Lookup | Retrieval | Space | Key Optimizations |
|---|---|---|---|---|
| LLM MemTree (Rezazadeh et al., 2024) | O(log N) insertion (avg) | O(N) flat scan | hierarchical; depends on tree shape | depth-adaptive θ, parallelizable aggregation |
| Prefix trie (Taylor et al., 19 Dec 2025) | O(d) | O(d) | O(Nd) worst case; often much less with high prefix sharing | BMC-based variable order for compactness |
| EMT (Rucker et al., 2022) | O(log N + c) | O(log N + c) | O(Nd) | median splits, Oja’s PCA, learned scorer |
  • In LLM MemTree, aggregation and embedding updates along the traversal path are parallelizable. Collapsed retrieval costs O(|V|·d), but tree traversal can theoretically achieve O(log N).
  • The prefix tree’s memory advantage arises when large state sets share long common prefixes; BMC-based variable ordering further tightens the memory footprint by maximizing early sharing. Empirical savings are 45–70% vs. a hash map for large biochemical CTMCs, with O(d) time per operation (Taylor et al., 19 Dec 2025).
  • EMT’s binary tree yields O(log N) access for both reads and writes; splits and router updates are amortized by the leaf capacity.

4. Evaluation Metrics and Empirical Results

A. LLM MemTree (Rezazadeh et al., 2024)

Performance was evaluated on:

  • Multi-Session Chat (MSC, ~15 turns) and MSC-Extended (200 turns)
  • QuALITY (5000-token QA, easy/hard distinctions)
  • MultiHop RAG (609 news, 2556 multi-hop queries)

Key metrics: binary accuracy (by GPT-4 judge) and ROUGE-L recall.

Select outcomes:

  • On MSC: MemoryStream 84.4% accuracy / 79.1 ROUGE-L; MemTree 84.8% / 79.9.
  • On MSC-E: full history 78.0%, MemoryStream 80.7%, MemTree 82.5%.
  • On QuALITY: RAPTOR 59.0%, MemoryStream 43.8%, MemTree 59.8%.
  • On MultiHop: RAPTOR 81.0%, MemoryStream 74.7%, MemTree 80.5% (best on temporal queries).

Insertion overhead for the full MultiHop dataset is ~10 s (MemTree) versus >1 hr (RAPTOR/GraphRAG).

B. Prefix Trie for CTMCs (Taylor et al., 19 Dec 2025)

Empirical memory savings at scale (on state spaces up to 10^8 states):

  • Up to 68% less memory vs. hash maps.
  • Per-operation CPU overhead is modest (seconds to minutes) given the scale.

C. EMT (Rucker et al., 2022)

On 206 OpenML contextual bandit datasets:

  • EMT outperforms CMT on 177/206 datasets.
  • EMT+Parametric hybrid (PEMT) beats pure parametric on 110 datasets, losing on only 8.
  • For bounded memory (as little as 1k), PEMT maintains <0.008 mean reward loss vs. unbounded.

5. Comparative Strengths and Domain Limitations

A. LLM MemTree

  • Supports fully online updates at logarithmic cost, enabling incremental context management in extended dialogue/document scenarios.
  • Hierarchical schema-like abstraction aligns with human topical structure, enabling high-level and granular retrieval.
  • Outperforms flat memory approaches on long-context and complex QA, approaching the performance of offline RAG systems.
  • Limitations: relies on extra LLM summarization calls at insertion (≈3.27 per insertion on MultiHop), potential retrieval of overly verbose or partially relevant passages, and sensitivity to adaptive thresholding and summarization prompting.

B. Prefix Trie for CTMCs

  • Substantially reduces memory for explicit state storage, especially with high concurrency and shared prefixes.
  • Preprocessing (BMC) for variable order yields further savings but adds setup cost.
  • Trade-off: O(d) deterministic access vs. O(1) average for hashing, but d (the number of state variables) is moderate in practice.

C. EMT

  • Efficient, self-consistent online memory with provable O(log N) access. Principal-component splits capture effective routing for many real-world tabular and bandit tasks.
  • Hybridization with parametric models yields “no-downside” performance gains.
  • Sensitivity to data regime: fixed routers may underperform under drifting distributions; Oja’s approximation requires well-behaved covariances; and high-cardinality or sparse categorical features can challenge retrieval.

6. Cross-Domain Significance and Applicability

MemTree approaches unify a spectrum of requirements encountered in scalable memory augmentation:

  • In LLMs, they enable structured, schema-aligned context for conversational agents, avoiding the redundancy and inefficiency of flat memory repacking (Rezazadeh et al., 2024).
  • In model checking and systems biology, prefix trees make tractable the explicit storage and examination of massive discrete state spaces, previously bottlenecked by memory constraints of hash table methods (Taylor et al., 19 Dec 2025).
  • In online sequential learning, tree-structured memory supports both efficient lookup and learning-based generalization, providing a viable alternative to k-NN and streaming algorithms (Rucker et al., 2022).

The decisive empirical and theoretical properties across these domains are:

  • Structurally induced efficiency (via prefix/redundancy sharing),
  • Dynamic/online construction capability (no offline retraining),
  • Alignment with underlying semantic or state-structure of the problem,
  • Quantitative, domain-specific performance gains.

7. Open Problems and Future Directions

  • Further optimization of tree construction—especially adaptive variable order in prefix tries and threshold/summarization in LLM MemTree—may yield enhanced performance in novel or high-dimensional domains (Rezazadeh et al., 2024, Taylor et al., 19 Dec 2025).
  • Addressing the challenge of retrieval specificity versus verbosity in hierarchical memories remains pertinent, as does the integration of learned (differentiable) aggregation for abstracting node content.
  • In online learning contexts, development of incremental eviction/rebalancing strategies for nonstationary data streams presents an open problem (Rucker et al., 2022).
  • Broader generalization to hybrid symbolic-neural workflows and end-to-end differentiable controllers represents a plausible avenue for future extension.

MemTree thus constitutes a family of principled tree-based memory architectures, each providing domain-adaptive advances in memory scaling, abstraction, and retrieval compared to legacy methods, with substantial empirical validation across natural language, stochastic, and sequential learning contexts (Rezazadeh et al., 2024, Taylor et al., 19 Dec 2025, Rucker et al., 2022).
