
Tree-Structured Hierarchical Memory (TSHM)

Updated 4 July 2026
  • TSHM is a memory architecture that organizes data in a tree where leaves capture detailed observations and internal nodes hold abstract summaries.
  • It employs online insertion, depth-adaptive thresholds, and temporal consolidation to balance fine-grained evidence with high-level abstraction.
  • TSHM enhances retrieval accuracy and efficiency in long-context settings by enabling both collapsed and hierarchical search strategies.

Tree-Structured Hierarchical Memory (TSHM) denotes a family of memory architectures in which memory units are arranged in a rooted hierarchy and used at multiple levels of abstraction. In the most direct instantiations, leaves retain specific observations, dialogue turns, facts, actions, or clips, while internal nodes store increasingly abstract summaries or representatives of their descendants; retrieval may then operate over coarse summaries, fine-grained leaves, or both under an explicit budget. A formal treatment of this pattern appears in work that models memory as a tree $T=(V,E)$ with node-local content, embeddings, and parent-child relations, while a later general theory factors hierarchical memory into extraction $\alpha$, coarsening $C=(\pi,\rho)$, and traversal $\tau$ (Rezazadeh et al., 2024, Talebirad et al., 23 Mar 2026).

1. Representational form and memory semantics

A direct operational definition of TSHM is given by MemTree, which represents memory as a tree $T=(V,E)$ and defines each node as

$$v=[c_v,e_v,p_v,\mathcal{C}_v,d_v],$$

where $c_v$ is textual content, $e_v\in\mathbb{R}^d$ is a semantic embedding, $p_v$ is the parent, $\mathcal{C}_v$ is the child set, and $d_v$ is depth from the root. In this formulation the root is purely structural, and depth is semantically meaningful: shallow nodes are broader and more abstract, while deeper nodes are more specific (Rezazadeh et al., 2024).
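As a minimal sketch of the node tuple above, the following Python dataclass mirrors its five fields. The class name `MemNode`, the `add_child` helper, and the choice to model the structural root with empty content and an empty embedding are illustrative assumptions, not MemTree's actual implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity-based equality; avoids comparing cyclic parent links
class MemNode:
    content: str                                   # c_v: textual content
    embedding: List[float]                         # e_v: semantic embedding
    parent: Optional["MemNode"] = None             # p_v: parent node
    children: List["MemNode"] = field(default_factory=list)  # C_v: child set
    depth: int = 0                                 # d_v: distance from the root

    def add_child(self, content: str, embedding: List[float]) -> "MemNode":
        """Hypothetical helper: attach a child one level deeper."""
        child = MemNode(content, embedding, parent=self, depth=self.depth + 1)
        self.children.append(child)
        return child


# Assumption: the structural root is modelled with empty content and embedding.
root = MemNode(content="", embedding=[])
topic = root.add_child("travel plans", [1.0, 0.0])
leaf = topic.add_child("flight to Oslo on Friday", [0.9, 0.1])
```

Here the shallow `topic` node plays the role of a broad abstraction and the deeper `leaf` holds a specific observation, matching the depth semantics described above.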

Other TSHM formulations preserve the same parent-child abstraction pattern but attach different invariants to the hierarchy. TiMem defines a Temporal Memory Tree as a tuple with four components: a node set partitioned into abstraction levels, an edge set that links adjacent levels, a map that assigns each node a continuous temporal interval, and a store that holds semantic memory as text and embeddings. Its five-level instantiation (segment, session, day, week, profile) makes temporal containment the organizing principle rather than semantic clustering alone (Li et al., 6 Jan 2026). SegTreeMem likewise defines memory nodes over contiguous utterance intervals, with every internal node covering the union of ordered child intervals, so that the left-to-right order of leaves coincides with utterance chronology (Liu et al., 3 Jun 2026).

These systems indicate that TSHM is not a single fixed schema. Some instances are primarily semantic; others are temporal, task-centric, or hybrid. What remains stable is the use of internal nodes as explicit memory-bearing abstractions rather than empty routing metadata. In the theoretical language of hierarchical memory, leaves correspond to the output of extraction $\alpha$, internal nodes arise from repeated coarsening $C$, and the representative function $\rho$ determines whether internal nodes behave more like self-sufficient summaries or referential routing labels (Talebirad et al., 23 Mar 2026).

2. Construction, consolidation, and online update

The principal design question in TSHM is how new information becomes part of the hierarchy. MemTree provides a fully online answer. Given new content, it computes an embedding of that content and traverses from the root by cosine similarity against the current children. If the best child similarity satisfies a depth-adaptive threshold (given in the paper in a main-text form and in an equivalent appendix implementation), the system descends; otherwise it creates a new child. If traversal reaches a leaf that must be expanded, the leaf is converted into an internal node and its original content is preserved in a child. Parent text is updated by an external generative aggregation step rather than a closed-form pooling rule, and the embedding is recomputed from the new summary text (Rezazadeh et al., 2024).
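The insertion procedure described above can be sketched as follows. This is a toy illustration under stated assumptions, not MemTree's code: `embed` is a bag-of-words stand-in for a real encoder, `summarize` is a stub for the external generative aggregation step, and the exponential `threshold` schedule with its `base` and `rate` values is a hypothetical choice made only so the example runs.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional

Vec = List[float]
VOCAB = ["flight", "hotel", "python", "bug"]  # toy vocabulary (assumption)


def embed(text: str) -> Vec:
    """Toy bag-of-words embedding standing in for a real encoder."""
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]


def summarize(old: str, new: str) -> str:
    """Stub for the external generative aggregation step."""
    return f"{old} | {new}"


def cosine(a: Vec, b: Vec) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def threshold(depth: int, base: float = 0.4, rate: float = 0.5) -> float:
    """ASSUMED schedule: stricter similarity is required deeper in the tree."""
    return min(0.99, base * math.exp(rate * depth))


@dataclass(eq=False)
class Node:
    content: str
    embedding: Vec
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    depth: int = 0


def attach(parent: Node, content: str, emb: Vec) -> Node:
    child = Node(content, emb, parent, depth=parent.depth + 1)
    parent.children.append(child)
    return child


def refresh(node: Node, new_content: str) -> None:
    """Rewrite the summary text, then recompute the embedding from it."""
    node.content = summarize(node.content, new_content)
    node.embedding = embed(node.content)


def insert(root: Node, content: str) -> Node:
    e_new = embed(content)
    node = root
    while node.children:
        best = max(node.children, key=lambda c: cosine(c.embedding, e_new))
        if cosine(best.embedding, e_new) < threshold(best.depth):
            return attach(node, content, e_new)   # nothing similar enough: new child
        node = best
        if node.children:
            refresh(node, content)                # internal node absorbs the new content
    if node is root:
        return attach(root, content, e_new)       # first insertion into an empty tree
    attach(node, node.content, node.embedding)    # leaf expansion keeps the original text
    new_leaf = attach(node, content, e_new)
    refresh(node, content)
    return new_leaf


root = Node("", [])
insert(root, "flight to oslo")
second = insert(root, "flight to rome")
insert(root, "python bug")
```

In this run the second item is similar enough to the first to trigger leaf expansion, so the original leaf becomes an internal summary node with two children, while the unrelated third item becomes a new child of the root.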

Temporal systems replace semantic routing with schedule-driven consolidation. In TiMem, segment-level memories are generated online after each user-assistant exchange, whereas the higher levels are generated automatically at session, daily, weekly, and monthly boundaries. Parent content is generated from child memories, with level-specific prompts controlling whether the result is factual, pattern-oriented, or persona-level (Li et al., 6 Jan 2026). SegTreeMem instead uses an online rightmost-frontier rule. Once a given number of utterances has been processed, only nodes whose spans end at the most recent utterance are admissible insertion points for the next one, so the update is local to the active frontier rather than global. New internal annotations are then recomputed bottom-up over the affected chain (Liu et al., 3 Jun 2026).
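A rough sketch of the rightmost-frontier idea follows, assuming inclusive integer utterance indices. The names `Span`, `rightmost_frontier`, and `append_utterance` are hypothetical, and the caller picks the frontier level by hand here; how SegTreeMem actually selects among admissible nodes is not shown.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class Span:
    start: int                      # first utterance index covered (inclusive)
    end: int                        # last utterance index covered (inclusive)
    children: List["Span"] = field(default_factory=list)


def rightmost_frontier(root: Span) -> List[Span]:
    """Chain of nodes from the root down through each last child."""
    chain = [root]
    while chain[-1].children:
        chain.append(chain[-1].children[-1])
    return chain


def append_utterance(root: Span, level: int) -> int:
    """Attach the next utterance under the frontier node at `level`."""
    chain = rightmost_frontier(root)
    target = chain[level]
    if not target.children:
        raise ValueError("sketch only attaches under internal frontier nodes")
    t_next = root.end + 1
    target.children.append(Span(t_next, t_next))
    for node in chain[: level + 1]:   # only the active chain is touched
        node.end = t_next
    return t_next


seg_a = Span(1, 2, [Span(1, 1), Span(2, 2)])
seg_b = Span(3, 3, [Span(3, 3)])
root = Span(1, 3, [seg_a, seg_b])
before = [n.end for n in rightmost_frontier(root)]
append_utterance(root, level=1)
after = [n.end for n in rightmost_frontier(root)]
```

Every node on the frontier ends at the latest utterance both before and after the update, and the closed segment `seg_a` is never modified, which is what makes the update local.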

Other systems expose additional consolidation mechanisms. SHIMI imposes a maximum branching factor; if a parent exceeds this width, the two most similar children are replaced by a newly synthesized abstraction node, so the tree restructures itself to maintain bounded branching and semantic compression (Helmi, 8 Apr 2025). H-Mem uses fixed temporal windows (day, week, month, year) and per-level similarity thresholds to merge semantically similar events within the same window into progressively more abstract long-term summaries (Yu et al., 15 May 2026).
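The bounded-branching rule can be illustrated with a small sketch. It assumes cosine similarity between child embeddings, a string-formatting stub in place of LLM-based abstraction synthesis, and a mean of the two embeddings for the new node; all three are illustrative choices rather than SHIMI's documented behavior, and `Concept` and `enforce_branching` are hypothetical names.

```python
import math
from dataclasses import dataclass, field
from itertools import combinations
from typing import List


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass(eq=False)
class Concept:
    label: str
    embedding: List[float]
    children: List["Concept"] = field(default_factory=list)


def enforce_branching(parent: Concept, max_branch: int) -> None:
    """Merge the most similar pair of children until the width bound holds."""
    while len(parent.children) > max_branch and len(parent.children) >= 2:
        a, b = max(
            combinations(parent.children, 2),
            key=lambda pair: cosine(pair[0].embedding, pair[1].embedding),
        )
        merged = Concept(
            label=f"abstraction({a.label}, {b.label})",  # stub for synthesized text
            embedding=[(x + y) / 2 for x, y in zip(a.embedding, b.embedding)],
            children=[a, b],
        )
        parent.children = [c for c in parent.children if c is not a and c is not b]
        parent.children.append(merged)


parent = Concept("root", [0.0, 0.0], [
    Concept("cats", [1.0, 0.0]),
    Concept("dogs", [0.9, 0.1]),
    Concept("taxes", [0.0, 1.0]),
])
enforce_branching(parent, max_branch=2)
labels = [c.label for c in parent.children]
```

The two closest children end up under a new abstraction node, so the parent's width returns to the bound while the original nodes are preserved one level deeper.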

Across these systems, two update regimes recur. One is insertion-driven and local, as in MemTree, SegTreeMem, SHIMI, and MemForest. The other is schedule-driven and consolidation-oriented, as in TiMem and H-Mem. This suggests that TSHM is best understood not merely as a storage layout but as a policy for deciding when detail should remain local and when it should be rewritten into a higher-level memory state.

3. Retrieval, traversal, and coarsening-traversal coupling

Retrieval is the second defining operation of TSHM, and the literature shows that “tree-structured” does not imply a single retrieval pattern. The general theory defines traversal $\tau$ as an operation carried out under a token budget, and argues that traversal strategy is constrained by the representative function $\rho$: self-sufficient representatives support collapsed or hybrid search, whereas low-self-sufficiency representatives require top-down refinement (Talebirad et al., 23 Mar 2026).

MemTree is the clearest counterexample to the assumption that a memory tree should be navigated strictly along edges. Its default query procedure is “collapsed tree retrieval”: embed the query, compare it against all node embeddings, filter by a retrieval threshold, and return the top-$k$ nodes. The tree is therefore used heavily for organization and abstraction at write time, but less directly for query-time routing. An ablation comparing collapsed retrieval with traversal retrieval reports that collapsed retrieval generally performs better unless traversal keeps a relatively large number of candidates, because early top-down decisions can miss relevant deep nodes and over-retrieve redundant parent summaries (Rezazadeh et al., 2024).
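Collapsed retrieval as described above amounts to flat scoring over every node regardless of depth. A minimal sketch, assuming nodes are given as `(text, embedding)` pairs and that `k` and `min_sim` are caller-chosen parameters (both names are hypothetical):

```python
import math
from typing import List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def collapsed_retrieve(
    nodes: List[Tuple[str, List[float]]],
    query: List[float],
    k: int,
    min_sim: float,
) -> List[str]:
    """Score all nodes (summaries and leaves alike), filter, and keep the top k."""
    scored = [(cosine(emb, query), text) for text, emb in nodes]
    kept = [item for item in scored if item[0] >= min_sim]
    kept.sort(key=lambda item: item[0], reverse=True)
    return [text for _, text in kept[:k]]


nodes = [
    ("summary: travel plans", [1.0, 0.2]),
    ("leaf: flight to Oslo on Friday", [1.0, 0.0]),
    ("leaf: tax deadline in April", [0.0, 1.0]),
]
hits = collapsed_retrieve(nodes, query=[1.0, 0.05], k=2, min_sim=0.5)
```

Because summaries and leaves compete in the same ranking, a relevant deep leaf can be returned even when its ancestors would not have been selected by a top-down walk.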

Other TSHM systems use more explicit hierarchical retrieval. TiMem first classifies a query as simple, hybrid, or complex, then activates leaf memories with a fused semantic-lexical score and propagates upward to ancestors permitted by the planner-selected scope. Candidate memories are then filtered by a recall-gating step, so retrieval composes fine-grained evidence with higher-level temporal abstractions (Li et al., 6 Jan 2026). SegTreeMem defines a local relevance score for each node, normalizes it, propagates relevance through the tree, and combines the results into a final per-node score. This makes hierarchical retrieval a finite-horizon propagation problem rather than a beam descent or flat ranking (Liu et al., 3 Jun 2026).

Streaming-video TSHM uses yet another retrieval mode. In LiveStarPro, long-term memory is organized as a Recursive Event Tree of evicted semantic clips, and retrieval performs hierarchical beam descent. Matched nodes are then expanded to their local event-chain context, so the model retrieves not only a relevant event but also its local historical continuation (Yang et al., 16 Jun 2026).
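Hierarchical beam descent in general can be sketched as a level-by-level search that keeps only the best few nodes before expanding their children. The code below is a generic illustration, not LiveStarPro's procedure: the `Event` class, the cosine scoring, and the `beam_width` parameter are assumptions, and the event-chain expansion step is omitted.

```python
import math
from dataclasses import dataclass, field
from typing import List


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass(eq=False)
class Event:
    label: str
    embedding: List[float]
    children: List["Event"] = field(default_factory=list)


def beam_descent(root: Event, query: List[float], beam_width: int) -> List[Event]:
    """Descend level by level, keeping the beam_width best nodes per level."""
    frontier = [root]
    leaves: List[Event] = []
    while frontier:
        candidates: List[Event] = []
        for node in frontier:
            if node.children:
                candidates.extend(node.children)
            else:
                leaves.append(node)          # reached a stored clip
        candidates.sort(key=lambda n: cosine(n.embedding, query), reverse=True)
        frontier = candidates[:beam_width]
    return leaves


root = Event("root", [0.0, 0.0, 0.0], [
    Event("cooking", [1.0, 0.2, 0.0], [
        Event("chop", [1.0, 0.0, 0.0]),
        Event("boil", [0.6, 0.8, 0.0]),
    ]),
    Event("sports", [0.0, 0.0, 1.0], [Event("goal", [0.0, 0.0, 1.0])]),
])
narrow = [e.label for e in beam_descent(root, [1.0, 0.1, 0.0], beam_width=1)]
wide = [e.label for e in beam_descent(root, [1.0, 0.1, 0.0], beam_width=2)]
```

A narrow beam commits early to one branch, while a wider beam visits more of the tree at higher cost, which is the same tradeoff the collapsed-versus-traversal comparison above turns on.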

Taken together, these systems support a strong generalization: TSHM retrieval is governed less by the presence of edges than by the information content of internal nodes. If internal nodes are reliable summaries, collapsed or all-level retrieval can be effective. If they are primarily routing artifacts, top-down refinement becomes necessary. This is the central coarsening-traversal coupling identified in the general theory (Talebirad et al., 23 Mar 2026).

4. Major architectural variants and application domains

TSHM appears in both learned internal-memory architectures and external memory managers. Early neural formulations implemented tree-structured memory inside recurrent or attentive models. Hierarchical Attentive Memory (HAM) organizes memory as a full binary tree whose leaves are memory cells and whose internal nodes store learned summaries; access is a hard top-down traversal with $\Theta(\log n)$ cost per memory access in the main efficient variant, where $n$ is the memory size (Andrychowicz et al., 2016). S-LSTM and Tree-structured ConvLSTM move the same principle into recurrent composition: in both models parent memory is a gated merge of child memories, making node-local state itself the hierarchical memory (Zhu et al., 2015, Kong et al., 2019). Tree-structured Attention with Hierarchical Accumulation is not presented as a memory model, but it constructs explicit internal-node phrase states from descendants and lets attention read from them, which is a close tree-memory interpretation inside a Transformer (Nguyen et al., 2020).

Recent LLM systems externalize TSHM and specialize it to task domain. MemTree and HAT target multi-session dialogue; TiMem and SegTreeMem make temporal hierarchy the primary organizing principle; MemForest turns memory into a forest of balanced temporal trees over session, entity, and scene scopes; H-Mem builds a temporal-semantic tree and augments it with a knowledge graph; TME uses a Task Memory Tree whose nodes explicitly track action, input/output, status, and dependencies; HMT for web agents uses a fixed three-level Intent/Stage/Action hierarchy to separate reusable workflow logic from website-specific grounding; SHIMI uses a semantic rooted tree and extends it with decentralized synchronization; LiveStarPro uses a short-term/long-term two-tier hierarchy for streaming video (Rezazadeh et al., 2024, A et al., 2024, Li et al., 6 Jan 2026, Liu et al., 3 Jun 2026, Chen et al., 16 May 2026, Yu et al., 15 May 2026, Ye, 11 Apr 2025, Tan et al., 7 Mar 2026, Helmi, 8 Apr 2025, Yang et al., 16 Jun 2026).

| System | Organizing principle | Distinctive mechanism |
|---|---|---|
| MemTree | Semantic hierarchy | Online insertion with depth-adaptive threshold |
| TiMem | Temporal containment | Segment/session/day/week/profile consolidation |
| SegTreeMem | Contiguous temporal segments | Rightmost-frontier online update |
| MemForest | Time-ordered scoped trees | Dirty-path refresh in session/entity/scene trees |
| H-Mem | Temporal-semantic tree + graph | Long-term summaries plus entity relations |
| TME | Hierarchical task state | Active-node-path prompt synthesis |
| HMT | Intent/Stage/Action tree | Stage-aware Planner and grounded Actor |
| LiveStarPro TSHM | Event-chain archive | Recursive Event Tree for evicted video history |

A recurring implication is that TSHM is best viewed as a design family rather than a single algorithm. Some variants are semantic and online; some are temporal and scheduled; some are summary-heavy; some couple the tree to a graph or planner. The shared invariant is the use of explicit multilevel representatives to mediate between raw history and bounded-context reasoning.

5. Empirical evidence, computational tradeoffs, and common misconceptions

The most direct evidence for TSHM in long-horizon language settings comes from MemTree, TiMem, SegTreeMem, MemForest, and H-Mem. MemTree reports that in the short-context Multi-Session Chat setting with only 15 rounds, direct full-history prompting is best, showing that hierarchical memory is not universally necessary when the active context is still small. In retrieval-only settings, however, MemTree slightly outperforms MemoryStream and MemGPT on MSC. It is further compared against MemoryStream and full-history prompting on the 200-round MSC-E benchmark, against MemoryStream on QuALITY (including the hard subset), and against baselines on MultiHop RAG, where it shows especially large gains on comparison and temporal questions (Rezazadeh et al., 2024).

TiMem shows a related but more explicitly temporal effect. Under a consistent evaluation setup it reports results on LoCoMo and LongMemEval-S while reducing recalled memory length on LoCoMo. Its hierarchy ablation is especially diagnostic. Moving from a single-level configuration with flat recall to one with hierarchical recall raises scores on both benchmarks, restricting memory to a range of levels without the rest lowers them, and the full hierarchy is reported alongside these settings. This directly shows that summary layers help, but cannot replace fine-grained evidence (Li et al., 6 Jan 2026).

SegTreeMem strengthens the case that the type of hierarchy matters. It improves LLM-judge accuracy over flat retrieval, graph-structured memory, and tree-structured memory baselines across LoCoMo, LongMemEval-MAB, and RealMem. Its temporal-order permutation analysis then shows that when 30% of turn pairs are swapped before memory construction, performance drops much more for SegTreeMem than for a non-temporal clustering tree, both on LoCoMo (for the SegTreeMem-BU variant) and on RealMem. This indicates that the gain depends on preserving chronological structure during construction (Liu et al., 3 Jun 2026).

In systems-oriented work, MemForest shows that TSHM can improve both quality and write-path scalability. On LongMemEval-S it reports pass@1 with Qwen3-30B while shortening the dependent critical path of post-extraction maintenance for a balanced MemTree, and its memory construction throughput is higher than that of EverMemOS. The price is a larger write-time token budget, since the framework replaces a few long state-dependent prompts with many short parallel calls (Chen et al., 16 May 2026). H-Mem reports state-of-the-art QA performance on LoCoMo, LongMemEvalS, and REALTALK, and its ablation shows that removing the tree causes the largest performance drop among major components, larger than removing the graph or long-term memory summaries (Yu et al., 15 May 2026).

Video streaming provides an especially sharp retrieval comparison. LiveStarPro reports an improvement in semantic correctness and a reduction in timing error over prior online methods. A long-term retrieval diagnostic compares three settings (no long-term memory, a flat $k$-NN bank, and the Recursive Event Tree) on Recall(S), Recall(L), and latency in milliseconds. The reported pattern directly supports the claim that tree organization can improve both long-range recall and retrieval latency when the tree captures coherent event chains (Yang et al., 16 Jun 2026).

Several common misconceptions are contradicted by these results. First, TSHM is not uniformly superior to short-context baselines; MemTree explicitly reports that full-history prompting is best on MSC with only 15 rounds (Rezazadeh et al., 2024). Second, a memory being tree-structured does not imply that strict tree traversal is the best query procedure; MemTree’s collapsed search often beats traversal retrieval (Rezazadeh et al., 2024). Third, high-level summaries alone are insufficient; TiMem’s ablation restricted to summary levels underperforms the full hierarchy (Li et al., 6 Jan 2026). Fourth, hierarchy alone is not the main effect in some settings: SegTreeMem’s permutation study indicates that temporal order itself is a decisive structural signal (Liu et al., 3 Jun 2026).

6. Limitations, theoretical synthesis, and open problems

Despite strong empirical results, several limitations recur across the literature. Many contemporary TSHM systems are not end-to-end trained memory modules. MemTree, TiMem, SHIMI, HAT, and H-Mem all rely on external or off-the-shelf LLMs for summarization, planning, or relation judgment rather than learning retrieval losses or differentiable routing objectives (Rezazadeh et al., 2024, Li et al., 6 Jan 2026, Helmi, 8 Apr 2025, A et al., 2024, Yu et al., 15 May 2026). This makes them modular and often practical, but it also leaves update rules heuristic and summary quality model-dependent.

A second limitation is topology. Several systems acknowledge that strict trees are only an approximation to real relational structure. TME is explicitly graph-aware and proposes DAG-backed memory as future work; HMT notes that stage nodes could semantically be shared across intents even though the implemented structure is a rooted tree; SHIMI states that its current implementation assumes a strictly tree-based semantic structure and treats polyhierarchy as a limitation (Ye, 11 Apr 2025, Tan et al., 7 Mar 2026, Helmi, 8 Apr 2025). A plausible implication is that future TSHM research will increasingly focus on controlled graph extensions rather than purely tree-shaped storage.

A third limitation concerns worst-case structure quality. SegTreeMem is shallow in practice, but its appendix shows that maximally topic-switching input can cause a degenerate cascading tree, for which it gives the resulting height and total node count (Liu et al., 3 Jun 2026). More generally, the theory paper emphasizes that representative quality constrains viable traversal. It defines a self-sufficiency measure for representatives and proves that low-information representatives cannot reliably route among too many children, formalized by a Fano-style bound on branching. It also derives a coarsening-traversal coupling: self-sufficient representatives support collapsed search, while referential representatives require refinement-based traversal (Talebirad et al., 23 Mar 2026). This formalism clarifies why internal-node design, grouping coherence, branching factor, and retrieval policy cannot be tuned independently.

A related, older line of work studies hierarchical memory from the standpoint of non-uniform access cost rather than semantic abstraction. In the Hierarchical Memory Model for optimum binary search trees, structural optimality and placement optimality are inseparable because frequently accessed nodes should occupy cheaper memory levels, and subtree dynamic programs must reason jointly about tree shape and storage budget profiles (0804.0940). This suggests, though the setting is different, that future TSHM systems may benefit from treating memory placement, cache residency, and hierarchical abstraction as a single optimization problem rather than separate engineering layers.

The broad synthesis is therefore twofold. On one hand, TSHM has emerged as a practical design pattern for LLMs, agents, retrieval systems, and streaming multimodal models: leaves preserve detail, internal nodes encode abstractions, and retrieval exploits multiple granularities under a budget. On the other hand, the theory and ablations indicate that tree shape alone is not the essence. What determines success is the interaction among extraction granularity, grouping coherence, representative quality, update dynamics, and traversal policy. In that sense, TSHM is less a single memory structure than a disciplined way of organizing compression, retention, and retrieval over long histories (Talebirad et al., 23 Mar 2026).
