Tree-Structured Hierarchical Memory (TSHM)

Updated 4 July 2026

TSHM is a memory architecture that organizes data in a tree where leaves capture detailed observations and internal nodes hold abstract summaries.
It employs online insertion, depth-adaptive thresholds, and temporal consolidation to balance fine-grained evidence with high-level abstraction.
TSHM enhances retrieval accuracy and efficiency in long-context settings by enabling both collapsed and hierarchical search strategies.

Tree-Structured Hierarchical Memory (TSHM) denotes a family of memory architectures in which memory units are arranged in a rooted hierarchy and used at multiple levels of abstraction. In the most direct instantiations, leaves retain specific observations, dialogue turns, facts, actions, or clips, while internal nodes store increasingly abstract summaries or representatives of their descendants; retrieval may then operate over coarse summaries, fine-grained leaves, or both under an explicit budget. A formal treatment of this pattern appears in work that models memory as a tree such as $T=(V,E)$ with node-local content, embeddings, and parent-child relations, while a later general theory factors hierarchical memory into extraction $\alpha$, coarsening $C=(\pi,\rho)$, and traversal $\tau$ (Rezazadeh et al., 2024, Talebirad et al., 23 Mar 2026).

1. Representational form and memory semantics

A direct operational definition of TSHM is given by MemTree, which represents memory as a tree $T=(V,E)$ and defines each node as

$v=[c_v,e_v,p_v,\mathcal{C}_v,d_v],$

where $c_v$ is textual content, $e_v\in\mathbb{R}^d$ is a semantic embedding, $p_v$ is the parent, $\mathcal{C}_v$ is the child set, and $d_v$ is depth from the root. In this formulation the root is purely structural, with $c_{\text{root}}=\emptyset$ and $e_{\text{root}}=\emptyset$, and depth is semantically meaningful: shallow nodes are broader and more abstract, while deeper nodes are more specific (Rezazadeh et al., 2024).
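The node tuple above maps naturally onto a small record type. The following is a minimal sketch, not MemTree's implementation: the class name `MemNode`, the `add_child` helper, and the toy two-dimensional embeddings are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity-based equality avoids recursive comparisons
class MemNode:
    content: str                                   # c_v: textual content
    embedding: Optional[List[float]]               # e_v: None for the structural root
    parent: Optional["MemNode"] = field(default=None, repr=False)  # p_v
    children: List["MemNode"] = field(default_factory=list)        # C_v
    depth: int = 0                                 # d_v: distance from the root

    def add_child(self, content: str, embedding: List[float]) -> "MemNode":
        """Hypothetical helper: attach a child one level deeper than self."""
        child = MemNode(content, embedding, parent=self, depth=self.depth + 1)
        self.children.append(child)
        return child


# The root carries no content or embedding; depth grows toward specifics.
root = MemNode(content="", embedding=None)
topic = root.add_child("User enjoys outdoor activities", [0.9, 0.1])
detail = topic.add_child("Hiked a coastal trail last weekend", [0.8, 0.2])
```

Keeping `parent` out of the generated `repr` and disabling field-wise equality prevents the parent/child cycle from causing unbounded recursion when nodes are printed or compared.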

Other TSHM formulations preserve the same parent-child abstraction pattern but attach different invariants to the hierarchy. TiMem defines a Temporal Memory Tree as

$\mathcal{T}=(M,E,\tau,\sigma),$

where $M$ is partitioned into abstraction levels, $E$ links adjacent levels, $\tau$ assigns each node a continuous temporal interval, and $\sigma$ stores semantic memory as text and embeddings. Its five-level instantiation (segment, session, day, week, profile) makes temporal containment the organizing principle rather than semantic clustering alone (Li et al., 6 Jan 2026). SegTreeMem likewise defines memory nodes over contiguous intervals $[l_v,r_v]$, with every internal node covering the union of ordered child intervals, so that the left-to-right order of leaves coincides with utterance chronology (Liu et al., 3 Jun 2026).

These systems indicate that TSHM is not a single fixed schema. Some instances are primarily semantic; others are temporal, task-centric, or hybrid. What remains stable is the use of internal nodes as explicit memory-bearing abstractions rather than empty routing metadata. In the theoretical language of hierarchical memory, leaves correspond to the output of extraction $\alpha$, internal nodes arise from repeated coarsening $C=(\pi,\rho)$, and the representative function $\rho$ determines whether internal nodes behave more like self-sufficient summaries or referential routing labels (Talebirad et al., 23 Mar 2026).

2. Construction, consolidation, and online update

The principal design question in TSHM is how new information becomes part of the hierarchy. MemTree provides a fully online answer. Given new content $c_{\text{new}}$, it computes

$e_{\text{new}}=f_{\text{emb}}(c_{\text{new}})$

and traverses from the root by cosine similarity against current children. If the best child similarity satisfies a depth-adaptive threshold

$\max_{u\in\mathcal{C}_v}\cos(e_{\text{new}},e_u)\ge\theta(d_v),$

for which the appendix of the source gives an equivalent implementation form, then the system descends; otherwise it creates a new child. If traversal reaches a leaf that must be expanded, the leaf is converted into an internal node and its original content is preserved in a child. Parent text is updated by an external generative aggregation step rather than a closed-form pooling rule, and the embedding is recomputed from the new summary text (Rezazadeh et al., 2024).
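The insertion procedure described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the MemTree code: the exponential threshold form and its `base` and `rate` values are hypothetical, `embed` is a toy bag-of-words stand-in for an embedding model, `summarize` is a placeholder for the generative aggregation step, and summaries are refreshed bottom-up after insertion for simplicity.

```python
import math
from dataclasses import dataclass, field
from typing import List

VOCAB = ["hike", "trail", "pasta", "recipe"]  # toy vocabulary (assumption)


def embed(text: str) -> List[float]:
    """Toy bag-of-words embedding standing in for a real embedding model."""
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def threshold(depth: int, base: float = 0.4, rate: float = 0.2) -> float:
    """ASSUMED exponential form; base and rate are hypothetical values."""
    return base * math.exp(rate * depth)


def summarize(texts: List[str]) -> str:
    """Placeholder for the external generative aggregation step."""
    return " | ".join(texts)


@dataclass(eq=False)
class Node:
    text: str
    emb: List[float]
    depth: int = 0
    children: List["Node"] = field(default_factory=list)


def insert(root: Node, text: str) -> None:
    e_new = embed(text)
    node, path = root, [root]
    while node.children:
        best = max(node.children, key=lambda c: cosine(e_new, c.emb))
        if cosine(e_new, best.emb) < threshold(node.depth):
            break                      # no child is similar enough: branch here
        node = best                    # similar enough: descend
        path.append(node)
    if not node.children and node is not root:
        # Leaf expansion: the leaf becomes internal, its content kept in a child.
        node.children.append(Node(node.text, node.emb, node.depth + 1))
    node.children.append(Node(text, e_new, node.depth + 1))
    for anc in reversed(path[1:]):     # refresh summaries bottom-up, skip root
        anc.text = summarize([c.text for c in anc.children])
        anc.emb = embed(anc.text)


root = Node(text="", emb=[0.0] * len(VOCAB))
for item in ["hike trail", "pasta recipe", "hike"]:
    insert(root, item)
```

In this run the third item is similar enough to the first to descend into it, so the former leaf "hike trail" becomes an internal node whose two children hold the original content and the new item, while "pasta recipe" stays a separate branch under the root.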

Temporal systems replace semantic routing with schedule-driven consolidation. In TiMem, level $L_1$ segment memories are generated online after each user-assistant exchange with […], whereas levels $L_2$–$L_5$ are generated automatically at session, daily, weekly, and monthly boundaries. Parent content is produced by a consolidation step over child memories, with level-specific prompts controlling whether the result is factual, pattern-oriented, or persona-level (Li et al., 6 Jan 2026). SegTreeMem instead uses an online rightmost-frontier rule. After $t$ utterances, only nodes whose spans end at $t$ are admissible insertion points for $u_{t+1}$, so the update is local to the active frontier rather than global. New internal annotations are then recomputed bottom-up over the affected chain (Liu et al., 3 Jun 2026).
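The rightmost-frontier rule can be illustrated with a span tree over utterance indices. This is a minimal sketch under assumptions, not SegTreeMem's implementation: the `Segment` class and both function names are hypothetical, the choice of attachment point (here a plain index into the frontier) stands in for whatever scoring the real system uses, and the sketch only attaches under internal frontier nodes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class Segment:
    start: int                      # first utterance index covered
    end: int                        # last utterance index covered
    children: List["Segment"] = field(default_factory=list)


def rightmost_frontier(root: Segment) -> List[Segment]:
    """Chain from the root down through last children; every span ends at t."""
    chain = [root]
    while chain[-1].children:
        chain.append(chain[-1].children[-1])
    return chain


def append_utterance(root: Segment, attach_index: int) -> Segment:
    """Attach utterance t+1 under frontier[attach_index] and extend spans."""
    frontier = rightmost_frontier(root)
    t_next = root.end + 1
    parent = frontier[attach_index]
    if not parent.children:
        raise ValueError("this sketch only attaches under internal frontier nodes")
    leaf = Segment(t_next, t_next)
    parent.children.append(leaf)
    for node in frontier[: attach_index + 1]:   # only the affected chain changes
        node.end = t_next
    return leaf


# Three utterances grouped into two segments: [1,2] and [3,3].
seg_a = Segment(1, 2, [Segment(1, 1), Segment(2, 2)])
seg_b = Segment(3, 3, [Segment(3, 3)])
root = Segment(1, 3, [seg_a, seg_b])

before = [(n.start, n.end) for n in rightmost_frontier(root)]
append_utterance(root, attach_index=1)          # continue the current segment
after = [(n.start, n.end) for n in rightmost_frontier(root)]
```

Only the root and the active segment have their spans extended; the closed segment `seg_a` is never touched, which is what makes the update local.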

Other systems expose additional consolidation mechanisms. SHIMI imposes a maximum branching factor; if a parent exceeds this width, the two most similar children are replaced by a newly synthesized abstraction node, so the tree restructures itself to maintain bounded branching and semantic compression (Helmi, 8 Apr 2025). H-Mem uses fixed temporal windows (day, week, month, year) and per-level similarity thresholds to merge semantically similar events within the same window into progressively more abstract long-term summaries (Yu et al., 15 May 2026).
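A bounded-branching merge of the kind attributed to SHIMI can be sketched as follows. The function name `enforce_branching`, the dictionary layout, the averaged embedding, and the `abstract(...)` label are all assumptions for illustration; the source describes the abstraction node as synthesized, not averaged.

```python
import math
from itertools import combinations
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def enforce_branching(children: List[Dict], max_branch: int) -> List[Dict]:
    """Merge the most similar pair of children until the width bound holds."""
    if max_branch < 1:
        raise ValueError("max_branch must be at least 1")
    children = list(children)
    while len(children) > max_branch:
        i, j = max(
            combinations(range(len(children)), 2),
            key=lambda p: cosine(children[p[0]]["vec"], children[p[1]]["vec"]),
        )
        a, b = children[i], children[j]
        merged = {
            # ASSUMPTION: a real system would synthesize this label with an LLM.
            "label": f"abstract({a['label']}, {b['label']})",
            "vec": [(x + y) / 2 for x, y in zip(a["vec"], b["vec"])],
            "children": [a, b],
        }
        children = [c for k, c in enumerate(children) if k not in (i, j)]
        children.append(merged)
    return children


siblings = [
    {"label": "cat", "vec": [1.0, 0.0], "children": []},
    {"label": "kitten", "vec": [0.9, 0.1], "children": []},
    {"label": "car", "vec": [0.0, 1.0], "children": []},
]
result = enforce_branching(siblings, max_branch=2)
labels = [c["label"] for c in result]
```

Here "cat" and "kitten" are the closest pair, so they are folded under one abstraction node and the parent's width drops from three to two.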

Across these systems, two update regimes recur. One is insertion-driven and local, as in MemTree, SegTreeMem, SHIMI, and MemForest. The other is schedule-driven and consolidation-oriented, as in TiMem and H-Mem. This suggests that TSHM is best understood not merely as a storage layout but as a policy for deciding when detail should remain local and when it should be rewritten into a higher-level memory state.

3. Retrieval, traversal, and coarsening-traversal coupling

Retrieval is the second defining operation of TSHM, and the literature shows that “tree-structured” does not imply a single retrieval pattern. The general theory defines a traversal operator $\tau$ that operates under a token budget and argues that traversal strategy is constrained by the representative function $\rho$: self-sufficient representatives support collapsed or hybrid search, whereas low-self-sufficiency representatives require top-down refinement (Talebirad et al., 23 Mar 2026).
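One simple way to respect a token budget at query time is greedy selection by score. The sketch below is an assumption-laden illustration, not the traversal operator defined in the theory: the function name, the `(score, token_count, text)` tuple layout, and the greedy policy are all hypothetical.

```python
from typing import List, Tuple


def select_under_budget(
    scored_nodes: List[Tuple[float, int, str]], budget: int
) -> Tuple[List[str], int]:
    """Greedily keep the highest-scoring nodes whose tokens still fit."""
    chosen: List[str] = []
    used = 0
    for score, tokens, text in sorted(scored_nodes, key=lambda n: n[0], reverse=True):
        if used + tokens <= budget:
            chosen.append(text)
            used += tokens
    return chosen, used


candidates = [
    (0.9, 50, "leaf A"),        # (relevance score, token count, content)
    (0.8, 120, "summary S"),
    (0.7, 40, "leaf B"),
]
picked, tokens_used = select_under_budget(candidates, budget=100)
```

With a budget of 100 tokens the long summary is skipped and two short leaves are returned instead, which shows how a budget can change which granularity a retriever ends up surfacing.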

MemTree is the clearest counterexample to the assumption that a memory tree should be navigated strictly along edges. Its default query procedure is “collapsed tree retrieval”: embed the query, compare it against all node embeddings, filter by a retrieval threshold, and return the top-$k$ nodes. The tree is therefore used heavily for organization and abstraction at write time, but less directly for query-time routing. An ablation comparing collapsed retrieval with traversal retrieval reports that collapsed retrieval generally performs better unless traversal uses a relatively large $k$, because early top-down decisions can miss relevant deep nodes and over-retrieve redundant parent summaries (Rezazadeh et al., 2024).
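Collapsed retrieval as described here ignores edges at query time and ranks every node in one pool. The following is a minimal sketch with hypothetical names (`flatten`, `collapsed_retrieve`, `min_score`) and toy three-dimensional vectors; it is not taken from the MemTree code.

```python
import math
from typing import Dict, Iterator, List


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def flatten(node: Dict) -> Iterator[Dict]:
    """Yield every node in the tree, summaries and leaves alike."""
    yield node
    for child in node["children"]:
        yield from flatten(child)


def collapsed_retrieve(root: Dict, query_vec: List[float], k: int,
                       min_score: float) -> List[str]:
    scored = [
        (cosine(query_vec, n["vec"]), n["text"])
        for n in flatten(root)
        if n["vec"] is not None          # skip the structural root
    ]
    kept = [s for s in scored if s[0] >= min_score]   # retrieval threshold
    kept.sort(key=lambda s: s[0], reverse=True)
    return [text for _, text in kept[:k]]


tree = {"text": "", "vec": None, "children": [
    {"text": "summary: outdoor hobbies", "vec": [1.0, 1.0, 0.0], "children": [
        {"text": "hiked a coastal trail", "vec": [1.0, 0.0, 0.0], "children": []},
    ]},
    {"text": "pasta recipe", "vec": [0.0, 0.0, 1.0], "children": []},
]}
hits = collapsed_retrieve(tree, [1.0, 0.2, 0.0], k=2, min_score=0.5)
```

The deep leaf outranks its own parent summary here, which is the situation where a strict top-down descent with a narrow beam could have missed it.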

Other TSHM systems use more explicit hierarchical retrieval. TiMem first classifies a query as simple, hybrid, or complex, then activates leaves with a fused semantic-lexical score and propagates upward to ancestors permitted by the planner-selected scope. Candidate memories are then filtered by a recall-gating step, so retrieval composes fine-grained evidence with higher-level temporal abstractions (Li et al., 6 Jan 2026). SegTreeMem defines a local node relevance, normalizes it, propagates relevance through the tree, and combines the result into a final score. This makes hierarchical retrieval a finite-horizon propagation problem rather than a beam descent or flat ranking (Liu et al., 3 Jun 2026).

Streaming-video TSHM uses yet another retrieval mode. In LiveStarPro, long-term memory is organized as a Recursive Event Tree of evicted semantic clips, and retrieval performs hierarchical beam descent. Matched nodes are then expanded to local event-chain context, so the model retrieves not only a relevant event but also its local historical continuation (Yang et al., 16 Jun 2026).
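Hierarchical beam descent in general can be sketched as a level-by-level pruning of candidates. This is a generic illustration, not LiveStarPro's procedure: the dictionary layout, cosine scoring, and the `beam_descent` name are assumptions, and the event-chain expansion step is deliberately omitted.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def beam_descent(root: Dict, query: List[float], beam_width: int) -> List[str]:
    """Descend level by level, keeping only the best beam_width nodes."""
    beam, leaves = [root], []
    while beam:
        candidates = []
        for node in beam:
            if node["children"]:
                candidates.extend(node["children"])
            else:
                leaves.append(node)          # reached a stored event
        candidates.sort(key=lambda c: cosine(query, c["vec"]), reverse=True)
        beam = candidates[:beam_width]
    leaves.sort(key=lambda n: cosine(query, n["vec"]), reverse=True)
    return [n["name"] for n in leaves]


def leaf(name: str, vec: List[float]) -> Dict:
    return {"name": name, "vec": vec, "children": []}


tree = {"name": "root", "vec": [0.0, 0.0], "children": [
    {"name": "sports", "vec": [1.0, 0.0],
     "children": [leaf("goal", [1.0, 0.1]), leaf("foul", [0.8, 0.3])]},
    {"name": "cooking", "vec": [0.0, 1.0],
     "children": [leaf("boil", [0.1, 1.0])]},
]}
narrow = beam_descent(tree, [1.0, 0.0], beam_width=1)
wide = beam_descent(tree, [1.0, 0.0], beam_width=2)
```

A beam of one commits to a single branch at every level, while a wider beam keeps alternatives alive at the cost of scoring more nodes.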

Taken together, these systems support a strong generalization: TSHM retrieval is governed less by the presence of edges than by the information content of internal nodes. If internal nodes are reliable summaries, collapsed or all-level retrieval can be effective. If they are primarily routing artifacts, top-down refinement becomes necessary. This is the central coarsening-traversal coupling identified in the general theory (Talebirad et al., 23 Mar 2026).

4. Major architectural variants and application domains

TSHM appears in both learned internal-memory architectures and external memory managers. Early neural formulations implemented tree-structured memory inside recurrent or attentive models. Hierarchical Attentive Memory (HAM) organizes memory as a full binary tree whose leaves are memory cells and whose internal nodes store learned summaries of their descendants; access is a hard top-down traversal with $\Theta(\log n)$ cost per memory access in the main efficient variant, where $n$ is the memory size (Andrychowicz et al., 2016). S-LSTM and Tree-structured ConvLSTM move the same principle into recurrent composition: in both, parent memory is a gated merge of child memories, making node-local state itself the hierarchical memory (Zhu et al., 2015, Kong et al., 2019). Tree-structured Attention with Hierarchical Accumulation is not presented as a memory model, but it constructs explicit internal-node phrase states from descendants and lets attention read from them, which is a close tree-memory interpretation inside a Transformer (Nguyen et al., 2020).
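The logarithmic access cost of a hard top-down traversal over a full binary tree can be seen in a non-learned analogue. The sketch below is only a stand-in: HAM learns its summary and search functions, whereas here the summary is a plain `max` over children and the search rule is a fixed comparison, both chosen purely for illustration. The function names are hypothetical.

```python
from typing import List, Optional, Tuple


def build(cells: List[int]) -> List[int]:
    """Array-backed full binary tree; len(cells) is assumed a power of two."""
    n = len(cells)
    tree = [0] * n + list(cells)            # leaves live at indices n..2n-1
    for i in range(n - 1, 0, -1):
        tree[i] = max(tree[2 * i], tree[2 * i + 1])   # stand-in summary
    return tree


def find_first_at_least(tree: List[int], n: int, x: int) -> Optional[Tuple[int, int]]:
    """Return (cell index, descent steps) of the leftmost cell >= x."""
    if tree[1] < x:
        return None                          # root summary rules out a match
    i, steps = 1, 0
    while i < n:
        i = 2 * i if tree[2 * i] >= x else 2 * i + 1   # hard left/right choice
        steps += 1
    return i - n, steps


cells = [3, 1, 4, 1, 5, 9, 2, 6]
tree = build(cells)
found = find_first_at_least(tree, len(cells), 5)
```

Each access reads one summary per level, so eight cells are resolved in three steps; the summaries at internal nodes are what allow whole subtrees to be skipped.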

Recent LLM systems externalize TSHM and specialize it to task domain. MemTree and HAT target multi-session dialogue; TiMem and SegTreeMem make temporal hierarchy the primary organizing principle; MemForest turns memory into a forest of balanced temporal trees over session, entity, and scene scopes; H-Mem builds a temporal-semantic tree and augments it with a knowledge graph; TME uses a Task Memory Tree whose nodes explicitly track action, input/output, status, and dependencies; HMT for web agents uses a fixed three-level Intent/Stage/Action hierarchy to separate reusable workflow logic from website-specific grounding; SHIMI uses a semantic rooted tree and extends it with decentralized synchronization; LiveStarPro uses a short-term/long-term two-tier hierarchy for streaming video (Rezazadeh et al., 2024, A et al., 2024, Li et al., 6 Jan 2026, Liu et al., 3 Jun 2026, Chen et al., 16 May 2026, Yu et al., 15 May 2026, Ye, 11 Apr 2025, Tan et al., 7 Mar 2026, Helmi, 8 Apr 2025, Yang et al., 16 Jun 2026).

System	Organizing principle	Distinctive mechanism
MemTree	Semantic hierarchy	Online insertion with depth-adaptive threshold
TiMem	Temporal containment	Segment/session/day/week/profile consolidation
SegTreeMem	Contiguous temporal segments	Rightmost-frontier online update
MemForest	Time-ordered scoped trees	Dirty-path refresh in session/entity/scene trees
H-Mem	Temporal-semantic tree + graph	Long-term summaries plus entity relations
TME	Hierarchical task state	Active-node-path prompt synthesis
HMT	Intent/Stage/Action tree	Stage-aware Planner and grounded Actor
LiveStarPro TSHM	Event-chain archive	Recursive Event Tree for evicted video history

A recurring implication is that TSHM is best viewed as a design family rather than a single algorithm. Some variants are semantic and online; some are temporal and scheduled; some are summary-heavy; some couple the tree to a graph or planner. The shared invariant is the use of explicit multilevel representatives to mediate between raw history and bounded-context reasoning.

5. Empirical evidence, computational tradeoffs, and common misconceptions

The most direct evidence for TSHM in long-horizon language settings comes from MemTree, TiMem, SegTreeMem, MemForest, and H-Mem. MemTree reports that in the short-context Multi-Session Chat setting with only 15 rounds, direct full-history prompting is best, showing that hierarchical memory is not universally necessary when the active context is still small. In retrieval-only settings, however, MemTree slightly outperforms MemoryStream and MemGPT on MSC, and on the 200-round MSC-E benchmark it reaches […] overall accuracy compared with […] for MemoryStream and […] for full-history prompting. On QuALITY it scores […] against […] for MemoryStream, and on the hard subset […] against […]. On MultiHop RAG it reaches […] overall, with especially large gains on comparison and temporal questions, including […] versus […] on comparison and […] versus […] on temporal reasoning (Rezazadeh et al., 2024).

TiMem shows a related but more explicitly temporal effect. Under a consistent evaluation setup it reaches […] on LoCoMo and […] on LongMemEval-S, while reducing recalled memory length by […] on LoCoMo. Its hierarchy ablation is especially diagnostic: the segment level $L_1$ only with flat recall gives […] on LoCoMo and […] on LongMemEval-S; $L_1$ only with hierarchical recall rises to […] and […]; the summary levels $L_2$–$L_5$ only drops to […] and […]; and the full $L_1$–$L_5$ hierarchy yields […] and […]. This directly shows that summary layers help, but cannot replace fine-grained evidence (Li et al., 6 Jan 2026).

SegTreeMem strengthens the case that the type of hierarchy matters. It improves LLM-judge accuracy over flat retrieval, graph-structured memory, and tree-structured memory baselines across LoCoMo, LongMemEval-MAB, and RealMem. Its temporal-order permutation analysis then shows that when 30% of turn pairs are swapped before memory construction, performance drops much more for SegTreeMem than for a non-temporal clustering tree: on LoCoMo, […] for SegTreeMem-BU versus […] for the non-temporal tree; on RealMem, […] versus […]. This indicates that the gain depends on preserving chronological structure during construction (Liu et al., 3 Jun 2026).

In systems-oriented work, MemForest shows that TSHM can improve both quality and write-path scalability. On LongMemEval-S it reaches […] pass@1 with Qwen3-30B while reducing post-extraction maintenance to an $O(\log_k N)$ dependent critical path for a balanced $k$-ary MemTree, and its memory construction throughput is approximately […] higher than EverMemOS. The price is a larger write-time token budget, since the framework replaces a few long state-dependent prompts with many short parallel calls (Chen et al., 16 May 2026). H-Mem reports state-of-the-art QA performance on LoCoMo, LongMemEval-S, and REALTALK, and its ablation shows that removing the tree causes the largest performance drop among major components, larger than removing the graph or long-term memory summaries (Yu et al., 15 May 2026).

Video streaming provides an especially sharp retrieval comparison. LiveStarPro reports a […] improvement in semantic correctness and a […] reduction in timing error over prior online methods. In a long-term retrieval diagnostic, no long-term memory gives Recall(S) […] and Recall(L) […]; a flat $k$-NN bank gives […], […], and […] ms latency; the Recursive Event Tree gives […], […], and […] ms. That pattern directly supports the claim that tree organization can improve both long-range recall and retrieval latency when the tree captures coherent event chains (Yang et al., 16 Jun 2026).

Several common misconceptions are contradicted by these results. First, TSHM is not uniformly superior to short-context baselines; MemTree explicitly reports that full-history prompting is best on MSC with only 15 rounds (Rezazadeh et al., 2024). Second, a memory being tree-structured does not imply that strict tree traversal is the best query procedure; MemTree’s collapsed search often beats traversal retrieval (Rezazadeh et al., 2024). Third, high-level summaries alone are insufficient; TiMem’s summary-only ($L_2$–$L_5$) ablation underperforms the full hierarchy (Li et al., 6 Jan 2026). Fourth, hierarchy alone is not the main effect in some settings: SegTreeMem’s permutation study indicates that temporal order itself is a decisive structural signal (Liu et al., 3 Jun 2026).

6. Limitations, theoretical synthesis, and open problems

Despite strong empirical results, several limitations recur across the literature. Many contemporary TSHM systems are not end-to-end trained memory modules. MemTree, TiMem, SHIMI, HAT, and H-Mem all rely on external or off-the-shelf LLMs for summarization, planning, or relation judgment rather than learning retrieval losses or differentiable routing objectives (Rezazadeh et al., 2024, Li et al., 6 Jan 2026, Helmi, 8 Apr 2025, A et al., 2024, Yu et al., 15 May 2026). This makes them modular and often practical, but it also leaves update rules heuristic and summary quality model-dependent.

A second limitation is topology. Several systems acknowledge that strict trees are only an approximation to real relational structure. TME is explicitly graph-aware and proposes DAG-backed memory as future work; HMT notes that stage nodes could semantically be shared across intents even though the implemented structure is a rooted tree; SHIMI states that its current implementation assumes a strictly tree-based semantic structure and treats polyhierarchy as a limitation (Ye, 11 Apr 2025, Tan et al., 7 Mar 2026, Helmi, 8 Apr 2025). A plausible implication is that future TSHM research will increasingly focus on controlled graph extensions rather than purely tree-shaped storage.

A third limitation concerns worst-case structure quality. SegTreeMem is shallow in practice, but its appendix shows that maximally topic-switching input can cause a degenerate cascading tree with height […] and total nodes […] (Liu et al., 3 Jun 2026). More generally, the theory paper emphasizes that representative quality constrains viable traversal. It defines a self-sufficiency measure for representatives and proves that low-information representatives cannot reliably route among too many children, formalized by a Fano-style bound on branching. It also derives a coarsening-traversal coupling: self-sufficient representatives support collapsed search, while referential representatives require refinement-based traversal (Talebirad et al., 23 Mar 2026). This formalism clarifies why internal-node design, grouping coherence, branching factor, and retrieval policy cannot be tuned independently.

A related, older line of work studies hierarchical memory from the standpoint of non-uniform access cost rather than semantic abstraction. In the Hierarchical Memory Model for optimum binary search trees, structural optimality and placement optimality are inseparable because frequently accessed nodes should occupy cheaper memory levels, and subtree dynamic programs must reason jointly about tree shape and storage budget profiles (0804.0940). This suggests, though the setting is different, that future TSHM systems may benefit from treating memory placement, cache residency, and hierarchical abstraction as a single optimization problem rather than separate engineering layers.

The broad synthesis is therefore twofold. On one hand, TSHM has emerged as a practical design pattern for LLMs, agents, retrieval systems, and streaming multimodal models: leaves preserve detail, internal nodes encode abstractions, and retrieval exploits multiple granularities under a budget. On the other hand, the theory and ablations indicate that tree shape alone is not the essence. What determines success is the interaction among extraction granularity, grouping coherence, representative quality, update dynamics, and traversal policy. In that sense, TSHM is less a single memory structure than a disciplined way of organizing compression, retention, and retrieval over long histories (Talebirad et al., 23 Mar 2026).