Hierarchical Aggregate Trees (HAT)

Updated 16 December 2025
  • Hierarchical Aggregate Trees (HAT) are data structures that recursively aggregate and summarize information across layered hierarchies, enabling efficient long-context memory management.
  • HATs employ recursive aggregation methods using tools like LLM summarization, neural networks for code ASTs, and variational inference in dynamical systems to consolidate multi-modal data.
  • Empirical evaluations show HATs improve performance in dialogue systems, code summarization, and dynamical modeling, while maintaining scalable storage and sublinear query complexity.

A Hierarchical Aggregate Tree (HAT) is a class of data structure used for recursive aggregation, memory organization, and hierarchical sequence modeling. It takes several forms across retrieval-augmented generation, structured code summarization, and dynamical systems, but always applies the core principle of recursively aggregating or summarizing information over a tree-structured hierarchy. HATs unify distinct modalities (text, embeddings, dynamical states) under variants of aggregation (by LLM, neural network, or probabilistic model) to address long-context memory, efficient summarization, and structured multi-agent temporal modeling (A et al., 10 Jun 2024, Shi et al., 2021, Howard et al., 2012).

1. Formal Structure and Definitions

The general HAT is defined as a tuple

$$\mathrm{HAT} = (L,\ M,\ A,\ \Sigma)$$

where:

  • $L = \{\ell_0, \ell_1, \dots, \ell_{D-1}\}$ is an ordered set of $D$ layers (with $\ell_0$ as the root),
  • $M \in \mathbb{N}^+$ is the memory length, specifying the branching factor (maximum children per parent),
  • $A: \mathcal{P}(\Sigma) \to \text{Text}$ (or the appropriate modality) is the aggregation function,
  • $\Sigma$ is the set of all nodes (layered as $\sigma_{k,0}, \dots, \sigma_{k,N_k-1}$ for each layer $k$).

Each node $\sigma \in \Sigma$ holds:

  • A pointer to its parent and its child set $C(\sigma)$,
  • Its data ("text" or embedding), defined recursively as either a raw datum (leaf) or $A(\{\mathrm{text}(\tau) : \tau \in C(\sigma)\})$ (internal); a minimal data-structure sketch follows this list.
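The following is a minimal Python sketch of this node-and-tree layout. The class and attribute names (`HATNode`, `HAT`, `aggregate`) are illustrative assumptions, not taken from any of the cited implementations.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass(eq=False)  # identity-based equality keeps parent/child cycles safe
class HATNode:
    """One node in Sigma: a raw datum at a leaf, an aggregated summary internally."""
    text: str
    parent: Optional["HATNode"] = None
    children: List["HATNode"] = field(default_factory=list)

class HAT:
    """HAT = (L, M, A, Sigma): D layers, branching factor M, aggregator A."""
    def __init__(self, depth: int, memory_length: int,
                 aggregate: Callable[[List[str]], str]):
        self.depth = depth                    # D layers; layer 0 holds the root
        self.memory_length = memory_length    # M: maximum children per parent
        self.aggregate = aggregate            # A: child texts -> aggregated text
        self.layers: List[List[HATNode]] = [[] for _ in range(depth)]
```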

In code summarization, the structure adapts to Abstract Syntax Tree (AST) blocks, cutting at semantic boundaries to generate “super-nodes” and reconstruct hierarchical embedding aggregations (Shi et al., 2021). In dynamical systems, nodes represent Markov or Switching Linear Dynamical Systems chains, with aggregator chains combining the states of child chains (Howard et al., 2012).

2. Construction and Aggregation Mechanisms

Retrieval-Augmented Dialogue (Textual HAT)

Insertion proceeds as:

  1. Insert a new leaf node with text $u$ into the bottom layer $\ell_{D-1}$.
  2. For leaf index $i$, assign its parent in $\ell_{D-2}$ at index $j = \lfloor i/M \rfloor$.
  3. Update the parent's text via a call to $A$ (typically an LLM summarization API), caching aggregated summaries keyed by a hash of the child set.
  4. Recursively propagate the aggregation up the tree, yielding logarithmic depth ($D \approx \lceil \log_M N \rceil$ for $N$ utterances); see the sketch below.
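A hedged sketch of this insertion-and-propagation loop, building on the `HAT`/`HATNode` classes above. The on-demand parent creation and the `aggregate` callback (standing in for the LLM summarization call) are assumptions; the child-set-hash summary cache is omitted.

```python
def insert(tree: HAT, utterance: str) -> None:
    """Insert a leaf into the bottom layer and refresh summaries along its ancestor path."""
    leaf = HATNode(text=utterance)
    bottom = tree.depth - 1
    tree.layers[bottom].append(leaf)

    node, child_index = leaf, len(tree.layers[bottom]) - 1
    for layer in range(bottom - 1, -1, -1):
        parent_index = child_index // tree.memory_length    # j = floor(i / M)
        while len(tree.layers[layer]) <= parent_index:       # create the parent on demand
            tree.layers[layer].append(HATNode(text=""))
        parent = tree.layers[layer][parent_index]
        if node.parent is None:
            node.parent = parent
            parent.children.append(node)
        # Re-aggregate the parent from its (at most M) children, e.g. via an LLM call.
        parent.text = tree.aggregate([c.text for c in parent.children])
        node, child_index = parent, parent_index
```

With a stand-in aggregator such as `lambda texts: " ".join(texts)[:256]`, repeated `insert` calls keep every ancestor summary current; the sketch assumes `depth` was chosen at least $\lceil \log_M N \rceil + 1$.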

Code Summarization (CAST)

The AST is split into semantically coherent subtrees at block-level granularity (method, loop, and control blocks). A bottom-up recursive neural network (RvNN) then encodes each subtree:

$$h_v = \tanh\!\left(W^c c_v + \frac{1}{|C(v)|}\sum_{u \in C(v)} W^a h_u\right)$$

Dense block embeddings undergo aggregation via a second RvNN over the structure tree, forming global AST representations (Shi et al., 2021).
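A minimal numpy sketch of this recursive encoding. The post-order traversal and the `node.embedding` / `node.children` attributes are assumptions for illustration, not the CAST implementation.

```python
import numpy as np

def encode_subtree(node, W_c, W_a):
    """Bottom-up RvNN update: h_v = tanh(W_c c_v + (1/|C(v)|) * sum_u W_a h_u)."""
    child_states = [encode_subtree(u, W_c, W_a) for u in node.children]
    h = W_c @ node.embedding                         # c_v: content embedding of the block itself
    if child_states:
        h = h + W_a @ np.mean(child_states, axis=0)  # linearity: W_a(mean h_u) = mean(W_a h_u)
    return np.tanh(h)
```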

Dynamical Systems Trees (DST)

Nodes represent either SLDS or aggregator chains (HMMs). Each chain at time $t$ depends on its own previous state and on its parent's current state; leaf chains additionally carry continuous latent states and emissions. Aggregation occurs through these conditional dependencies and the sharing of state information at aggregator nodes (Howard et al., 2012).
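Written out, the dependencies just described imply a joint distribution of roughly the following schematic form (root chains simply drop the parent term); this is a reconstruction from the description above rather than the exact factorization of Howard et al. (2012):

$$P(S, X, Y) = \prod_{t} \Bigg[ \prod_{v \notin \text{leaves}} P\big(s_t^{v} \mid s_{t-1}^{v},\ s_t^{\mathrm{pa}(v)}\big) \prod_{v \in \text{leaves}} P\big(s_t^{v} \mid s_{t-1}^{v},\ s_t^{\mathrm{pa}(v)}\big)\, P\big(x_t^{v} \mid x_{t-1}^{v},\ s_t^{v}\big)\, P\big(y_t^{v} \mid x_t^{v},\ s_t^{v}\big) \Bigg]$$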

3. Selection, Traversal, and Inference

In retrieval-augmented settings, relevant context is selected as an optimal traversal path in the HAT, formalized as a Markov Decision Process:

$$a_{0:T}^* = \arg\max_{a_{0:T}} R(s_{0:T},\ a_{0:T} \mid q)$$

Actions (Up, Down, Left, Right, Success, Output, Insufficient) are predicted by a GPT-based agent, which observes the query, current node, and history. Traversal continues until sufficient context coverage is obtained for the query (A et al., 10 Jun 2024).
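A hedged sketch of such a traversal loop over the `HAT` structure sketched earlier. The action set follows the paper, but `propose_action` (the GPT-based policy) and the `move` navigation helper are hypothetical stand-ins.

```python
ACTIONS = ("Up", "Down", "Left", "Right", "Success", "Output", "Insufficient")

def move(tree: HAT, node: HATNode, action: str) -> HATNode:
    """Minimal Up/Down/Left/Right navigation over the layered node lists."""
    if action == "Up" and node.parent is not None:
        return node.parent
    if action == "Down" and node.children:
        return node.children[0]
    if action in ("Left", "Right"):
        layer = next(l for l in tree.layers if node in l)
        i = layer.index(node) + (1 if action == "Right" else -1)
        return layer[i % len(layer)]
    return node

def traverse(tree: HAT, query: str, propose_action, max_steps: int = 32) -> str:
    """Walk the HAT under an agent policy until it signals sufficient context."""
    node, history = tree.layers[0][0], []      # start at the root summary
    for _ in range(max_steps):
        action = propose_action(query, node.text, history)  # GPT-based policy (assumed interface)
        history.append((node.text, action))
        if action in ("Success", "Output"):
            return node.text                   # context judged sufficient for the query
        if action == "Insufficient":
            break
        node = move(tree, node, action)
    return "\n".join(text for text, _ in history)  # fall back to the visited summaries
```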

In DSTs, structured mean-field variational inference is used for intractable marginalization:

$$Q(S, X) = \prod_{v} Q^v\big(s_{0:T}^v,\ x_{0:T}^v\big)$$

Chains are updated iteratively based on expectations under the variational factors of other chains, propagating messages between aggregators and children.
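Each factor is improved by coordinate ascent while the others are held fixed; whatever the exact message schedule in Howard et al. (2012), the structured mean-field update takes the general form

$$Q^v\big(s_{0:T}^v,\ x_{0:T}^v\big) \;\propto\; \exp\!\Big( \mathbb{E}_{\prod_{u \neq v} Q^u}\big[\log P(S, X, Y)\big] \Big),$$

which can be carried out with standard per-chain smoothing.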

In hierarchical AST aggregation, information reconstitution and fusion with token-level embeddings are performed via serial multi-source cross-attention in a transformer decoder (Shi et al., 2021).

4. Computational Properties and Complexity

  • Build/Insertion Cost: Each addition modifies $O(D)$ nodes; each aggregation at a node with $M$ children costs $O(M \cdot C_A)$, where $C_A$ is the aggregation cost. Total build cost is $O(N \log N)$ summary calls for $N$ utterances (A et al., 10 Jun 2024).
  • Query Complexity: Optimal traversal costs $O(D \cdot C_{\text{model}}) \approx O(\log_M N)$ model calls, i.e., sublinear retrieval compared to flat $O(N)$ similarity search (A et al., 10 Jun 2024).
  • Storage: Each parent stores one summary aggregated over $O(M)$ children. Layer $k$ of the $D$ layers stores $O(\lceil N / M^k \rceil)$ summaries, so total memory is $O(N)$ with logarithmic depth (A et al., 10 Jun 2024); a worked example follows this list.
  • Code HAT: Hierarchical splitting and bounded-depth block aggregation reduce the per-batch sequence length and yield stable gradients, improving both training time (2× faster) and accuracy metrics versus flat or max-pooling baselines (Shi et al., 2021).
  • DSTs: Variational inference for $A$ aggregator chains and $L$ leaves costs $O(A T S^2 + L(T S^2 + T d^3))$ per expectation-maximization sweep (Howard et al., 2012).
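As a concrete check of the storage and depth bounds, the snippet below assumes a hypothetical dialogue of $N = 10{,}000$ utterances with branching factor $M = 10$; the numbers are illustrative, not from the cited evaluation.

```python
N, M = 10_000, 10                 # hypothetical: 10,000 utterances, branching factor 10
layer_sizes, total = [N], N
while layer_sizes[-1] > 1:
    layer_sizes.append(-(-layer_sizes[-1] // M))   # ceiling division: nodes one layer up
    total += layer_sizes[-1]
print(layer_sizes, total)
# [10000, 1000, 100, 10, 1], 11111 -> 5 layers (~log_M N of them summaries), O(N) total storage
```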

5. Comparative Empirical Results

Dialogue Memory (Retrieval-Augmented Generation)

On the multi-session-chat dataset:

  • BLEU-1/2 (session 5): GPTAgent HAT traversal yields 0.721/0.612, outperforming BFS (0.652/0.532), DFS (0.624/0.501), All-Context (0.612/0.492), and Part-Context (0.592/0.473), and even exceeding the Gold Memory reference (0.681/0.564).
  • DISTINCT-1/2: GPTAgent achieves 0.092/0.084, exceeding all baselines.
  • Summary Generation: Aggregate GPT summaries achieve BLEU-1/2 = 0.842/0.724, DISTINCT-1/2 = 0.102/0.094, F1 = 0.824 (A et al., 10 Jun 2024).

Code Summarization

  • BLEU Improvements: Hierarchical aggregation recovers 0.27–1.43 BLEU over flat max-pooling, depending on the dataset.
  • Human Evaluation: "ACTOR" (CAST with HAT) is rated higher on informativeness (2.74), naturalness (3.08), and similarity to reference (2.66) than baseline models (Wilcoxon $p<0.01$) (Shi et al., 2021).

Dynamical Systems

  • Gene Expression Modeling: A clustered DST (4-state) achieves a $\sim 10\%$ improvement in the bound on test log-likelihood compared to flat or independent models.
  • Multi-Agent Sports Trajectories: Two-level DST reduces classification errors and yields the highest likelihood bounds compared to flat SLDS models (Howard et al., 2012).

6. Applications and Limitations

Applications

  • Long-context retrieval-augmented systems for dialogue and QA,
  • Multi-section and multi-document summarization,
  • Codebase search and function aggregation,
  • Multi-modal memory (text, image, video),
  • Complex group activity transcription (trajectory modeling, gene expression) (A et al., 10 Jun 2024, Shi et al., 2021, Howard et al., 2012).

Limitations

  • LLM-based aggregation introduces latency (seconds per traversal),
  • Unmanaged leaf growth increases storage; pruning strategies or hybridization with dense vector indexes are proposed,
  • Traversal and summary fidelity depend on aggregator function design; learning aggregation end-to-end (e.g., with BART/PEGASUS) is an open direction,
  • A statistical characterization of recall as a function of summary depth and summary fidelity remains unresolved (A et al., 10 Jun 2024).

7. Theoretical and Practical Considerations

HAT achieves a non-exponential parameter footprint due to recursive aggregation: storage is bounded as

$$N + \lceil N/M \rceil + \lceil N/M^2 \rceil + \cdots + 1 = O(N)$$

avoiding blow-up as context or dataset grows. Depth grows logarithmically with the number of items, maintaining tractable traversal and update costs.
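The bound follows from the geometric series: with branching factor $M \ge 2$, each ceiling adds at most one node per layer, so

$$\sum_{k=0}^{\lceil \log_M N \rceil} \left\lceil \frac{N}{M^k} \right\rceil \;\le\; N\sum_{k=0}^{\infty} M^{-k} + O(\log_M N) \;=\; \frac{M}{M-1}\,N + O(\log_M N) \;\le\; 2N + O(\log N).$$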

Traversals can potentially be accelerated by heuristic or Monte Carlo Tree Search techniques. Hybrid approaches combining hierarchical and vectorized retrieval—or end-to-end-differentiable aggregation—remain active areas of research (A et al., 10 Jun 2024).

In sum, Hierarchical Aggregate Trees provide a scalable, expressive, and theoretically grounded memory and modeling paradigm for high-resolution, long-context, and structured aggregation tasks across natural language, code, and dynamical systems domains (A et al., 10 Jun 2024, Shi et al., 2021, Howard et al., 2012).
