
TreeSeg: Hierarchical Segmentation & Interval Queries

Updated 2 March 2026
  • TreeSeg is a tree-based framework encompassing both hierarchical segmentation of noisy ASR transcripts and a dynamic BITS-Tree for efficient segment storage.
  • It employs recursive divisive clustering with windowed embeddings to partition transcripts into semantically coherent segments while addressing noise and variable segment counts.
  • The BITS-Tree component supports efficient point and range queries through logarithmic operations, making it effective for dynamic interval querying in large datasets.

TreeSeg refers to two distinct concepts in the literature: (1) an algorithm for hierarchical topic segmentation of large transcripts, and (2) a dynamic data structure for efficient segment storage and interval queries (the BITS-Tree). Both rely on tree-based representations to organize, partition, or index sequential data. This article reviews both for completeness.

1. Hierarchical Topic Segmentation: TreeSeg Algorithm

TreeSeg is an approach for hierarchical, structure-preserving topic segmentation of long, noisy transcripts, notably those generated by Automatic Speech Recognition (ASR) systems. The core objective is to partition a temporally ordered sequence of utterances $U = [U_1, \ldots, U_T]$ into $K$ contiguous, semantically coherent segments $(P_1, \ldots, P_K)$, despite noise, uncertain ground-truth $K$, and large $T$ (Gklezakos et al., 2024).

The method addresses three principal challenges: persistent ASR noise, variable and ambiguous segment counts, and computational efficiency for large transcripts. TreeSeg outputs a binary tree segmentation, allowing flexible control of granularity by tree cutting at arbitrary depth.

2. Mathematical Formulation and Algorithm

Each utterance $U_t$ is embedded via a windowed approach using a pre-trained, frozen embedding model $f$ (e.g., ADA, SBERT, RoBERTa), computing $e_t = f(B_t) \in \mathbb{R}^d$ for the block $B_t = [U_{\max(1, t-W)}, \ldots, U_t]$ with window width $W$.
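The block construction can be sketched in a few lines of Python. This is only an illustration: `toy_embed` is a hypothetical stand-in for the frozen model $f$ (a hashed bag of words, not a real encoder), and joining the utterances of a block with spaces is an assumption, since the source does not say how a block is serialized before embedding.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def windowed_blocks(utterances: Sequence[str], W: int) -> List[str]:
    """Build B_t = [U_{max(1, t-W)}, ..., U_t] for every t.

    The text is 1-indexed; this code is 0-indexed, so the block for
    position t starts at max(0, t - W).
    """
    blocks = []
    for t in range(len(utterances)):
        start = max(0, t - W)
        # Assumption: a block is serialized by joining utterances with spaces.
        blocks.append(" ".join(utterances[start:t + 1]))
    return blocks


def toy_embed(text: str, d: int = 8) -> Vector:
    """Hypothetical stand-in for a frozen encoder f: hashed bag of words."""
    v = [0.0] * d
    for word in text.lower().split():
        v[sum(map(ord, word)) % d] += 1.0
    return v


def embed_transcript(
    utterances: Sequence[str], W: int, f: Callable[[str], Vector]
) -> List[Vector]:
    """Return one embedding e_t = f(B_t) per utterance."""
    return [f(block) for block in windowed_blocks(utterances, W)]
```

In practice `f` would wrap whichever pre-trained model is in use; only the windowing logic is taken from the description above.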

TreeSeg employs divisive clustering recursively, searching for a split index $i$ within a segment $E_v = [e_s, \ldots, e_e]$ to minimize the within-cluster squared Euclidean distance:

$$\mathcal{L}_v(i) = \sum_{t=s}^{i-1} \|e_t - \mu_L\|_2^2 + \sum_{t=i}^{e} \|e_t - \mu_R\|_2^2$$

where

$$\mu_L = \frac{1}{i - s} \sum_{t=s}^{i-1} e_t, \quad \mu_R = \frac{1}{e - i + 1} \sum_{t=i}^{e} e_t$$

A minimum segment size $M$ is enforced for split validity ($s + M \leq i \leq e - M + 1$), so that both the left part $[e_s, \ldots, e_{i-1}]$ and the right part $[e_i, \ldots, e_e]$ contain at least $M$ embeddings. The binary tree is constructed recursively: at each step, the “best” leaf to split and its best split point are selected (via a min-heap over candidate losses), and splitting proceeds until a user-specified number of leaves $K$ is reached, or segments become too small.
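A minimal sketch of this greedy, heap-driven procedure follows. It is not the authors' implementation. Several details are assumptions: segments are half-open `[s, e)` ranges over 0-indexed embeddings, embeddings are plain lists of floats, and the heap key is the change in loss caused by a split (split loss minus the leaf's unsplit loss), which is one plausible reading of "min-heap over candidate losses". The function names are hypothetical.

```python
import heapq
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def prefix_stats(E: Sequence[Vector]) -> Tuple[List[List[float]], List[float]]:
    """Cumulative vector sums and cumulative squared norms, with a leading zero."""
    csum = [[0.0] * len(E[0])]
    csq = [0.0]
    for e in E:
        csum.append([a + b for a, b in zip(csum[-1], e)])
        csq.append(csq[-1] + sum(x * x for x in e))
    return csum, csq


def sse(csum, csq, a: int, b: int) -> float:
    """Sum of squared distances to the mean over the half-open range [a, b)."""
    diff = [y - x for x, y in zip(csum[a], csum[b])]
    return (csq[b] - csq[a]) - sum(x * x for x in diff) / (b - a)


def best_split(csum, csq, s: int, e: int, M: int) -> Optional[Tuple[float, int]]:
    """Lowest-loss split i of [s, e) with both parts of size >= M, or None."""
    best = None
    for i in range(s + M, e - M + 1):
        loss = sse(csum, csq, s, i) + sse(csum, csq, i, e)
        if best is None or loss < best[0]:
            best = (loss, i)
    return best


def treeseg_boundaries(E: Sequence[Vector], K: int, M: int = 1) -> List[int]:
    """Greedily split until K leaves exist or no valid split remains."""
    csum, csq = prefix_stats(E)
    heap: List[Tuple[float, int, int, int]] = []

    def push(s: int, e: int) -> None:
        cand = best_split(csum, csq, s, e, M)
        if cand is not None:
            loss, i = cand
            # Assumed key: loss change from splitting (most negative pops first).
            heapq.heappush(heap, (loss - sse(csum, csq, s, e), s, i, e))

    push(0, len(E))
    boundaries: List[int] = []
    while len(boundaries) + 1 < K and heap:
        _, s, i, e = heapq.heappop(heap)
        boundaries.append(i)  # position i starts a new segment
        push(s, i)
        push(i, e)
    return sorted(boundaries)
```

For example, fifteen one-dimensional embeddings forming three constant runs of five items are split at positions 5 and 10 when `K=3`. Recording the `(s, i, e)` triples in pop order, rather than only the boundaries, would recover the binary tree itself.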

This recursive, batch-divisive procedure yields a hierarchical representation, allowing for efficient “zooming” into transcript structure at arbitrary resolutions.
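The "zooming" idea can be illustrated by cutting a segmentation tree at a chosen depth. The `Node` class and `cut_at_depth` function below are hypothetical names introduced for illustration, and the half-open `(start, end)` span convention is an assumption; the source does not specify a tree representation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    """A segment covering utterance positions [start, end)."""
    start: int
    end: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def cut_at_depth(node: Node, depth: int) -> List[Tuple[int, int]]:
    """Return the segments obtained by descending at most `depth` levels.

    Leaves reached before the depth limit are returned as they are.
    """
    if depth == 0 or node.left is None or node.right is None:
        return [(node.start, node.end)]
    return cut_at_depth(node.left, depth - 1) + cut_at_depth(node.right, depth - 1)


# A small hand-built tree: the root splits at 10, its left child splits at 5.
tree = Node(0, 15, Node(0, 10, Node(0, 5), Node(5, 10)), Node(10, 15))
coarse = cut_at_depth(tree, 1)  # two segments
fine = cut_at_depth(tree, 2)    # three segments
```

A shallow cut yields a coarse segmentation and a deeper cut a finer one, without re-running the clustering.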

3. Noise Robustness, Embedding Model Integration, and Computational Properties

By embedding blocks instead of isolated utterances and averaging in clustering, TreeSeg attenuates the effect of local ASR noise. The model ff is not fine-tuned; no additional dimensionality reduction or smoothing is applied except for windowing and averaging. This fosters portability and reproducibility.

The complexity is as follows: for $T$ utterances, $d$-dimensional embeddings, and $S$ final segments,

  • Embedding: $O(Td)$
  • Precomputation (cumulative sums): $O(Td)$
  • All splitting: $O(T \log S)$ (expected, from geometric shrinkage)
  • Heap maintenance: $O(S \log S)$

Overall, the cost is $O(Td + T \log S + S \log S)$, which is near-linear in $T$ for practical $S \ll T$.

Memory usage is $O(Td)$ for embeddings, $O(T)$ for accumulated statistics, and $O(S)$ for tree nodes and heap.
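The role of the cumulative-sum precomputation can be motivated by a standard identity for the sum of squared deviations from the mean. The prefix quantities $C_t$ and $Q_t$ below are notation introduced here for illustration, not taken from the source, and whether the original implementation uses exactly this form is an assumption.

```latex
% Prefix sums over the embeddings, with C_0 = 0 and Q_0 = 0:
C_t = \sum_{j=1}^{t} e_j, \qquad Q_t = \sum_{j=1}^{t} \|e_j\|_2^2 .

% For any contiguous range a..b with mean \mu = (C_b - C_{a-1}) / (b - a + 1):
\sum_{t=a}^{b} \|e_t - \mu\|_2^2
  = \left(Q_b - Q_{a-1}\right) - \frac{\|C_b - C_{a-1}\|_2^2}{b - a + 1} .

% Hence each candidate split loss is a sum of two such terms:
\mathcal{L}_v(i)
  = \left(Q_{i-1} - Q_{s-1}\right) - \frac{\|C_{i-1} - C_{s-1}\|_2^2}{i - s}
  + \left(Q_e - Q_{i-1}\right) - \frac{\|C_e - C_{i-1}\|_2^2}{e - i + 1} .
```

With these quantities stored once, evaluating a candidate split index no longer requires a pass over the segment's embeddings.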

4. Empirical Evaluation and Results

TreeSeg was evaluated on three datasets:

  • ICSI: 75 meetings with up to 4-level segmentation, mean 1454 utterances
  • AMI: >100 hours, up to 3 levels, mean 636 utterances
  • TinyRec: 21 technical sessions, 2-level segmentation, mean 267 utterances

Comparisons included RandomSeg, EquiSeg, HyperSeg (TextTiling-style, hyperdimensional embeddings), and BertSeg (TextTiling-style, BERT blocks). Performance was assessed using the $P_k$ and WindowDiff metrics (lower is better).
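For readers unfamiliar with these metrics, the following is a sketch of one common formulation of each. It is not necessarily the exact evaluation code behind the reported numbers: windowing conventions vary between implementations, and the choice of `k` as half the mean reference segment length is the usual convention, assumed here. Boundaries are given as the positions at which a new segment starts.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence


def segment_ids(boundaries: Sequence[int], n: int) -> List[int]:
    """Map each of n positions to a segment index; boundary b starts a segment at b."""
    bs = sorted(boundaries)
    return [bisect_right(bs, t) for t in range(n)]


def default_k(ref: Sequence[int], n: int) -> int:
    """Assumed convention: half the mean reference segment length, at least 1."""
    return max(1, round(n / (len(ref) + 1) / 2))


def pk(ref: Sequence[int], hyp: Sequence[int], n: int, k: Optional[int] = None) -> float:
    """Fraction of windows where ref and hyp disagree on 'same segment or not'."""
    k = k or default_k(ref, n)
    r, h = segment_ids(ref, n), segment_ids(hyp, n)
    errors = sum((r[i] == r[i + k]) != (h[i] == h[i + k]) for i in range(n - k))
    return errors / (n - k)


def window_diff(ref: Sequence[int], hyp: Sequence[int], n: int, k: Optional[int] = None) -> float:
    """Fraction of windows where the number of boundaries differs."""
    k = k or default_k(ref, n)
    r, h = segment_ids(ref, n), segment_ids(hyp, n)
    errors = sum((r[i + k] - r[i]) != (h[i + k] - h[i]) for i in range(n - k))
    return errors / (n - k)
```

WindowDiff counts every window that $P_k$ counts and additionally penalizes windows in which both segmentations place boundaries but a different number of them, so under this formulation it is never lower than $P_k$ for the same `k`.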

Aggregate multi-level results:

| Corpus  | Metric  | TreeSeg | Next Best (BertSeg) |
|---------|---------|---------|---------------------|
| ICSI    | $P_k$   | 0.310   | 0.388               |
|         | WinDiff | 0.353   | 0.432               |
| AMI     | $P_k$   | 0.355   | 0.443               |
|         | WinDiff | 0.396   | 0.480               |
| TinyRec | $P_k$   | 0.367   | 0.473               |
|         | WinDiff | 0.382   | 0.486               |

Per-level scores also favor TreeSeg, e.g., on ICSI level 1: $P_k = 0.28$ (TreeSeg) vs $0.343$ (BertSeg), WinDiff $0.314$ vs $0.386$. TreeSeg outperforms all baselines across datasets and at multiple resolutions (Gklezakos et al., 2024).

5. Limitations and Future Directions in Transcript Segmentation

Current validation is limited to structured meeting corpora (ICSI, AMI); diversity and scale in transcript types remain open for exploration. Only ADA embeddings were tested systematically. Direct evaluations against M³Seg—or more recent hierarchical segmentation baselines—are constrained by code availability.

Proposed future work includes:

  • Systematic embedding model comparisons (SBERT, RoBERTa, LLaMA, etc.)
  • Extending the hierarchical tree for downstream applications: multi-level summarization, chapter labeling, knowledge extraction
  • Application to less-structured, noisier, or conversational transcript domains

6. The BITS-Tree Data Structure (TreeSeg in Segment Storage Context)

In the context of dynamic segment storage and interval queries, TreeSeg refers to the BITS-Tree (Balanced Inorder Threaded Segment Tree) (Easwarakumar et al., 2015). This structure maintains a height-balanced (AVL) binary search tree in which each node stores a non-overlapping interval and the list of original segments containing that interval.

Key characteristics:

  • Insertion/Deletion: $O(\log n + k)$, where $n$ is the segment count and $k$ the number of affected nodes (those overlapping the inserted or deleted segment). Segments can extend beyond previous tree bounds.
  • Query:
    • Point (stabbing) query: $O(\log n + k)$, with $k$ the output size.
    • Range query: $O(\log n + m)$, with $m$ the number of nodes intersecting the query.
  • Node count: At most $2n-1$, much lower than $2(U_{\max} - U_{\min}) - 1$ for classic dynamic segment trees built over a universe $U$.
  • Space: Worst-case total segment-list storage $O(n^2)$.
  • Height: $O(\log n)$.
  • Threading: Inorder threads enable efficient traversal for range queries.

This structure is particularly advantageous when the global universe $U$ is large and $n$ is comparatively small, minimizing node count while supporting efficient dynamic updates and fast queries (Easwarakumar et al., 2015).
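The node contents described above (a non-overlapping interval plus the list of segments that contain it) can be illustrated with a deliberately simplified, static stand-in. This sketch is not the BITS-Tree: it replaces the AVL-balanced, inorder-threaded tree with a sorted array searched by bisection, supports no insertion or deletion, and assumes half-open segments `[lo, hi)`. The class name `ElementaryIntervals` is hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


class ElementaryIntervals:
    """Static sketch: elementary intervals between consecutive endpoints."""

    def __init__(self, segments: Sequence[Tuple[float, float]]):
        # At most 2n distinct endpoints, hence at most 2n - 1 elementary intervals.
        self.points = sorted({p for seg in segments for p in seg})
        self.cover: List[List[int]] = [[] for _ in range(max(len(self.points) - 1, 0))]
        for sid, (lo, hi) in enumerate(segments):
            j = bisect_right(self.points, lo) - 1  # index of endpoint lo
            while j < len(self.cover) and self.points[j] < hi:
                self.cover[j].append(sid)  # interval j lies inside segment sid
                j += 1

    def stab(self, x: float) -> List[int]:
        """Ids of segments containing point x (binary search, then read the list)."""
        j = bisect_right(self.points, x) - 1
        if 0 <= j < len(self.cover):
            return list(self.cover[j])
        return []

    def range_query(self, a: float, b: float) -> List[int]:
        """Ids of segments overlapping the half-open range [a, b)."""
        j = max(bisect_right(self.points, a) - 1, 0)
        found = set()
        while j < len(self.cover) and self.points[j] < b:
            found.update(self.cover[j])
            j += 1
        return sorted(found)


index = ElementaryIntervals([(1, 5), (3, 8), (10, 12)])
hits = index.stab(4)                  # segments 0 and 1 contain the point 4
overlapping = index.range_query(4, 11)
```

The left-to-right scan in `range_query` plays the role that inorder threads play in the real structure, where the successor of a node is reached without returning to the root.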

7. Relationship to Other Tree-Based Segmentation Techniques

The term “TreeSeg” is also used in TreeSegNet for image segmentation, notably in adaptive CNNs constructed according to class confusion statistics (Yue et al., 2018). Despite methodologically divergent goals, these approaches share the strategic use of tree structures—either for recursively partitioning data (topic segmentation, BITS-Tree) or as an architectural prior in neural segmentation networks (TreeSegNet).

A plausible implication is that tree-based representations enable scalable, multiresolution partitioning or specialization in domains where hierarchical or ambiguous boundaries are intrinsic to the data.

