Semantic Structural Entropy (SeSE)
- Semantic Structural Entropy (SeSE) is a quantitative measure that captures structural order and semantic uncertainty in language.
- It leverages graph-based and statistical representations to analyze text segments and LLM-generated outputs for topic segmentation and uncertainty estimation.
- Applied in both linguistics and LLM uncertainty quantification, SeSE reveals topic boundaries and flags potential hallucinations in model outputs.
Semantic Structural Entropy (SeSE) refers to a family of quantitative measures grounded in information theory for characterizing structural order and semantic information within language and, more recently, for quantifying semantic uncertainty in LLMs. The concept has been independently formalized in two distinct research programs: one focusing on language universals and the extraction of semantic structure from text (Montemurro et al., 2015), and the other targeting uncertainty quantification for hallucination detection in LLM systems (Zhao et al., 20 Nov 2025). Both approaches leverage graph-based or statistical representations of word distributions, but with fundamentally different aims and mathematical constructions.
1. Foundational Definitions and Mathematical Formulation
In the linguistic information-theoretic tradition, SeSE (denoted here $\Delta I(s)$) is defined through the mutual information between words and their occurrence over segmented intervals of a text. The procedure starts with a text of $N$ word tokens and $V$ unique word types, partitioned into $P$ contiguous segments of size $s$ (so $N \approx Ps$). For each word $w$, the counts $n_{w,j}$ per segment $j$ are tabulated and used to compute empirical and expected entropies. The SeSE is then

$$\Delta I(s) \;=\; \sum_{w} p_w \,\big[\langle \tilde{H}_w(s)\rangle - H_w(s)\big],$$

where $p_w = n_w/N$ is the empirical unigram probability, $H_w(s) = -\sum_{j=1}^{P} \frac{n_{w,j}}{n_w}\log_2 \frac{n_{w,j}}{n_w}$ is the segment entropy for $w$, and $\langle \tilde{H}_w(s)\rangle$ is the expected entropy under random permutations of the text. This formalism reveals the scale-dependent semantic structure encoded in word distributions (Montemurro et al., 2015).
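As a concrete illustration, the following is a minimal Python sketch of $\Delta I(s)$ for a fixed segment size, using a Monte Carlo shuffled baseline for $\langle \tilde{H}_w(s)\rangle$ rather than the analytical approximation derived in the original work; the toy text and all function names are illustrative only.

```python
import random
from collections import Counter
from math import log2

def segment_entropies(tokens, s):
    """Per-word entropy H_w(s) of each word's counts over contiguous segments of size s."""
    seg_counts = [Counter(tokens[i:i + s]) for i in range(0, len(tokens), s)]
    totals = Counter(tokens)
    return {w: -sum((c[w] / n_w) * log2(c[w] / n_w) for c in seg_counts if c[w])
            for w, n_w in totals.items()}, totals

def semantic_information(tokens, s, n_shuffles=20, seed=0):
    """Delta I(s) = sum_w p_w * (<H~_w(s)> - H_w(s)), with a Monte Carlo shuffled baseline."""
    rng = random.Random(seed)
    H_emp, totals = segment_entropies(tokens, s)
    H_rand = Counter()
    shuffled = list(tokens)
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        H_shuf, _ = segment_entropies(shuffled, s)
        for w, h in H_shuf.items():
            H_rand[w] += h / n_shuffles
    N = len(tokens)
    return sum((n_w / N) * (H_rand[w] - H_emp[w]) for w, n_w in totals.items())

# Toy text: topic words cluster in blocks; a function word is spread uniformly.
text = []
for block in range(8):
    topic = "apple" if block % 2 == 0 else "orange"
    text += [topic, "the"] * 25
print(round(semantic_information(text, s=50), 3))  # positive: structure above the shuffled baseline
```

In this toy example the clustered topic words carry the positive signal, while the uniformly distributed function word contributes almost nothing, which is the property later exploited for keyword extraction.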
In uncertainty quantification for LLMs, SeSE is formulated in terms of the structural entropy of semantic graphs induced by sampled responses. Given $n$ LLM outputs, a directed, weighted semantic graph $G = (V, E, W)$ is constructed, with edge weights reflecting pairwise entailment probabilities. The SeSE of a graph is then defined as the minimum total entropy over optimal hierarchical (encoding) trees $T$:

$$\mathrm{SeSE}(G) \;=\; \min_{T}\; \sum_{\alpha \in T,\ \alpha \neq \lambda} H^{T}(G; \alpha),$$

where $H^{T}(G; \alpha)$ quantifies the uncertainty flow through each tree node $\alpha$ ($\lambda$ denoting the root), and the tree $T$ minimizes this sum subject to a height constraint (Zhao et al., 20 Nov 2025).
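To make the objective concrete, the sketch below evaluates the structural-entropy sum for a weighted graph under a fixed two-level encoding tree (a partition of nodes into modules), using the standard node term from structural information theory, $H^{T}(G;\alpha) = -\tfrac{g_\alpha}{\mathrm{vol}(G)}\log_2\tfrac{\mathrm{vol}(\alpha)}{\mathrm{vol}(\alpha^{-})}$. The symmetric toy weights and the fixed partition are illustrative assumptions; the actual SeSE optimizes over height-constrained trees on directed entailment graphs.

```python
import numpy as np

def structural_entropy_2level(W, partition):
    """Structural entropy of a weighted symmetric adjacency matrix W under a
    two-level encoding tree (root -> modules -> leaves).
    partition: list of lists of node indices, covering every node exactly once."""
    W = np.asarray(W, dtype=float)
    deg = W.sum(axis=1)          # weighted degrees
    vol_G = deg.sum()            # total volume = twice the total edge weight
    H = 0.0
    for module in partition:
        module = list(module)
        vol_m = deg[module].sum()
        inside = W[np.ix_(module, module)].sum()
        g_m = vol_m - inside     # weight of edges cut by the module boundary
        if vol_m > 0 and g_m > 0:
            H -= (g_m / vol_G) * np.log2(vol_m / vol_G)      # module-level term
        for i in module:
            if deg[i] > 0:
                H -= (deg[i] / vol_G) * np.log2(deg[i] / vol_m)  # leaf-level terms
    return H

# Toy usage: two tight semantic clusters -> low entropy under the "right" partition.
W = np.array([[0, 1, 1, 0.05, 0.05],
              [1, 0, 1, 0.05, 0.05],
              [1, 1, 0, 0.05, 0.05],
              [0.05, 0.05, 0.05, 0, 1],
              [0.05, 0.05, 0.05, 1, 0]])
print(structural_entropy_2level(W, [[0, 1, 2], [3, 4]]))
print(structural_entropy_2level(W, [[0, 3], [1, 2, 4]]))  # mismatched partition, higher entropy
```

A partition that matches the two semantic clusters yields lower entropy than a mismatched one, mirroring how a coherent set of sampled responses compresses well and yields low SeSE.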
2. Methodological Frameworks
Textual Semantics and Universality
The linguistic SeSE framework decomposes entropy into components reflecting lexical frequencies and ordering:
- $H_B$ (Boltzmann entropy) represents the entropy if only word frequencies mattered, calculated as $H_B = -\sum_{w} p_w \log_2 p_w$.
- $H$ accounts for all ordering correlations (estimated via universal compression or string-matching estimators).
- The difference $D = H_B - H$ quantifies the KL divergence between original and shuffled texts and is empirically near-constant at roughly $3.5$ bits/word across multiple languages (see Section 5).
$\Delta I(s)$, the SeSE proper, is computed by comparing the mutual information in the empirical and randomized segmentations, thereby isolating scale-dependent topical structure (Montemurro et al., 2015); a toy numerical sketch of the $H_B$/$H$/$D$ decomposition follows below.
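A minimal numeric sketch of the decomposition, assuming a whitespace-tokenized corpus file (the path `corpus.txt` is hypothetical). The LZMA-based proxy for $H$ below is a crude stand-in for the universal-compression and string-matching estimators used in the original work; because it conflates character-level with word-level redundancy, it will not reproduce the published $\approx 3.5$ bits/word figure and only illustrates the shape of the computation.

```python
import lzma
from collections import Counter
from math import log2

def boltzmann_entropy(tokens):
    """H_B = -sum_w p_w log2 p_w : entropy if only word frequencies matter."""
    counts = Counter(tokens)
    N = len(tokens)
    return -sum((c / N) * log2(c / N) for c in counts.values())

def ordered_entropy_proxy(tokens):
    """Crude upper-bound proxy for H (entropy rate with ordering), in bits per token:
    compressed size of the token stream divided by the number of tokens."""
    data = " ".join(tokens).encode("utf-8")
    return 8 * len(lzma.compress(data)) / len(tokens)

tokens = open("corpus.txt", encoding="utf-8").read().split()  # hypothetical corpus file
H_B = boltzmann_entropy(tokens)
H = ordered_entropy_proxy(tokens)
print(f"H_B = {H_B:.2f} bits/word, H ~ {H:.2f} bits/word, D ~ {H_B - H:.2f} bits/word")
```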
Semantic Graph-Based Uncertainty in LLMs
The LLM-centric SeSE pipeline includes:
- Sampling multiple LLM outputs and constructing an adaptively sparsified directed semantic graph via pairwise entailment (using, e.g., DeBERTa-v3-large-MNLI).
- Hierarchically clustering the nodes using optimal encoding trees to compress and summarize the semantic space.
- Quantifying structural entropy at both the global semantic space level and at the level of atomic claims by traversing root-to-leaf paths in bipartite response-claim graphs.
- The AS-DSG algorithm governs sparsification and normalization, ensuring that meaningful semantic dependencies are retained and the resulting entropy is minimized, yielding informative uncertainty estimates (Zhao et al., 20 Nov 2025); a minimal sketch of the graph-construction step follows this list.
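The following is a minimal sketch of the entailment-graph construction; `entail_prob` is a hypothetical stand-in for an NLI scorer such as a DeBERTa-MNLI classifier, and the lexical-overlap scorer is used only so the snippet runs without model weights.

```python
import numpy as np

def build_semantic_graph(responses, entail_prob):
    """Directed, weighted semantic graph over sampled responses.
    entail_prob(premise, hypothesis) -> P(entailment) in [0, 1] (hypothetical NLI scorer)."""
    n = len(responses)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                W[i, j] = entail_prob(responses[i], responses[j])
    return W

# Toy stand-in scorer: lexical overlap as a deliberately crude proxy for entailment.
def toy_entail_prob(premise, hypothesis):
    p, h = set(premise.lower().split()), set(hypothesis.lower().split())
    return len(p & h) / max(len(h), 1)

responses = ["Paris is the capital of France.",
             "The capital of France is Paris.",
             "Lyon is the capital of France."]
print(build_semantic_graph(responses, toy_entail_prob).round(2))
```

In the actual pipeline this matrix would then be adaptively sparsified and normalized before the encoding-tree optimization (see Section 4).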
3. Interpretive Significance and Universality
In the context of linguistic analysis, $D = H_B - H$ serves as a language-independent baseline for structural order, with its constancy across translations and language families suggesting a universal tradeoff: as vocabulary diversity increases, long-range correlations tighten, maintaining $D$ at roughly $3.5$ bits/word. $\Delta I(s)$ directly captures topicality, with the maximizing segment size revealing characteristic semantic chunk lengths, such as sub-chapter topical units in books.
For LLM uncertainty quantification, a higher SeSE correlates with increased inherent semantic uncertainty: empirically, higher SeSE flags outputs more likely to contain hallucinations. Per-claim SeSE enables claim-level granularity, where claims in the semantic core (low SeSE) are likely factual and peripheral claims (high SeSE) are likely hallucinated. Among the baselines considered in this setting, only SeSE explicitly leverages latent semantic graph structure (Zhao et al., 20 Nov 2025).
4. Algorithmic Procedures
Linguistic SeSE (Montemurro et al., 2015)
- Preprocess and tokenize the text; count the total tokens $N$ and vocabulary size $V$.
- Estimate $H$ via universal compression; compute $H_B$ directly from the word counts.
- For each candidate segment length $s$, partition the text, tabulate per-segment counts, and compute $H_w(s)$ and $\langle \tilde{H}_w(s)\rangle$ by analytical approximation or Monte Carlo shuffling.
- Assemble $\Delta I(s)$; select the scale $s^{*}$ maximizing $\Delta I(s)$ (see the sketch after this list).
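A minimal sketch of the scale scan and of keyword scoring by each word's contribution $p_w(\langle \tilde{H}_w(s)\rangle - H_w(s))$, again with a Monte Carlo shuffled baseline; the candidate scales, shuffle count, and top-10 cutoff are illustrative choices rather than values from the original work.

```python
import random
from collections import Counter
from math import log2

def word_contributions(tokens, s, n_shuffles=20, seed=0):
    """Per-word terms p_w * (<H~_w(s)> - H_w(s)); their sum is Delta I(s)."""
    rng = random.Random(seed)
    def per_word_entropy(seq):
        segs = [Counter(seq[i:i + s]) for i in range(0, len(seq), s)]
        tot = Counter(seq)
        return {w: -sum((c[w] / n) * log2(c[w] / n) for c in segs if c[w])
                for w, n in tot.items()}
    H_emp = per_word_entropy(tokens)
    H_rand = Counter()
    shuffled = list(tokens)
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        for w, h in per_word_entropy(shuffled).items():
            H_rand[w] += h / n_shuffles
    N, totals = len(tokens), Counter(tokens)
    return {w: (n / N) * (H_rand[w] - H_emp[w]) for w, n in totals.items()}

def scan_scales(tokens, candidate_s, top_k=10):
    """Pick s* maximizing Delta I(s); rank keywords by their contribution at that scale."""
    contribs = {s: word_contributions(tokens, s) for s in candidate_s}
    totals = {s: sum(c.values()) for s, c in contribs.items()}
    s_star = max(totals, key=totals.get)
    keywords = sorted(contribs[s_star], key=contribs[s_star].get, reverse=True)[:top_k]
    return s_star, totals[s_star], keywords

# Usage (hypothetical corpus file):
# scan_scales(open("corpus.txt", encoding="utf-8").read().lower().split(), [100, 500, 1000])
```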
LLM SeSE (Zhao et al., 20 Nov 2025)
- For a given input, sample $n$ outputs from the LLM.
- For each ordered pair of outputs, compute directed entailment probabilities using an NLI model.
- Sparsify by retaining the top-$k$ outgoing edges per node for various $k$; ensure connectivity and normalization.
- Identify the $k$ that minimizes the resulting graph entropy, returning the corresponding sparsified graph (a sketch of this selection step follows this list).
- Construct hierarchical encoding trees, computing node entropies and the total tree entropy $\mathrm{SeSE}(G)$.
- For claim-level SeSE, build response–claim bipartite graphs, find optimal trees, and attribute per-claim entropy.
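A minimal sketch of the sparsification-selection step; treating the one-dimensional structural entropy of the sparsified graph as the selection score is an assumption about the exact AS-DSG criterion, introduced here only to make the loop concrete.

```python
import numpy as np

def one_dim_entropy(W):
    """One-dimensional structural entropy: Shannon entropy of the degree
    distribution d_i / vol(G) of the (symmetrized) weighted graph."""
    S = W + W.T                          # symmetrize the directed entailment graph
    deg = S.sum(axis=1)
    vol = deg.sum()
    p = deg[deg > 0] / vol
    return float(-(p * np.log2(p)).sum())

def adaptive_sparsify(W, k_values):
    """Pick the top-k sparsification whose graph entropy is minimal.
    NOTE: using one-dimensional structural entropy as the score is an assumption
    about AS-DSG's exact criterion, not a reproduction of it."""
    best = None
    for k in k_values:
        W_k = np.zeros_like(W)
        for i in range(W.shape[0]):
            keep = np.argsort(W[i])[-k:]   # strongest k outgoing edges of node i
            W_k[i, keep] = W[i, keep]
        score = one_dim_entropy(W_k)
        if best is None or score < best[0]:
            best = (score, k, W_k)
    return best  # (entropy, k*, sparsified adjacency)

# Usage with an entailment matrix W as built in Section 2:
# adaptive_sparsify(W, range(1, len(W)))
```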
Table: Key operational distinctions
| SeSE in Linguistics | SeSE in LLM Uncertainty Quantification |
|---|---|
| $\Delta I(s)$: mutual information over text segments | $\mathrm{SeSE}(G)$: structural entropy of a semantic graph |
| Works over surface word distributions | Operates on sampled LLM semantic outputs |
| Extracts topic boundaries and keywords | Flags semantic uncertainty and hallucinations |
5. Empirical Case Studies
Linguistic Universality
- Across 75 translations from 24 language families, $D$ is stable (mean $3.56$ bits/word).
- In English-language scientific works, "Opticks" and "On the Origin of Species" both exhibit maxima of $\Delta I(s)$ at characteristic segment sizes; the latter reaches $0.53$ bits/word at its optimal scale $s^{*}$.
- Informative word lists extracted via SeSE match domain knowledge: e.g., "species," "hybrids," "selection" in Darwin's text.
LLM Uncertainty Quantification
- SeSE outperforms strong baselines (including supervised UQ and KLE) on datasets such as BioASQ, NQ-Open, and TriviaQA.
- Performance is measured by AUROC and AURAC, with average AUROC improvements over KLE and over SE/DSE.
- Fine-grained claim-level SeSE enables rejection of individual hallucinated claims, not merely full outputs (Zhao et al., 20 Nov 2025).
6. Limitations and Assumptions
For linguistic applications:
- Stationarity and ergodicity are assumed; real texts may violate these due to topic evolution or narrative shifts.
- Partition granularity (the segment size $s$) requires careful tuning to avoid artifacts from discrete segment transitions.
- Function words, due to their high frequency, may pollute keyword extraction unless attenuated.
- Higher-order dependencies (bigrams, trigrams) are not captured unless integrated into estimation.
- Short texts yield poor estimates; the method presumes adequate data for robust statistics.
- The shuffled baselines may retain residual structure due to n-gram preservation artifacts.
For LLM UQ:
- All computations are post hoc, requiring no access to LLM internals or fine-tuning.
- Complexity scales quadratically with the number of samples $n$ (from pairwise entailment scoring), but practical $n$ is small.
7. Synthesis and Outlook
Semantic Structural Entropy unifies two rigorous programs: one distilling universal properties of language order and semantic structure, and one formalizing uncertainty in generative models via semantic graphs. In both, SeSE operationalizes the intuition that structure—whether rudimentary or topical, semantic or stochastic—can be quantified, localized, and exploited. This enables language-agnostic comparative linguistics, robust keyword extraction, principled topic segmentation, and, in machine-generated text, principled detection of factual uncertainty and hallucination risk (Montemurro et al., 2015, Zhao et al., 20 Nov 2025).
A plausible implication is that as semantic spaces—textual or model-generated—grow ever more complex, structural entropy-based diagnostics such as SeSE will enable scale-adaptive semantic profiling, robust uncertainty quantification, and fine-grained semantic filtering. The universality and modularity of the framework suggest broad applicability for both linguistic analysis and machine intelligence.