Semantic Structural Entropy (SeSE)
- Semantic Structural Entropy (SeSE) is a quantitative measure that captures structural order and semantic uncertainty in language.
- It leverages graph-based and statistical representations to analyze text segments and LLM-generated outputs for topic segmentation and uncertainty estimation.
- Applied in both linguistics and LLM uncertainty quantification, SeSE reveals topic boundaries and flags potential hallucinations in model outputs.
Semantic Structural Entropy (SeSE) refers to a family of quantitative measures grounded in information theory for characterizing structural order and semantic information within language and, more recently, for quantifying semantic uncertainty in LLMs. The concept has been independently formalized in two distinct research programs: one focusing on language universals and the extraction of semantic structure from text (Montemurro et al., 2015), and the other targeting uncertainty quantification for hallucination detection in LLM systems (Zhao et al., 20 Nov 2025). Both approaches leverage graph-based or statistical representations of word distributions, but with fundamentally different aims and mathematical constructions.
1. Foundational Definitions and Mathematical Formulation
In the linguistic information-theoretic tradition, SeSE (denoted here $\Delta I(s)$) is defined through the mutual information between words and their occurrence over segmented intervals of a text. The procedure starts with a text of $N$ word tokens and $V$ unique word types, partitioned into $P$ contiguous segments of size $s$ (so $N \approx Ps$). For each word $w$, the counts $n_{w,j}$ per segment $j$ are tabulated and used to compute empirical and expected entropies. The SeSE is then

$$\Delta I(s) \;=\; \sum_{w} p_w \,\big[\langle \tilde{H}_w(s)\rangle - H_w(s)\big],$$

where $p_w = n_w/N$ is the empirical unigram probability, $H_w(s) = -\sum_{j=1}^{P} \frac{n_{w,j}}{n_w}\log_2 \frac{n_{w,j}}{n_w}$ is the segment entropy for $w$, and $\langle \tilde{H}_w(s)\rangle$ is the expected entropy under random permutations of the text. This formalism reveals the scale-dependent semantic structure encoded in word distributions (Montemurro et al., 2015).
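As a concrete illustration, the following is a minimal Python sketch of $\Delta I(s)$ for a fixed segment size, using a Monte Carlo shuffled baseline for $\langle \tilde{H}_w(s)\rangle$ rather than the analytical approximation derived in the original work; the toy text and all function names are illustrative only.

```python
import random
from collections import Counter
from math import log2

def segment_entropies(tokens, s):
    """Per-word entropy H_w(s) of each word's counts over contiguous segments of size s."""
    seg_counts = [Counter(tokens[i:i + s]) for i in range(0, len(tokens), s)]
    totals = Counter(tokens)
    return {w: -sum((c[w] / n_w) * log2(c[w] / n_w) for c in seg_counts if c[w])
            for w, n_w in totals.items()}, totals

def semantic_information(tokens, s, n_shuffles=20, seed=0):
    """Delta I(s) = sum_w p_w * (<H~_w(s)> - H_w(s)), with a Monte Carlo shuffled baseline."""
    rng = random.Random(seed)
    H_emp, totals = segment_entropies(tokens, s)
    H_rand = Counter()
    shuffled = list(tokens)
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        H_shuf, _ = segment_entropies(shuffled, s)
        for w, h in H_shuf.items():
            H_rand[w] += h / n_shuffles
    N = len(tokens)
    return sum((n_w / N) * (H_rand[w] - H_emp[w]) for w, n_w in totals.items())

# Toy text: topic words cluster in blocks; a function word is spread uniformly.
text = []
for block in range(8):
    topic = "apple" if block % 2 == 0 else "orange"
    text += [topic, "the"] * 25
print(round(semantic_information(text, s=50), 3))  # positive: structure above the shuffled baseline
```

In this toy example the clustered topic words carry the positive signal, while the uniformly distributed function word contributes almost nothing, which is the property later exploited for keyword extraction.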
In uncertainty quantification for LLMs, SeSE is formulated in terms of the structural entropy of semantic graphs induced by sampled responses. Given $n$ LLM outputs, a directed, weighted semantic graph $G = (V, E, W)$ is constructed, with edge weights reflecting pairwise entailment probabilities. The SeSE of a graph is then defined as the minimum total entropy over optimal hierarchical (encoding) trees $T$:

$$\mathrm{SeSE}(G) \;=\; \min_{T}\; \sum_{\alpha \in T,\ \alpha \neq \lambda} H^{T}(G; \alpha),$$

where $H^{T}(G; \alpha)$ quantifies the uncertainty flow through each tree node $\alpha$ ($\lambda$ denoting the root), and the tree $T$ minimizes this sum subject to a height constraint (Zhao et al., 20 Nov 2025).
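To make the objective concrete, the sketch below evaluates the structural-entropy sum for a weighted graph under a fixed two-level encoding tree (a partition of nodes into modules), using the standard node term from structural information theory, $H^{T}(G;\alpha) = -\tfrac{g_\alpha}{\mathrm{vol}(G)}\log_2\tfrac{\mathrm{vol}(\alpha)}{\mathrm{vol}(\alpha^{-})}$. The symmetric toy weights and the fixed partition are illustrative assumptions; the actual SeSE optimizes over height-constrained trees on directed entailment graphs.

```python
import numpy as np

def structural_entropy_2level(W, partition):
    """Structural entropy of a weighted symmetric adjacency matrix W under a
    two-level encoding tree (root -> modules -> leaves).
    partition: list of lists of node indices, covering every node exactly once."""
    W = np.asarray(W, dtype=float)
    deg = W.sum(axis=1)          # weighted degrees
    vol_G = deg.sum()            # total volume = twice the total edge weight
    H = 0.0
    for module in partition:
        module = list(module)
        vol_m = deg[module].sum()
        inside = W[np.ix_(module, module)].sum()
        g_m = vol_m - inside     # weight of edges cut by the module boundary
        if vol_m > 0 and g_m > 0:
            H -= (g_m / vol_G) * np.log2(vol_m / vol_G)      # module-level term
        for i in module:
            if deg[i] > 0:
                H -= (deg[i] / vol_G) * np.log2(deg[i] / vol_m)  # leaf-level terms
    return H

# Toy usage: two tight semantic clusters -> low entropy under the "right" partition.
W = np.array([[0, 1, 1, 0.05, 0.05],
              [1, 0, 1, 0.05, 0.05],
              [1, 1, 0, 0.05, 0.05],
              [0.05, 0.05, 0.05, 0, 1],
              [0.05, 0.05, 0.05, 1, 0]])
print(structural_entropy_2level(W, [[0, 1, 2], [3, 4]]))
print(structural_entropy_2level(W, [[0, 3], [1, 2, 4]]))  # mismatched partition, higher entropy
```

A partition that matches the two semantic clusters yields lower entropy than a mismatched one, mirroring how a coherent set of sampled responses compresses well and yields low SeSE.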
2. Methodological Frameworks
Textual Semantics and Universality
The linguistic SeSE framework decomposes entropy into components reflecting lexical frequencies and ordering:
- $H_B$ (Boltzmann entropy) represents the entropy if only word frequencies mattered, calculated as $H_B = -\sum_{w} p_w \log_2 p_w$.
- $H$ accounts for all ordering correlations (estimated via universal compression or string-matching estimators).
- The difference $D = H_B - H$ quantifies the KL divergence between original and shuffled texts and is empirically near-constant at roughly $3.5$ bits/word across multiple languages (see Section 5).
$\Delta I(s)$, the SeSE proper, is computed by comparing the mutual information in the empirical and randomized segmentations, thereby isolating scale-dependent topical structure (Montemurro et al., 2015); a toy numerical sketch of the $H_B$/$H$/$D$ decomposition follows below.
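A minimal numeric sketch of the decomposition, assuming a whitespace-tokenized corpus file (the path `corpus.txt` is hypothetical). The LZMA-based proxy for $H$ below is a crude stand-in for the universal-compression and string-matching estimators used in the original work; because it conflates character-level with word-level redundancy, it will not reproduce the published $\approx 3.5$ bits/word figure and only illustrates the shape of the computation.

```python
import lzma
from collections import Counter
from math import log2

def boltzmann_entropy(tokens):
    """H_B = -sum_w p_w log2 p_w : entropy if only word frequencies matter."""
    counts = Counter(tokens)
    N = len(tokens)
    return -sum((c / N) * log2(c / N) for c in counts.values())

def ordered_entropy_proxy(tokens):
    """Crude upper-bound proxy for H (entropy rate with ordering), in bits per token:
    compressed size of the token stream divided by the number of tokens."""
    data = " ".join(tokens).encode("utf-8")
    return 8 * len(lzma.compress(data)) / len(tokens)

tokens = open("corpus.txt", encoding="utf-8").read().split()  # hypothetical corpus file
H_B = boltzmann_entropy(tokens)
H = ordered_entropy_proxy(tokens)
print(f"H_B = {H_B:.2f} bits/word, H ~ {H:.2f} bits/word, D ~ {H_B - H:.2f} bits/word")
```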
Semantic Graph-Based Uncertainty in LLMs
The LLM-centric SeSE pipeline includes:
- Sampling multiple LLM outputs and constructing an adaptively sparsified directed semantic graph via pairwise entailment (using, e.g., DeBERTa-v3-large-MNLI).
- Hierarchically clustering the nodes using optimal encoding trees to compress and summarize the semantic space.
- Quantifying structural entropy at both the global semantic space level and at the level of atomic claims by traversing root-to-leaf paths in bipartite response-claim graphs.
- The AS-DSG algorithm governs sparsification and normalization, ensuring that meaningful semantic dependencies are retained and the resulting entropy is minimized, yielding informative uncertainty estimates (Zhao et al., 20 Nov 2025); a minimal sketch of the graph-construction step follows this list.
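The following is a minimal sketch of the entailment-graph construction; `entail_prob` is a hypothetical stand-in for an NLI scorer such as a DeBERTa-MNLI classifier, and the lexical-overlap scorer is used only so the snippet runs without model weights.

```python
import numpy as np

def build_semantic_graph(responses, entail_prob):
    """Directed, weighted semantic graph over sampled responses.
    entail_prob(premise, hypothesis) -> P(entailment) in [0, 1] (hypothetical NLI scorer)."""
    n = len(responses)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                W[i, j] = entail_prob(responses[i], responses[j])
    return W

# Toy stand-in scorer: lexical overlap as a deliberately crude proxy for entailment.
def toy_entail_prob(premise, hypothesis):
    p, h = set(premise.lower().split()), set(hypothesis.lower().split())
    return len(p & h) / max(len(h), 1)

responses = ["Paris is the capital of France.",
             "The capital of France is Paris.",
             "Lyon is the capital of France."]
print(build_semantic_graph(responses, toy_entail_prob).round(2))
```

In the actual pipeline this matrix would then be adaptively sparsified and normalized before the encoding-tree optimization (see Section 4).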
3. Interpretive Significance and Universality
In the context of linguistic analysis, $D = H_B - H$ serves as a language-independent baseline for structural order, with its constancy across translations and language families suggesting a universal tradeoff: as vocabulary diversity increases, long-range correlations tighten, maintaining $D$ at roughly $3.5$ bits/word. $\Delta I(s)$ directly captures topicality, with the maximizing segment size revealing characteristic semantic chunk lengths, such as sub-chapter topical units in books.
For LLM uncertainty quantification, a higher SeSE correlates with increased inherent semantic uncertainty: empirically, higher SeSE flags outputs more likely to contain hallucinations. Per-claim SeSE enables claim-level granularity, where claims in the semantic core (low SeSE) are likely factual and peripheral claims (high SeSE) are likely hallucinated. Among the baselines considered in this setting, only SeSE explicitly leverages latent semantic graph structure (Zhao et al., 20 Nov 2025).
4. Algorithmic Procedures
Linguistic SeSE (Montemurro et al., 2015)
- Preprocess and tokenize the text; count the total tokens $N$ and vocabulary size $V$.
- Estimate $H$ via universal compression; compute $H_B$ directly from the word counts.
- For each candidate segment length $s$, partition the text, tabulate per-segment counts, and compute $H_w(s)$ and $\langle \tilde{H}_w(s)\rangle$ by analytical approximation or Monte Carlo shuffling.
- Assemble $\Delta I(s)$; select the scale $s^{*}$ maximizing $\Delta I(s)$ (see the sketch after this list).
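A minimal sketch of the scale scan and of keyword scoring by each word's contribution $p_w(\langle \tilde{H}_w(s)\rangle - H_w(s))$, again with a Monte Carlo shuffled baseline; the candidate scales, shuffle count, and top-10 cutoff are illustrative choices rather than values from the original work.

```python
import random
from collections import Counter
from math import log2

def word_contributions(tokens, s, n_shuffles=20, seed=0):
    """Per-word terms p_w * (<H~_w(s)> - H_w(s)); their sum is Delta I(s)."""
    rng = random.Random(seed)
    def per_word_entropy(seq):
        segs = [Counter(seq[i:i + s]) for i in range(0, len(seq), s)]
        tot = Counter(seq)
        return {w: -sum((c[w] / n) * log2(c[w] / n) for c in segs if c[w])
                for w, n in tot.items()}
    H_emp = per_word_entropy(tokens)
    H_rand = Counter()
    shuffled = list(tokens)
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        for w, h in per_word_entropy(shuffled).items():
            H_rand[w] += h / n_shuffles
    N, totals = len(tokens), Counter(tokens)
    return {w: (n / N) * (H_rand[w] - H_emp[w]) for w, n in totals.items()}

def scan_scales(tokens, candidate_s, top_k=10):
    """Pick s* maximizing Delta I(s); rank keywords by their contribution at that scale."""
    contribs = {s: word_contributions(tokens, s) for s in candidate_s}
    totals = {s: sum(c.values()) for s, c in contribs.items()}
    s_star = max(totals, key=totals.get)
    keywords = sorted(contribs[s_star], key=contribs[s_star].get, reverse=True)[:top_k]
    return s_star, totals[s_star], keywords

# Usage (hypothetical corpus file):
# scan_scales(open("corpus.txt", encoding="utf-8").read().lower().split(), [100, 500, 1000])
```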
LLM SeSE (Zhao et al., 20 Nov 2025)
- For a given input, sample $n$ outputs from the LLM.
- For each ordered pair of outputs, compute directed entailment probabilities using an NLI model.
- Sparsify by retaining the top-$k$ outgoing edges per node for various $k$; ensure connectivity and normalization.
- Identify the $k$ that minimizes the resulting graph entropy, returning the corresponding sparsified graph (a sketch of this selection step follows this list).
- Construct hierarchical encoding trees, computing node entropies and the total tree entropy $\mathrm{SeSE}(G)$.
- For claim-level SeSE, build response–claim bipartite graphs, find optimal trees, and attribute per-claim entropy.
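A minimal sketch of the sparsification-selection step; treating the one-dimensional structural entropy of the sparsified graph as the selection score is an assumption about the exact AS-DSG criterion, introduced here only to make the loop concrete.

```python
import numpy as np

def one_dim_entropy(W):
    """One-dimensional structural entropy: Shannon entropy of the degree
    distribution d_i / vol(G) of the (symmetrized) weighted graph."""
    S = W + W.T                          # symmetrize the directed entailment graph
    deg = S.sum(axis=1)
    vol = deg.sum()
    p = deg[deg > 0] / vol
    return float(-(p * np.log2(p)).sum())

def adaptive_sparsify(W, k_values):
    """Pick the top-k sparsification whose graph entropy is minimal.
    NOTE: using one-dimensional structural entropy as the score is an assumption
    about AS-DSG's exact criterion, not a reproduction of it."""
    best = None
    for k in k_values:
        W_k = np.zeros_like(W)
        for i in range(W.shape[0]):
            keep = np.argsort(W[i])[-k:]   # strongest k outgoing edges of node i
            W_k[i, keep] = W[i, keep]
        score = one_dim_entropy(W_k)
        if best is None or score < best[0]:
            best = (score, k, W_k)
    return best  # (entropy, k*, sparsified adjacency)

# Usage with an entailment matrix W as built in Section 2:
# adaptive_sparsify(W, range(1, len(W)))
```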
Table: Key operational distinctions
| SeSE in Linguistics | SeSE in LLM Uncertainty Quantification |
|---|---|
| $\Delta I(s)$: mutual information over text segments | $\mathrm{SeSE}(G)$: structural entropy of a semantic graph |
| Works over surface word distributions | Operates on sampled LLM semantic outputs |
| Extracts topic boundaries and keywords | Flags semantic uncertainty and hallucinations |
5. Empirical Case Studies
Linguistic Universality
- Across 75 translations from 24 language families, $D$ is stable (mean $3.56$ bits/word).
- In English-language scientific works, "Opticks" and "On the Origin of Species" both exhibit maxima of $\Delta I(s)$ at characteristic segment sizes; the latter reaches $0.53$ bits/word at its optimal scale $s^{*}$.
- Informative word lists extracted via SeSE match domain knowledge: e.g., "species," "hybrids," "selection" in Darwin's text.
LLM Uncertainty Quantification
- SeSE outperforms strong baselines (including supervised UQ and KLE) on datasets such as BioASQ, NQ-Open, and TriviaQA.
- Performance is measured by AUROC and AURAC, with average AUROC improvements over KLE and over SE/DSE.
- Fine-grained claim-level SeSE enables rejection of individual hallucinated claims, not merely full outputs (Zhao et al., 20 Nov 2025).
6. Limitations and Assumptions
For linguistic applications:
- Stationarity and ergodicity are assumed; real texts may violate these due to topic evolution or narrative shifts.
- Partition granularity (the segment size $s$) requires careful tuning to avoid artifacts from discrete segment transitions.
- Function words, due to their high frequency, may pollute keyword extraction unless attenuated.
- Higher-order dependencies (bigrams, trigrams) are not captured unless integrated into estimation.
- Short texts yield poor estimates; the method presumes adequate data for robust statistics.
- The shuffled baselines may retain residual structure due to n-gram preservation artifacts.
For LLM UQ:
- All computations are post hoc, requiring no access to LLM internals or fine-tuning.
- Complexity scales quadratically with the number of samples $n$ (from pairwise entailment scoring), but practical $n$ is small.
7. Synthesis and Outlook
Semantic Structural Entropy unifies two rigorous programs: one distilling universal properties of language order and semantic structure, and one formalizing uncertainty in generative models via semantic graphs. In both, SeSE operationalizes the intuition that structure—whether rudimentary or topical, semantic or stochastic—can be quantified, localized, and exploited. This enables language-agnostic comparative linguistics, robust keyword extraction, principled topic segmentation, and, in machine-generated text, principled detection of factual uncertainty and hallucination risk (Montemurro et al., 2015, Zhao et al., 20 Nov 2025).
A plausible implication is that as semantic spaces—textual or model-generated—grow ever more complex, structural entropy-based diagnostics such as SeSE will enable scale-adaptive semantic profiling, robust uncertainty quantification, and fine-grained semantic filtering. The universality and modularity of the framework suggest broad applicability for both linguistic analysis and machine intelligence.