
Semantic Structural Entropy (SeSE)

Updated 27 November 2025
  • Semantic Structural Entropy (SeSE) is a quantitative measure that captures structural order and semantic uncertainty in language.
  • It leverages graph-based and statistical representations to analyze text segments and LLM-generated outputs for topic segmentation and uncertainty estimation.
  • Applied in both linguistics and LLM uncertainty quantification, SeSE reveals topic boundaries and flags potential hallucinations in model outputs.

Semantic Structural Entropy (SeSE) refers to a family of quantitative measures grounded in information theory for characterizing structural order and semantic information within language and, more recently, for quantifying semantic uncertainty in LLMs. The concept has been independently formalized in two distinct research programs: one focusing on language universals and the extraction of semantic structure from text (Montemurro et al., 2015), and the other targeting uncertainty quantification for hallucination detection in LLM systems (Zhao et al., 20 Nov 2025). Both approaches leverage graph-based or statistical representations of word distributions, but with fundamentally different aims and mathematical constructions.

1. Foundational Definitions and Mathematical Formulation

In the linguistic information-theoretic tradition, SeSE (denoted $\Delta I(s)$) is defined through the mutual information between words and their occurrence over segmented intervals of a text. The procedure starts with a text of $N$ word tokens and $K$ unique word types, partitioned into $P$ contiguous segments of size $s = N/P$. For each word $w$, the counts per segment are tabulated and used to compute empirical and expected entropies. The SeSE is then

$$\Delta I(s) = \sum_{w=1}^{K} p(w) \left[ \langle \hat{H}(J|w) \rangle - H(J|w) \right],$$

where $p(w)$ is the empirical unigram probability, $H(J|w)$ is the segment entropy for $w$, and $\langle \hat{H}(J|w) \rangle$ is the expected entropy under random permutations. This formalism reveals the scale-dependent semantic structure encoded in word distributions (Montemurro et al., 2015).
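The following minimal sketch illustrates how $\Delta I(s)$ can be assembled from this definition, using a Monte Carlo permutation baseline for $\langle \hat{H}(J|w) \rangle$ rather than an analytical approximation; the function and variable names are illustrative, not from the original work.

```python
import math
import random
from collections import Counter

def delta_I(tokens, s, n_shuffles=20, seed=0):
    """Monte Carlo estimate of Delta I(s) for segment length s.

    For each word w, compare the empirical segment entropy H(J|w) with its
    expectation under random permutations of the token sequence, then weight
    by the unigram probability p(w), as in the definition above.
    """
    rng = random.Random(seed)
    P = len(tokens) // s            # number of full segments
    tokens = tokens[:P * s]         # drop the ragged tail, if any

    def segment_entropies(seq):
        """H(J|w) for every word type in seq, over P segments of size s."""
        counts = {}                 # word -> per-segment counts
        for j in range(P):
            for w in seq[j * s:(j + 1) * s]:
                counts.setdefault(w, [0] * P)[j] += 1
        return {
            w: -sum((c / sum(cs)) * math.log2(c / sum(cs)) for c in cs if c)
            for w, cs in counts.items()
        }

    H_emp = segment_entropies(tokens)

    # Expected entropy under random shuffles (destroys all ordering structure).
    H_rand = {w: 0.0 for w in H_emp}
    shuffled = list(tokens)
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        H_shuf = segment_entropies(shuffled)
        for w in H_rand:
            H_rand[w] += H_shuf[w] / n_shuffles

    # Delta I(s) = sum_w p(w) [ <H_hat(J|w)> - H(J|w) ]
    freq = Counter(tokens)
    N = len(tokens)
    return sum((freq[w] / N) * (H_rand[w] - H_emp[w]) for w in H_emp)
```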

In uncertainty quantification for LLMs, SeSE is formulated in terms of the structural entropy of semantic graphs induced by sampled responses. Given $N$ LLM outputs, a directed, weighted semantic graph $G_{\rm dir}=(V,E,W)$ is constructed, with edge weights reflecting pairwise entailment probabilities. The SeSE of a graph is then defined as the minimum total entropy over optimal hierarchical (encoding) trees $\mathcal{T}$:

$$\mathrm{SeSE}(G') = H^{\mathcal{T}^*}(G') = \sum_{\alpha \neq \lambda} H(G';\alpha),$$

where $H(G';\alpha) = -\frac{g_\alpha}{\mathrm{vol}(G')} \log_2 \frac{g_\alpha}{\mathrm{vol}(G')}$ quantifies the uncertainty flow through each tree node $\alpha$ (with $\lambda$ the root), and the tree $\mathcal{T}^*$ minimizes this sum subject to a height constraint (Zhao et al., 20 Nov 2025).
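As a concrete illustration, the sketch below evaluates this sum for a fixed two-level encoding tree (root → clusters → leaves), using the node formula exactly as stated above; the conventions adopted for $g_\alpha$ on a directed graph (cut weight into a cluster, weighted in-degree for a leaf) are assumptions of this sketch, not specifics taken from Zhao et al.

```python
import numpy as np

def sese_two_level(W, clusters):
    """SeSE of a weighted directed graph under a fixed two-level encoding tree.

    W        -- (n, n) non-negative weights, W[i, j] = weight of edge i -> j
    clusters -- partition of range(n); each cluster is a child of the root,
                each graph node a leaf under its cluster

    Sums H(G'; alpha) = -(g_alpha / vol) * log2(g_alpha / vol) over all
    non-root tree nodes, following the expression above.  Conventions assumed
    here: vol is the total edge weight, g_alpha of a cluster is the weight
    entering it from outside, and g_alpha of a leaf is its weighted in-degree.
    """
    W = np.asarray(W, dtype=float)
    vol = W.sum()
    ratios = []

    # Internal (cluster) nodes.
    for members in clusters:
        inside = np.zeros(W.shape[0], dtype=bool)
        inside[list(members)] = True
        g_cluster = W[~inside][:, inside].sum()   # weight entering the cluster
        if g_cluster > 0:
            ratios.append(g_cluster / vol)

    # Leaf nodes: one per graph node.
    ratios.extend(d / vol for d in W.sum(axis=0) if d > 0)

    return float(-sum(p * np.log2(p) for p in ratios))
```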

2. Methodological Frameworks

Textual Semantics and Universality

The linguistic SeSE framework decomposes entropy into components reflecting lexical frequencies and ordering:

  • $H_s$ (Boltzmann entropy) represents the entropy if only word frequencies matter, calculated as

$$H_s = \frac{1}{N} \left[ \log_2 N! - \sum_{j=1}^K \log_2 n_j! \right]$$

  • $H$ accounts for all ordering correlations (estimated via universal compression or string matching).
  • The difference $D_s = H_s - H$ quantifies the KL divergence between original and shuffled texts and is empirically universal at $\approx 3.5$ bits/word across multiple languages.
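A minimal sketch of the frequency-only term $H_s$ and the difference $D_s$ follows; the order-aware entropy $H$ must be supplied by an external estimator (universal compression or string matching, as noted above) and is not reimplemented here.

```python
import math
from collections import Counter

def boltzmann_entropy(tokens):
    """H_s = (1/N) [ log2 N! - sum_j log2 n_j! ], via log-gamma for large factorials."""
    N = len(tokens)
    log2_fact = lambda n: math.lgamma(n + 1) / math.log(2)
    return (log2_fact(N) - sum(log2_fact(n_j) for n_j in Counter(tokens).values())) / N

def ordering_information(tokens, H_ordered):
    """D_s = H_s - H, where H_ordered is an order-aware entropy-rate estimate
    (bits/word) obtained externally, e.g. from a universal-compression or
    string-matching estimator as described above."""
    return boltzmann_entropy(tokens) - H_ordered
```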

$\Delta I(s)$, the SeSE proper, is computed by comparing the mutual information in the empirical and randomized segmentations, thereby isolating scale-dependent topical structure (Montemurro et al., 2015).

Semantic Graph-Based Uncertainty in LLMs

The LLM-centric SeSE pipeline includes:

  • Sampling multiple LLM outputs and constructing an adaptively sparsified directed semantic graph via pairwise entailment (using, e.g., DeBERTa-v3-large-MNLI).
  • Hierarchically clustering the nodes using optimal encoding trees to compress and summarize the semantic space.
  • Quantifying structural entropy at both the global semantic space level and at the level of atomic claims by traversing root-to-leaf paths in bipartite response-claim graphs.
  • The AS-DSG algorithm governs sparsification and normalization, ensuring that meaningful semantic dependencies are retained while the resulting entropy is minimized, yielding informative uncertainty estimates (Zhao et al., 20 Nov 2025). A skeletal sketch of the graph-construction step follows below.
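The sketch below shows the directed semantic graph construction from the first step of the pipeline; `entailment_prob` is a placeholder for an NLI classifier (e.g. a DeBERTa-style MNLI model), and the function names are illustrative rather than taken from the paper.

```python
import numpy as np

def entailment_prob(premise: str, hypothesis: str) -> float:
    """Placeholder for an NLI classifier (e.g. a DeBERTa-style MNLI model)
    returning P(premise entails hypothesis); swap in a real model here."""
    raise NotImplementedError

def build_semantic_graph(responses):
    """Directed, weighted semantic graph over N sampled responses:
    W[i, j] = P(response_i entails response_j); the diagonal stays zero."""
    n = len(responses)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                W[i, j] = entailment_prob(responses[i], responses[j])
    return W
```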

3. Interpretive Significance and Universality

In the context of linguistic analysis, $D_s$ serves as a language-independent baseline for structural order, with its constancy across translations and language families suggesting a universal tradeoff: as vocabulary diversity increases, long-range correlations tighten, maintaining $D_s$ at $\approx 3.5$ bits/word. $\Delta I(s)$ directly captures topicality, with its maximization revealing characteristic semantic chunk lengths, such as sub-chapter topical units in books.

For LLM uncertainty quantification, a higher SeSE correlates with increased inherent semantic uncertainty—empirically, higher SeSE flags outputs more likely to contain hallucinations. Per-claim SeSE enables claim-level granularity, where claims in the semantic core (low SeSE) are likely factual and peripheral claims (high SeSE) are likely hallucinated. No baseline in this setting explicitly leverages latent semantic graph structure apart from SeSE (Zhao et al., 20 Nov 2025).

4. Algorithmic Procedures

For the linguistic SeSE:

  1. Preprocess and tokenize the text; count total tokens and vocabulary.
  2. Estimate $H$ via universal compression; compute $H_s$ directly from counts.
  3. For each candidate segment length $s$, partition the text, tabulate per-segment counts, and compute $H(J|w)$ and $\langle \hat{H}(J|w) \rangle$ by analytical approximation or Monte Carlo shuffling.
  4. Assemble $\Delta I(s)$; select the $s^\ast$ maximizing $\Delta I(s)$.
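As a usage fragment (assuming a token list `tokens` and the `delta_I` sketch from Section 1), step 4 reduces to a sweep over an arbitrary grid of candidate segment lengths:

```python
# Assumes `tokens` (list of word tokens) and delta_I() from the sketch in Section 1.
candidate_s = [100, 200, 400, 800, 1600, 3200]   # illustrative grid
s_star = max(candidate_s, key=lambda s: delta_I(tokens, s))
```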
For the LLM-based SeSE:

  1. For a given input, sample $N$ outputs from the LLM.
  2. For each pair of outputs, compute directed entailment probabilities using an NLI model.
  3. Sparsify by retaining the top-$k$ outgoing edges per node for various $k$; ensure connectivity and normalization.
  4. Identify the $k^*$ minimizing $H^1(G_k)$, returning the corresponding $G^*_{\rm dir}$ (see the sketch after this list).
  5. Construct hierarchical encoding trees, computing node entropies and the total tree entropy $H^{\mathcal{T}}(G')$.
  6. For claim-level SeSE, build response–claim bipartite graphs, find optimal trees, and attribute per-claim entropy.
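The sketch below illustrates steps 3–4 under the assumption that $H^1$ denotes the standard one-dimensional structural entropy (the Shannon entropy of the normalized degree distribution); the connectivity check and normalization of step 3 are omitted for brevity.

```python
import numpy as np

def one_dim_entropy(W):
    """H^1(G): Shannon entropy of the normalized degree distribution
    (standard one-dimensional structural entropy; weighted in-degrees are
    used for the directed case, an assumption of this sketch)."""
    deg = W.sum(axis=0)
    p = deg[deg > 0] / deg.sum()
    return float(-(p * np.log2(p)).sum())

def adaptive_sparsify(W, k_values=range(1, 6)):
    """Retain the top-k outgoing edges per node for each candidate k and
    return the k and graph minimizing H^1, mirroring steps 3-4 above."""
    best_k, best_H, best_G = None, float("inf"), None
    for k in k_values:
        G = np.zeros_like(W)
        for i in range(W.shape[0]):
            top = np.argsort(W[i])[::-1][:k]      # strongest outgoing edges
            G[i, top] = W[i, top]
        H = one_dim_entropy(G)
        if H < best_H:
            best_k, best_H, best_G = k, H, G
    return best_k, best_G
```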

Table: Key operational distinctions

| SeSE in Linguistics | SeSE in LLM Uncertainty Quantification |
| --- | --- |
| $\Delta I(s)$: mutual information over text segments | $H^{\mathcal{T}^*}(G')$: entropy of semantic graph |
| Works over surface word distributions | Operates on sampled LLM semantic outputs |
| Extracts topic boundaries and keywords | Flags semantic uncertainty and hallucinations |

5. Empirical Case Studies

Linguistic Universality

  • Across 75 translations from 24 families, $D_s$ is stable in $[2.9, 4.0]$ bits/word (mean $3.56$).
  • In English-language scientific works: "Opticks" yields $\Delta I(s^*) \approx 0.47$ bits/word at $s^* \approx 950$; "Origin of Species" achieves $0.53$ bits/word at $s^* \approx 1930$.
  • Informative word lists extracted via SeSE match domain knowledge: e.g., "species," "hybrids," "selection" in Darwin's text.

LLM Uncertainty Quantification

  • SeSE outperforms strong baselines (including supervised UQ and KLE) on datasets such as BioASQ, NQ-Open, and TriviaQA.
  • Performance measured by AUROC and AURAC; average AUROC boost of $+3.5\%$ over KLE, with $+8$–$15\%$ over SE/DSE.
  • Fine-grained claim-level SeSE enables rejection of individual hallucinated claims, not merely full outputs (Zhao et al., 20 Nov 2025).

6. Limitations and Assumptions

For linguistic applications:

  • Stationarity and ergodicity are assumed; real texts may violate these due to topic evolution or narrative shifts.
  • Partition granularity (segment size ss) requires careful tuning to avoid artifacts from discrete segment transitions.
  • Functional words, due to frequency, may pollute keyword extraction unless attenuated.
  • Higher-order dependencies (bigrams, trigrams) are not captured unless integrated into HH estimation.
  • Short texts yield poor estimates; the method presumes adequate data for robust statistics.
  • The shuffled baseline for $\langle \hat{H}(J|w) \rangle$ may leave residual structure from n-gram preservation artifacts.

For LLM UQ:

  • All computations are post hoc: the method assumes only black-box access to sampled outputs, with no LLM internals or fine-tuning.
  • Complexity scales quadratically with the number of samples $N$, but practical $N$ is small ($\approx 10$).

7. Synthesis and Outlook

Semantic Structural Entropy unifies two rigorous programs: one distilling universal properties of language order and semantic structure, and one formalizing uncertainty in generative models via semantic graphs. In both, SeSE operationalizes the intuition that structure—whether rudimentary or topical, semantic or stochastic—can be quantified, localized, and exploited. This enables language-agnostic comparative linguistics, robust keyword extraction, principled topic segmentation, and, in machine-generated text, principled detection of factual uncertainty and hallucination risk (Montemurro et al., 2015, Zhao et al., 20 Nov 2025).

A plausible implication is that as semantic spaces—textual or model-generated—grow ever more complex, structural entropy-based diagnostics such as SeSE will enable scale-adaptive semantic profiling, robust uncertainty quantification, and fine-grained semantic filtering. The universality and modularity of the framework suggest broad applicability for both linguistic analysis and machine intelligence.
