Structural Semantic Entropy (SeSE)
- Structural Semantic Entropy (SeSE) is a measure that quantifies uncertainty by analyzing semantic structures and interrelations among model outputs.
- It extends traditional entropy methods by using graph-based estimators to capture hierarchical, directional, and continuous semantic relationships.
- SeSE has practical applications in improving hallucination detection in LLMs and uncertainty quantification in tasks like QA, summarization, and translation.
Structural Semantic Entropy (SeSE) quantifies the uncertainty or diversity of a set of model outputs by measuring the richness of their underlying semantic structural organization. Unlike traditional entropy measures that assess variability in flat or unstructured output spaces, SeSE incorporates latent structural information among the outputs, including hierarchical, directional, and topological relations, typically modeled via graphs constructed from semantic similarity or entailment. Recent work reports that SeSE and related graph-theoretic entropy formulations enable more nuanced uncertainty quantification and improved hallucination detection in LLMs, and reveal critical dynamics underlying adaptive reasoning and discovery in autonomous systems (Buehler, 24 Mar 2025; Nguyen et al., 30 May 2025; Zhao et al., 20 Nov 2025).
1. Mathematical Formulation of Structural Semantic Entropy
SeSE fundamentally extends Shannon-style entropy into the domain of structured semantic spaces, employing spectral, tree-based, or kNN-inspired estimators on graphs whose nodes represent model-generated answers and edges encode pairwise semantic relationships.
Core Formalism
Let $G = (V, E, W)$ be a semantic graph constructed from a set of responses or concepts, with $V$ the node set, $E$ the edges (directed or undirected), and $W$ the weights (e.g., entailment probabilities or semantic similarities). Associated with such a graph, several entropy measures have been defined:
- Von Neumann Graph Entropy (structural entropy), undirected (Buehler, 24 Mar 2025):
$$S_{\mathrm{VN}} = -\sum_i \lambda_i \log \lambda_i$$
where $\lambda_i$ are the eigenvalues of the normalized Laplacian $L$, built from the adjacency matrix $A$ and the degree matrix $D$.
- Semantic Entropy (SE): Uses clustering or partitioning of responses by semantic similarity, with cluster probabilities either length-normalized or empirical:
$$\mathrm{SE} = -\sum_k p(C_k) \log p(C_k)$$
where $p(C_k)$ is the normalized mass of cluster $C_k$ (Nguyen et al., 30 May 2025).
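- Pairwise/Structural Estimators: Extend SE to continuous similarity metrics using nearest-neighbor or LogSumExp forms over all pairs:
$$\mathrm{SNNE} = -\frac{1}{n} \sum_{i=1}^{n} \log \sum_{j=1}^{n} \exp\!\left(\frac{f(a_i, a_j)}{\tau}\right)$$
where $a_1, \dots, a_n$ are the sampled responses, $f$ is a semantic similarity function, and $\tau$ is a temperature (Nguyen et al., 30 May 2025).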
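- Hierarchical Directed Entropy (SeSE for LLM UQ):
$$\mathrm{SeSE}(G^*) = \mathcal{H}^{T^*}(G^*) = \min_{T} \mathcal{H}^{T}(G^*)$$
where $G^*$ is an adaptively sparsified, directed semantic graph and $\mathcal{H}^{T^*}(G^*)$ is the minimized tree-encoded directed graph entropy (Zhao et al., 20 Nov 2025).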
These measures aim to capture both the diversity of semantic outputs and the richness of their structural interrelations.
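As a minimal sketch of the two simplest measures in this list, the following Python computes a von Neumann graph entropy and a cluster-based semantic entropy. The function names are illustrative, and the trace normalization of the Laplacian, $\rho = (D - A)/\mathrm{tr}(D - A)$, is an assumption made here; the cited work may normalize differently.

```python
import numpy as np

def von_neumann_entropy(adj):
    """Spectral entropy of a trace-normalized Laplacian (assumed normalization)."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    lap = np.diag(deg) - adj            # combinatorial Laplacian D - A
    rho = lap / np.trace(lap)           # unit trace, so eigenvalues sum to 1
    lam = np.linalg.eigvalsh(rho)       # real spectrum of a symmetric matrix
    lam = lam[lam > 1e-12]              # treat 0 * log 0 as 0
    return float(-(lam * np.log(lam)).sum())

def semantic_entropy(cluster_ids):
    """Shannon entropy of empirical cluster masses p(C_k)."""
    _, counts = np.unique(cluster_ids, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

# Hypothetical inputs: a complete graph on 4 nodes, and 4 responses in 2 clusters.
k4 = np.ones((4, 4)) - np.eye(4)
print(von_neumann_entropy(k4))
print(semantic_entropy([0, 0, 1, 1]))
```

For the complete graph on four nodes the nonzero spectrum is uniform over three eigenvalues, so the entropy is $\log 3$; two equally sized clusters give $\log 2$.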
2. Structural Semantic Entropy in LLMs
State-of-the-art SeSE estimators have been employed to quantify uncertainty and hallucination propensity in LLM outputs, surpassing classical entropy and pairwise embedding baselines.
Semantic Graph Construction
- Node set: Sampled LLM outputs (e.g., answers, summaries).
- Edge weights: Computed via NLI-model entailment probabilities or cosine embedding similarities. Directionality encodes semantic dependency.
- Graph sparsification: Adaptive selection of top-$k$ outgoing edges per node to form minimal but informative graphs, often enforced to be strongly connected Markov chains (Zhao et al., 20 Nov 2025).
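The construction above can be sketched as follows. This is an illustration under stated assumptions, not the cited implementation: the similarity matrix holds hypothetical directed scores (for example, entailment probabilities), the sparsification uses a fixed number of kept edges rather than an adaptive choice, and the helper names are invented for this example.

```python
import numpy as np

def topk_directed_graph(sim, k):
    """Keep the k heaviest outgoing edges per node, then row-normalize."""
    w = np.asarray(sim, dtype=float).copy()
    np.fill_diagonal(w, 0.0)                      # no self-loops
    n = w.shape[0]
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of nodes")
    keep = np.argsort(-w, axis=1)[:, :k]          # column indices of the top-k weights
    mask = np.zeros_like(w, dtype=bool)
    np.put_along_axis(mask, keep, True, axis=1)
    w = np.where(mask, w, 0.0)
    row_sums = w.sum(axis=1, keepdims=True)
    return w / np.where(row_sums > 0, row_sums, 1.0)   # row-stochastic matrix

def is_strongly_connected(p):
    """Every node reaches every other node within n - 1 steps."""
    n = p.shape[0]
    step = (p > 0).astype(float) + np.eye(n)
    return bool((np.linalg.matrix_power(step, n - 1) > 0).all())

# Hypothetical directed similarity scores among four sampled responses.
sim = np.array([
    [0.0, 0.9, 0.2, 0.1],
    [0.8, 0.0, 0.3, 0.1],
    [0.2, 0.4, 0.0, 0.7],
    [0.1, 0.2, 0.9, 0.0],
])
p1 = topk_directed_graph(sim, k=1)
p2 = topk_directed_graph(sim, k=2)
print(is_strongly_connected(p1), is_strongly_connected(p2))
```

With these example scores, keeping one edge per node splits the graph into two disconnected pairs, while keeping two edges yields a strongly connected chain, which is one reason the number of kept edges needs to be chosen with care.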
SeSE Score Pipeline
- Generate candidate responses.
- Build and sparsify a directed semantic graph $G^*$.
- Compute the optimal encoding tree $T^*$ to hierarchically aggregate related responses, minimizing total entropy.
- Evaluate $\mathrm{SeSE}(G^*)$ as structural entropy under $T^*$.
A higher SeSE reflects a structure that is less compressible and hence more uncertain—closely tied to hallucination probability in LLMs.
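To make the link between compressibility and entropy concrete, the sketch below evaluates structural entropy under a fixed two-level encoding tree (a partition of nodes into modules). It is a deliberate simplification: it uses the undirected, degree-based form of structural entropy rather than the directed, minimized quantity described above, and the graph and function names are hypothetical.

```python
import numpy as np

def structural_entropy_1d(w):
    """Entropy of the degree distribution (no grouping of nodes)."""
    d = np.asarray(w, dtype=float).sum(axis=1)
    p = d[d > 0] / d.sum()
    return float(-(p * np.log2(p)).sum())

def structural_entropy_2d(w, modules):
    """Structural entropy of an undirected graph under a two-level partition."""
    w = np.asarray(w, dtype=float)
    d = w.sum(axis=1)
    vol = d.sum()
    h = 0.0
    for nodes in modules:
        idx = np.asarray(nodes)
        v_mod = d[idx].sum()                  # volume of the module
        if v_mod == 0:
            continue
        inside = w[np.ix_(idx, idx)].sum()    # internal weight, counted twice
        cut = v_mod - inside                  # weight of edges leaving the module
        dn = d[idx]
        dn = dn[dn > 0]
        h -= (dn / vol * np.log2(dn / v_mod)).sum()   # node-level term
        h -= cut / vol * np.log2(v_mod / vol)         # module-level term
    return float(h)

# Hypothetical graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
w = np.zeros((6, 6))
for i, j in edges:
    w[i, j] = w[j, i] = 1.0

h_flat = structural_entropy_1d(w)
h_tree = structural_entropy_2d(w, [[0, 1, 2], [3, 4, 5]])
print(h_flat, h_tree)
```

Grouping each triangle into its own module gives a lower entropy than the flat encoding, reflecting a more compressible structure; a graph with no such grouping would not benefit from the tree.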
3. Theoretical Properties and Generalization
Structural Semantic Entropy frameworks unify and generalize prior entropy measures:
- Limit reduction to flat SE/DSE: Special choices of the similarity function or clustering yield classical (discrete or soft) semantic entropy as special cases (Nguyen et al., 30 May 2025).
- Incorporation of hierarchical, directional, and continuous structure: The optimal encoding tree and pairwise similarity matrix allow SeSE to account for both intra-cluster spread and inter-cluster proximity, capturing complexity overlooked by cluster-counting or flat pseudoprobability methods.
- Directed entropy minimization: Enables explicit modeling of entailment directionality, essential for representing semantic asymmetries in argumentation, explanation, or question answering (Zhao et al., 20 Nov 2025).
These properties render SeSE strictly more expressive than earlier flat-structure measures.
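The limit reduction in the first bullet can be checked numerically. The sketch below assumes a LogSumExp estimator of the form $-\frac{1}{n}\sum_i \log \sum_j \exp(f(a_i, a_j)/\tau)$ and a hypothetical "hard" similarity that is 0 within a cluster and $-\infty$ across clusters. Under those assumptions the inner sum counts the members of each response's cluster, so the estimator equals the discrete semantic entropy minus $\log n$.

```python
import numpy as np

def pairwise_lse_entropy(sim, tau=1.0):
    """-(1/n) * sum_i log sum_j exp(sim[i, j] / tau), computed stably."""
    z = np.asarray(sim, dtype=float) / tau
    m = z.max(axis=1, keepdims=True)                 # row-wise max for stability
    lse = m[:, 0] + np.log(np.exp(z - m).sum(axis=1))
    return float(-lse.mean())

def discrete_semantic_entropy(cluster_ids):
    """Shannon entropy of empirical cluster masses."""
    _, counts = np.unique(cluster_ids, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

# Hypothetical cluster assignment for six sampled responses.
cluster_ids = np.array([0, 0, 0, 1, 1, 2])
same = cluster_ids[:, None] == cluster_ids[None, :]
hard_sim = np.where(same, 0.0, -np.inf)              # indicator-style similarity

n = len(cluster_ids)
lhs = pairwise_lse_entropy(hard_sim)
rhs = discrete_semantic_entropy(cluster_ids) - np.log(n)
print(np.isclose(lhs, rhs))
```

Replacing the hard indicator with a continuous similarity lets responses in different clusters contribute in proportion to their closeness, which is the extra information the pairwise estimators use.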
4. Empirical Evidence and Applications
Hallucination Detection and Uncertainty Quantification
SeSE and related estimators (SNNE, WSNNE) have produced marked improvements in:
- Binary QA (e.g., SQuAD, TriviaQA, BioASQ): AUROC gains of 6–15% relative to SE and kernel-based UQ, and ~3–5% over the next-strongest methods (Zhao et al., 20 Nov 2025).
- Summarization and translation: Prediction rejection ratio (PRR) results, with BERTScore as the quality measure, favor SeSE/SNNE substantially over SE/DSE (Nguyen et al., 30 May 2025).
- Fine-grained claim-level uncertainty in long-form LLM outputs: SeSE enables the ranking of atomic claims by hallucination risk using a semantic-structural perspective.
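For readers who want to reproduce this kind of comparison, the following sketch shows how an AUROC is obtained from per-response uncertainty scores and binary hallucination labels. The scores and labels are made-up placeholders, not results from the cited papers, and the function name is illustrative.

```python
import numpy as np

def auroc(uncertainty, is_hallucination):
    """Probability that a hallucinated answer gets a higher score than a correct one."""
    u = np.asarray(uncertainty, dtype=float)
    y = np.asarray(is_hallucination, dtype=bool)
    pos, neg = u[y], u[~y]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("both classes must be present")
    diff = pos[:, None] - neg[None, :]               # all (hallucinated, correct) pairs
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())   # ties count one half

# Placeholder uncertainty scores and labels (1 = hallucination).
scores = [0.9, 0.2, 0.8, 0.1]
labels = [1, 1, 0, 0]
print(auroc(scores, labels))
```

An uncertainty measure that ranks every hallucinated answer above every correct one scores 1.0, and a measure no better than chance scores about 0.5.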
Continuous Discovery in Agentic Systems
Agentic reasoning systems evolving over semantic graphs exhibit:
- Persistent excess of semantic entropy over structural entropy, tracked through a critical discovery parameter (Buehler, 24 Mar 2025).
- Stable injection of “surprising” (cross-domain) semantic edges, facilitating self-organized criticality and continuous innovation.
- Power-law degree distributions, small-world topology, and negative late-stage cross-correlation between semantic and structural entropy, all signaling adaptation at criticality.
These observations reveal SeSE’s utility as a monitoring signal and design axis for adaptive, open-ended reasoning architectures.
5. Algorithmic and Implementation Aspects
Complexity
- Pairwise methods: $O(n^2)$ for $n$ sampled outputs.
- Graph construction and entropy minimization: For typical sample sizes, per-query wall-time is a few seconds on GPU for NLI-based edge computation and hierarchical encoding tree optimization (Zhao et al., 20 Nov 2025).
Core Components
| Component | Method | Typical Techniques |
|---|---|---|
| Semantic Similarity | Cosine, NLI entailment | Sentence transformers, DeBERTa-v3-mnli |
| Graph Sparsification | Top-$k$ selection | Adaptive entropy minimization to choose $k$ |
| Tree Encoding Optimization | Greedy merging | Greedy coupling, minimizing hierarchical directed entropy |
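The greedy merging entry in the table can be illustrated with a small heuristic search. This is a sketch under simplifying assumptions, not the cited algorithm: it works on an undirected graph, builds only a two-level tree, and scores candidates with the undirected two-level structural entropy; all names and the example graph are hypothetical.

```python
import numpy as np
from itertools import combinations

def entropy_2d(w, modules):
    """Undirected structural entropy under a two-level partition."""
    d = w.sum(axis=1)
    vol = d.sum()
    h = 0.0
    for nodes in modules:
        v_mod = d[nodes].sum()
        cut = v_mod - w[np.ix_(nodes, nodes)].sum()   # weight leaving the module
        h -= (d[nodes] / vol * np.log2(d[nodes] / v_mod)).sum()
        h -= cut / vol * np.log2(v_mod / vol)
    return float(h)

def greedy_partition(w):
    """Merge the pair of modules that lowers entropy most, until none does."""
    w = np.asarray(w, dtype=float)
    modules = [[i] for i in range(w.shape[0])]        # start from singletons
    best = entropy_2d(w, modules)
    while len(modules) > 1:
        candidates = []
        for a, b in combinations(range(len(modules)), 2):
            merged = [m for i, m in enumerate(modules) if i not in (a, b)]
            merged.append(modules[a] + modules[b])
            candidates.append((entropy_2d(w, merged), merged))
        h, merged = min(candidates, key=lambda c: c[0])
        if h >= best - 1e-12:                         # no merge improves the score
            break
        best, modules = h, merged
    return modules, best

# Hypothetical graph: two triangles joined by a bridge; every node has degree > 0.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
w = np.zeros((6, 6))
for i, j in edges:
    w[i, j] = w[j, i] = 1.0

start = entropy_2d(w, [[i] for i in range(6)])
modules, final = greedy_partition(w)
print(modules, start, final)
```

Greedy merging is a heuristic and is not guaranteed to reach the global minimum; it trades optimality for a search that stays cheap at the small sample sizes used per query.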
Parameter Choices
- Sampling: a fixed number of stochastic samples at a set sampling temperature, plus one greedy decode at a separate, lower temperature.
- Tree height: 3–4 sufficient in practice.
- NLI model: DeBERTa-v3-large-MNLI for entailment computations.
6. Broader Significance and Criticality Interpretation
SeSE measures offer principled, theoretically grounded metrics for:
- Quantitative analysis of semantic “disorder” and information compressibility in model outputs.
- Detection and prevention of hallucination, promoting abstention in high-uncertainty regimes.
- Steering generative agents toward sustained innovation and criticality, leveraging entropy balance between structure and meaning (Buehler, 24 Mar 2025).
A pivotal implication is that a regime in which semantic entropy persistently exceeds structural entropy may underlie model adaptability and robust discovery. Practical reinforcement objectives could directly incorporate the critical discovery parameter, semantic entropy, and “surprising edge” rate to maintain exploratory capacity.
7. Connections and Distinctions among SeSE Frameworks
Three recent SeSE-related formulations emphasize complementary facets:
- Graph-theoretic SeSE (Buehler, 24 Mar 2025): Focuses on structural vs. semantic entropy via graph Laplacians, criticality, and complex system parallels.
- Pairwise/nearest-neighbor SNNE/WSNNE (Nguyen et al., 30 May 2025): Operationalizes structural semantic entropy in LLM UQ through continuous pairwise similarity aggregation, generalizing cluster-based approaches.
- Hierarchical directed SeSE for LLMs (Zhao et al., 20 Nov 2025): Captures directional, compressible structure in generated outputs and is reported to outperform baselines in fine-grained hallucination detection.
A plausible implication is that unified SeSE concepts can inform cross-disciplinary investigation—linking natural language generation, complex adaptive system theory, and scalable uncertainty quantification under a common structural-semantic information-theoretic lens.