Semantic–Structural Entropy (S²-Entropy)

Updated 16 March 2026

Semantic–Structural Entropy (S²-Entropy) is a unified metric that quantifies the interplay between local semantic diversity and global structural coherence across diverse domains.
Its formulation leverages rigorous tools from information geometry and recursive segmentation to encode time series and natural language into symbolic representations with measurable entropy.
Reported applications include EEG seizure detection, where it outperforms permutation entropy and spectral power as a synchrony index, as well as language model evaluation and adaptive network analysis.

Semantic–Structural Entropy (S²-Entropy) quantifies the interplay and balance between semantic information and structural organization in complex systems such as time series, natural language, and dynamic networks. S²-Entropy is framework-agnostic, with rigorous instantiations in information geometry, linguistic structure, and network science. It unifies measurements of local patterning (“semantic diversity”) and global arrangement (“structural coherence”) into a single metric, enabling principled characterization of complexity, discovery, and synchronization phenomena across domains (Majumdar et al., 2018, Zhong et al., 13 Feb 2026, Buehler, 24 Mar 2025, Montemurro et al., 2015).

1. Geometric Encoding and S²-Entropy in Time Series

A discrete time series $s[n]$ can be modeled as the trajectory of a unit-mass particle in a force field, encoding information in its local geometric structure. At each time step, three-point neighborhoods are classified via the signs of finite-difference-based P-operators:

Backward difference: $s'[n]=s[n]-s[n-1]$
Forward difference: $s'[n+1]=s[n+1]-s[n]$
Second difference: $s''[n]=s'[n+1]-s'[n]$

Defining left and right P-operators as $P(s[n]^-)=s''[n]\,s'[n]$ and $P(s[n]^+)=s''[n]\,s'[n+1]$, one enumerates all patterns generated by their sign changes. Under strict sign-hierarchy constraints, exactly 13 geometric configurations occur in discrete time (Theorem 1 and Lemmas 1–4). Each is mapped to a unique symbol, yielding a lossless symbolic representation of the original series over a 13-symbol alphabet (Majumdar et al., 2018).
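The count of 13 can be checked by brute force. The sketch below labels each interior point by the sign triple $(\operatorname{sgn} s'[n], \operatorname{sgn} s'[n+1], \operatorname{sgn} s''[n])$, which also fixes the signs of both P-operators. Because $s''[n] = s'[n+1] - s'[n]$, only 13 of the 27 conceivable triples are feasible, matching the count above. This labeling is an assumed illustration; the symbol assignment in Majumdar et al. (2018) may differ, and `sign_triple`, `feasible_patterns`, and `encode` are hypothetical names.

```python
from itertools import product


def sign(x):
    """Return -1, 0 or +1 (exact comparison; quantize noisy data first)."""
    return (x > 0) - (x < 0)


def sign_triple(back, fwd):
    """Signs of s'[n] (back), s'[n+1] (fwd) and s''[n] = fwd - back."""
    return (sign(back), sign(fwd), sign(fwd - back))


def feasible_patterns():
    """Enumerate every sign triple reachable by some pair of differences.

    Small integers suffice: each feasible triple is realized with values in [-2, 2].
    """
    return {sign_triple(a, b) for a, b in product(range(-2, 3), repeat=2)}


def encode(s):
    """Map a series to one sign-triple symbol per interior point (N - 2 symbols)."""
    return [sign_triple(s[n] - s[n - 1], s[n + 1] - s[n]) for n in range(1, len(s) - 1)]


print(len(feasible_patterns()))  # 13
print(encode([0, 1, 3, 3, 1]))   # [(1, 1, 1), (1, 0, -1), (0, -1, -1)]
```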

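For a series of $N$ points, each of the $N-2$ interior points has a three-point neighborhood and hence exactly one configuration. Let $n_i$ be the number of interior points in configuration $i$ and $p_i = n_i/(N-2)$. The semantic entropy is defined as: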

$E = -\sum_{i=1}^{13} p_i\,\log_2 p_i$

which measures the diversity of local shapes.

By analogy with Newtonian mechanics, the information power $P$ is defined as the mean absolute value of the left P-operator $P(s[n]^-)$, quantifying the “structural intensity” or curvature of the series:

$P = \frac{1}{N-2}\sum_{n=2}^{N-1} |s''[n] s'[n]|$

S²-Entropy is then the ratio of the two:

$\mathrm{S}^2\text{-Entropy} = \frac{E}{P}$

Low values signal highly regular yet “energetic” dynamics (synchronous states, e.g., epileptic seizures), whereas high values mark irregular, diverse configurations (Majumdar et al., 2018).
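A minimal sketch computing $E$, $P$, and their ratio from a raw series. It assumes the sign triple of $(s'[n], s'[n+1], s''[n])$ as a stand-in label for each of the 13 configurations; since $E$ depends only on the counts per configuration, the labels themselves do not matter provided the partition into configurations is the same. All function names are hypothetical.

```python
import math
from collections import Counter


def sign(x):
    """Return -1, 0 or +1 using exact comparison."""
    return (x > 0) - (x < 0)


def local_differences(s):
    """Yield (s'[n], s'[n+1], s''[n]) for each interior point n."""
    for n in range(1, len(s) - 1):
        back = s[n] - s[n - 1]        # backward difference s'[n]
        fwd = s[n + 1] - s[n]         # forward difference s'[n+1]
        yield back, fwd, fwd - back   # second difference s''[n]


def semantic_entropy(s):
    """E in bits over sign-triple symbols (assumed stand-in for the 13 configurations)."""
    counts = Counter((sign(b), sign(f), sign(c)) for b, f, c in local_differences(s))
    total = len(s) - 2
    return -sum((k / total) * math.log2(k / total) for k in counts.values())


def information_power(s):
    """P = mean of |s''[n] s'[n]| over the N - 2 interior points."""
    return sum(abs(c * b) for b, _, c in local_differences(s)) / (len(s) - 2)


def s2_entropy(s):
    """E / P; undefined when P = 0, e.g. for a constant or linear series."""
    if len(s) < 3:
        raise ValueError("need at least three samples")
    p = information_power(s)
    if p == 0:
        raise ValueError("information power is zero; E/P is undefined")
    return semantic_entropy(s) / p


print(s2_entropy([0, 1, 3, 3, 1]))  # log2(3) / (5/3), about 0.951
```

Exact sign tests treat tiny floating-point differences as nonzero, so noisy recordings may need quantizing before encoding.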

2. S²-Entropy in Semantic Hierarchies of Natural Language

S²-Entropy in linguistic settings characterizes how the hierarchical, multiscale organization of meaning constrains token-level unpredictability. Consider a corpus segmented recursively into semantically coherent chunks via a self-similar $K$-ary fragmentation model. For a text of $N$ tokens, construct a rooted tree $T$ in which level $\ell$ contains $K^{\ell-1}$ nodes and the $i$-th node at that level is a chunk of size $\mu_{\ell;i}$, and employ a uniform splitting kernel:

$Z_K(n) = \binom{n+K-1}{n}$

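which counts the ways to split a chunk of $n$ tokens into $K$ ordered, possibly empty sub-chunks. Each split is therefore chosen uniformly with probability $1/Z_K(n)$, and a tree with $L$ levels of splits has probability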

$P(T) = \prod_{\ell=1}^L \prod_{i=1}^{K^{\ell-1}} Z_K(\mu_{\ell;i})^{-1}$
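The tree probability can be evaluated directly from level-by-level chunk sizes. A minimal sketch, assuming children are stored left to right so that the $K$ children of the $i$-th node at one level occupy consecutive positions in the next; `Z` and `tree_log_prob` are hypothetical names. The ensemble entropy $H(N)$ is the expectation of $-\ln P(T)$ over trees; the sketch only scores a single tree.

```python
import math


def Z(K, n):
    """Number of ways to split n tokens into K ordered, possibly empty sub-chunks."""
    return math.comb(n + K - 1, n)


def tree_log_prob(levels, K):
    """Return ln P(T) for a K-ary fragmentation tree.

    levels[0] == [N] is the root; levels[l] lists the K**l chunk sizes at depth l,
    and the K children of parent i sit at positions K*i .. K*i + K - 1 of the next
    level. Every level except the last is split, contributing 1 / Z_K(mu) per node.
    """
    for parents, children in zip(levels, levels[1:]):
        if len(children) != K * len(parents):
            raise ValueError("each node must have exactly K children")
        for i, mu in enumerate(parents):
            if sum(children[K * i:K * (i + 1)]) != mu:
                raise ValueError("children sizes must sum to the parent size")
    return -sum(math.log(Z(K, mu)) for level in levels[:-1] for mu in level)


# N = 4 tokens, K = 2, two levels of splits: P(T) = 1/5 * 1/2 * 1/4 = 1/40
print(math.exp(-tree_log_prob([[4], [1, 3], [0, 1, 2, 1]], K=2)))  # about 0.025
```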

The overall entropy of this hierarchical semantic ensemble is extensive for large $N$:

$H(N) \simeq h_K N$

where $h_K$ is the S²-Entropy rate, which depends only on $K$ (the semantic complexity, or maximal branching factor). For $K=2$, $h_2 \approx 0.807$ nats/token; for large $K$,

$h_K = \tfrac{1}{2}(\ln K)^2 + (1 + \gamma)\ln K + \pi^2/12 - \ln 2 + O\left(\tfrac{\ln K}{K}\right)$

where $\gamma \approx 0.5772$ is the Euler–Mascheroni constant (Zhong et al., 13 Feb 2026).
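The large-$K$ expansion is easy to tabulate. This sketch evaluates only the leading terms given above, dropping the $O(\ln K / K)$ remainder, and converts nats to bits by dividing by $\ln 2$; `h_K_large` is a hypothetical name.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler–Mascheroni constant


def h_K_large(K):
    """Leading terms of h_K (nats/token) for large K; O(ln K / K) remainder dropped."""
    lnK = math.log(K)
    return 0.5 * lnK**2 + (1 + EULER_GAMMA) * lnK + math.pi**2 / 12 - math.log(2)


for K in (2, 6, 50, 1000):
    h = h_K_large(K)
    print(f"K={K:5d}  h_K ~ {h:.3f} nats = {h / math.log(2):.3f} bits per token")
```

At $K=2$ the truncated expansion gives about 1.46 nats, well above $h_2 \approx 0.807$: the dropped remainder is not negligible at small $K$, so the expansion should only be used in the large-$K$ regime.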

Fitting $K$ to empirical chunk-size distributions yields predicted per-token entropy rates that match LLM cross-entropy rates across genres (e.g., $K^*=2$ for simple stories, $K^*=6$ for poetry). S²-Entropy thus indicates that much of the apparent redundancy in natural language emerges from multiscale, recursively organized semantic chunks, and that the entropy rate increases systematically with semantic complexity (Zhong et al., 13 Feb 2026).

3. S²-Entropy in Word-Distributional Structure

A complementary formulation of S²-Entropy measures the degree to which word distributions are structured across contexts. Partition a text of $N$ tokens into $P$ equal-sized contiguous parts of $s = N/P$ tokens each. With $K$ now denoting the number of distinct word types (not the branching factor of Section 2), the mutual information between word type and part is

$M(J,W) = \sum_{k=1}^K p(w_k) \sum_{j=1}^P p(j|w_k) \log_2\left[\frac{p(j|w_k)}{p(j)}\right]$

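where $n_k$ is the number of occurrences of word type $w_k$, $n_{k,j}$ is the number of those occurrences falling in part $j$, $p(w_k) = n_k/N$, $p(j|w_k) = n_{k,j}/n_k$, and $p(j) = 1/P$.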

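To correct for the positive bias that finite sampling introduces, subtract the mean of the same quantity over random surrogates, i.e., randomly shuffled copies of the text that preserve every word's frequency: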

$\Delta I(s) = M(J,W) - \langle \hat{M}(J,W) \rangle$

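Because $p(j) = 1/P$, $M(J,W) = \log_2 P - \sum_k p(w_k) H(J|w_k)$. Shuffling preserves each $p(w_k)$, so the $\log_2 P$ term cancels and $\Delta I(s)$ becomes the frequency-weighted reduction of each word's positional entropy relative to its shuffled baseline: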

$\Delta I(s) = \sum_{k=1}^K p(w_k) \left[\langle \hat{H}(J|w_k) \rangle - H(J|w_k) \right]$

where $H(J|w_k) = -\sum_j p(j|w_k) \log_2 p(j|w_k)$ and $\hat{H}(J|w_k)$ is the same quantity computed on a shuffled surrogate (Montemurro et al., 2015).
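A minimal sketch of $M(J,W)$ and the baseline-corrected $\Delta I(s)$, assuming the surrogates are uniformly shuffled copies of the token list. Trailing tokens that do not fill a whole part are dropped; the number of shuffles and the seed are arbitrary choices, and `word_context_mi` and `delta_I` are hypothetical names.

```python
import math
import random
from collections import Counter, defaultdict


def word_context_mi(tokens, P):
    """M(J,W) in bits for P equal contiguous parts of s = N // P tokens each."""
    s = len(tokens) // P
    if s == 0:
        raise ValueError("more parts than tokens")
    tokens = tokens[: s * P]
    N = len(tokens)
    n_k = Counter(tokens)
    n_kj = defaultdict(Counter)
    for i, w in enumerate(tokens):
        n_kj[w][i // s] += 1  # occurrences of word w in part j = i // s
    mi = 0.0
    for w, nk in n_k.items():
        for nkj in n_kj[w].values():  # parts with n_kj = 0 contribute nothing
            p_j_given_w = nkj / nk
            mi += (nk / N) * p_j_given_w * math.log2(p_j_given_w * P)  # p(j) = 1/P
    return mi


def delta_I(tokens, P, n_shuffles=100, seed=0):
    """M(J,W) minus its mean over randomly shuffled copies of the same tokens."""
    tokens = list(tokens[: (len(tokens) // P) * P])
    rng = random.Random(seed)
    shuffled = tokens[:]
    baseline = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        baseline += word_context_mi(shuffled, P)
    return word_context_mi(tokens, P) - baseline / n_shuffles


print(word_context_mi(["a"] * 4 + ["b"] * 4, P=2))  # 1.0
```

Locating the scale $s^*$ amounts to repeating `delta_I` over a range of part counts $P$ and taking the $s = N/P$ at which the result peaks.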

$\Delta I(s)$ exhibits a universal maximum at an intermediate scale $s^* \sim 500$–$2000$ words and identifies keywords and semantic domains without prior linguistic annotation. Its consistently nonzero value across languages and genres indicates that semantic information in texts is structured in a scale-dependent way (Montemurro et al., 2015).

4. Semantic–Structural Interplay in Graph Reasoning and Adaptive Networks

S²-Entropy generalizes to dynamic networks. Structural entropy is captured by the von Neumann graph entropy:

$S_{\mathrm{struct}} = -\mathrm{Tr}(\rho \ln \rho)$

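where $\rho$ is the graph Laplacian normalized to unit trace (commonly $\rho = L/\mathrm{Tr}(L)$ with $L = D - A$ for degree matrix $D$ and adjacency matrix $A$), so that its eigenvalues form a probability distribution.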

Semantic entropy is defined analogously, with adjacency replaced by embedding-derived inner products (e.g., cosine similarities of Sentence-BERT vectors); this yields a semantic Laplacian, a density matrix $\rho^{(\mathrm{sem})}$, and $S_{\mathrm{sem}} = -\mathrm{Tr}(\rho^{(\mathrm{sem})} \ln \rho^{(\mathrm{sem})})$. The two entropies are combined into a dimensionless interplay metric:

$\mathcal{D} = \frac{S_{\mathrm{struct}} - S_{\mathrm{sem}}}{S_{\mathrm{struct}} + S_{\mathrm{sem}}}$

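Because both entropies are nonnegative, $\mathcal{D}$ lies in $[-1, 1]$: positive values indicate that structural complexity dominates, negative values that semantic complexity dominates, and the magnitude gives the relative size of the imbalance.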
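A minimal NumPy sketch of both entropies and $\mathcal{D}$. It assumes $\rho = L/\mathrm{Tr}(L)$ with $L = D - W$ for a symmetric weighted adjacency $W$, and builds the semantic graph from cosine similarities with the diagonal zeroed and negative similarities clipped to zero so that $L$ stays positive semidefinite. These are assumptions, not necessarily the construction in Buehler (24 Mar 2025); the embeddings are plain arrays standing in for Sentence-BERT vectors, and all function names are hypothetical.

```python
import numpy as np


def von_neumann_entropy(W):
    """S = -Tr(rho ln rho), assuming rho = L / Tr(L) with L = diag(W @ 1) - W."""
    W = np.asarray(W, dtype=float)
    L = np.diag(W.sum(axis=1)) - W
    trace = np.trace(L)
    if trace <= 0:
        raise ValueError("graph has no edges; rho is undefined")
    eig = np.linalg.eigvalsh(L / trace)  # symmetric W gives symmetric rho
    eig = eig[eig > 1e-12]               # 0 ln 0 = 0; also drops round-off noise
    return float(-np.sum(eig * np.log(eig)))


def semantic_weights(X):
    """Cosine-similarity weights between embedding rows, clipped at zero."""
    X = np.asarray(X, dtype=float)
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T
    np.fill_diagonal(S, 0.0)
    return np.clip(S, 0.0, None)


def interplay(s_struct, s_sem):
    """D in [-1, 1]: positive if structural entropy dominates, negative if semantic."""
    return (s_struct - s_sem) / (s_struct + s_sem)


A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])                      # toy undirected graph
X = np.random.default_rng(0).random((4, 8))      # nonnegative placeholder embeddings
print(interplay(von_neumann_entropy(A), von_neumann_entropy(semantic_weights(X))))
```

As a sanity check, the complete graph on $n$ nodes has Laplacian eigenvalues $0$ and $n$ (with multiplicity $n-1$), so its entropy is $\ln(n-1)$.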

Empirically, agentic graph-reasoning systems evolve dynamically to a critical discovery state where $\mathcal{D}_\infty \approx -0.03$, i.e., $S_{\mathrm{sem}}/S_{\mathrm{struct}} = 1.03/0.97 \approx 1.06$ (semantic entropy exceeds structural entropy by $\approx 6\%$), and a stable fraction ($\sim 12\%$) of “surprising” edges (i.e., structurally valid but semantically distant links) persists. This regime supports ongoing exploration and self-organized criticality, balancing innovation and coherence (Buehler, 24 Mar 2025).
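The fraction of surprising edges can be monitored alongside $\mathcal{D}$. A sketch assuming an edge counts as "semantically distant" when the cosine similarity of its endpoints' embeddings falls below a fixed threshold; both this operationalization and the default threshold of 0.2 are hypothetical, since the cutoff is not specified here.

```python
import numpy as np


def surprising_edge_fraction(edges, X, threshold=0.2):
    """Fraction of edges whose endpoint embeddings have cosine similarity < threshold.

    `threshold` is a hypothetical cutoff, not a value taken from the source.
    """
    if not edges:
        return 0.0
    X = np.asarray(X, dtype=float)
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    distant = sum(1 for i, j in edges if float(Xn[i] @ Xn[j]) < threshold)
    return distant / len(edges)


X = [[1, 0], [0, 1], [1, 0.1]]
print(surprising_edge_fraction([(0, 1), (0, 2)], X))  # 0.5
```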

5. Applications and Empirical Findings

S²-Entropy has demonstrated practical utility across temporal, linguistic, and network domains:

In intracranial EEG, the $E/P$ ratio serves as a sensitive biomarker of epileptic seizures. During seizures, semantic entropy $E$ decreases while information power $P$ increases, yielding a pronounced minimum of S²-Entropy ($E/P$) in the ictal state in 72 of 87 seizures and outperforming permutation entropy and spectral power as a synchrony index (Majumdar et al., 2018).
In natural language, S²-Entropy rates derived from semantic chunking align with LLM per-token cross-entropy across diverse corpora (stories, scientific abstracts, poetry). Variations in $K$ capture genre-dependent differences in redundancy and semantic complexity (Zhong et al., 13 Feb 2026).
In textual analysis, $\Delta I(s)$, computed from word-to-context mutual information, robustly identifies theme-changing segments and keywords; its universality supports the view that information in language is self-organized (Montemurro et al., 2015).
In knowledge networks, monitoring S²-Entropy-informed metrics such as $\mathcal{D}$ and the fraction of surprising edges provides actionable guidance for sustaining long-term discovery and preventing stasis or semantic collapse (Buehler, 24 Mar 2025).

6. Theoretical Foundations and Extensions

The S²-Entropy framework rests on rigorous mathematical arguments:

For time series, Theorems 1–2 and Lemmas 1–4 precisely delimit the admissible local geometric configurations and underpin the encoding procedure (Majumdar et al., 2018).
In semantic chunking models for language, maximal entropy hierarchies are analytically tractable (e.g., via recursive partition functions and saddle-point methods), allowing closed-form or numerical determination of entropy rates and their corpus dependence (Zhong et al., 13 Feb 2026).
Graph measures leverage spectral theory of Laplacians for both structural and embedding-induced semantics, with critical discovery formalized via dimensionless ratios (Buehler, 24 Mar 2025).
Mutual information-based linguistic measures reduce to frequency-weighted differences of conditional entropies, corrected for sampling via analytical or Monte Carlo approximations (Montemurro et al., 2015).

Possible extensions of S²-Entropy include adaptive priors on segmentation, multimodal data (e.g., music, code), and real-time tracking of innovation in evolving agentic systems. The consistent finding across domains is that semantic richness and structural organization can be formally tracked and balanced by entropy-based indices, which serve both explanatory and control-theoretic purposes.

7. Summary Table: S²-Entropy Across Domains

Domain	Semantic Aspect	Structural Aspect	S²-Entropy Formula
Time series	Diversity of local shapes ($E$)	Information power ($P$)	$E/P$
Natural language	Multiscale semantic chunking	Hierarchical chunk structure	$h_K$ (entropy rate)
Texts (word distribution)	Word-context clustering	Baseline-corrected distribution	$\Delta I(s)$
Networks	Embedding spread ($S_{\mathrm{sem}}$)	Laplacian spectrum ($S_{\mathrm{struct}}$)	$\mathcal{D}$

In all cases, S²-Entropy operationalizes the dynamic tension between richness of meaning and organizational structure, offering a unifying analytic lens across temporal, textual, and relational data (Majumdar et al., 2018, Zhong et al., 13 Feb 2026, Buehler, 24 Mar 2025, Montemurro et al., 2015).