
Permutation Entropy: Theory and Practice

Updated 29 December 2025
  • Permutation entropy is a nonparametric measure that quantifies time series complexity by analyzing the ordinal patterns from sequences of delayed values.
  • It relies on key parameters—embedding dimension and time delay—to capture underlying dynamics, with invariance to monotonic transformations and efficient computation.
  • Advanced methods like Bayesian and bootstrap techniques enable uncertainty quantification, enhancing its applicability in nonlinear dynamics, neuroscience, and finance.

Permutation entropy (PE) is a nonparametric, ordinal-pattern-based measure of time-series complexity introduced by Bandt and Pompe (2002). It quantifies the diversity of the orderings (permutations) of consecutive or delayed values in a scalar sequence, thus characterizing the degree of disorder or structure in the underlying dynamics. PE's invariance to monotonic transformations and robustness to observational noise, together with its efficient computation and interpretability, have led to wide adoption across disciplines including nonlinear dynamics, neuroscience, communications, finance, and statistical physics.

1. Mathematical Definition and Computation

Given a real-valued time series $\{x_t\}_{t=1}^{N}$, select an embedding dimension $m \geq 2$ and a time delay $\tau \geq 1$. For each valid index $t$ ($1 \leq t \leq N-(m-1)\tau$), form the vector

$$\mathbf{v}_t = \left(x_t,\, x_{t+\tau},\, \ldots,\, x_{t+(m-1)\tau}\right).$$

Associate to $\mathbf{v}_t$ the unique permutation $\pi$ of $\{0,\dots,m-1\}$ that sorts its entries in ascending order, using index order to break ties if necessary. The set of all such $\pi$ as $t$ runs over the valid indices defines the empirical distribution $p(\pi)$ over the $m!$ ordinal patterns:

$$p(\pi) = \frac{\#\,\{t : \text{pattern } \pi \text{ appears at } t\}}{N-(m-1)\tau}.$$

The permutation entropy is then

$$H_{PE}(m,\tau) = -\sum_{\pi \in S_m} p(\pi)\,\log p(\pi),$$

with normalization

$$\widehat{H}_{PE}(m,\tau) = \frac{H_{PE}(m,\tau)}{\log(m!)} \in [0,1].$$

Low values indicate regularity (a few patterns dominate); high values signal randomness (the pattern distribution is near-uniform) (Kay et al., 2023, Edthofer et al., 2023, Mitroi-Symeonidis et al., 2019, Gancio et al., 1 Aug 2025, Fuentealba et al., 23 Oct 2024).

Basic algorithm:

  1. For each $t$, assemble $\mathbf{v}_t$ and determine its ordinal pattern $\pi_t$.
  2. Build the histogram of all patterns to estimate $p(\pi)$.
  3. Compute $H_{PE}$ as above.

PE is invariant under strictly monotonic transformations of $x_t$ (Kay et al., 2023, Fabila-Carrasco et al., 2021).
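
A minimal NumPy sketch of this procedure, including a check of the monotonic-invariance property. The function name `permutation_entropy` and its defaults are illustrative, not a published reference implementation; entropies are in nats (natural log):

```python
import math
import numpy as np

def permutation_entropy(x, m=3, tau=1, normalize=True):
    """Plug-in permutation entropy of a 1-D series (Bandt-Pompe scheme)."""
    x = np.asarray(x, dtype=float)
    n_vec = len(x) - (m - 1) * tau
    if n_vec < 1:
        raise ValueError("series too short for the chosen (m, tau)")
    # Delay-embedding matrix: row t holds v_t = (x_t, x_{t+tau}, ..., x_{t+(m-1)tau}).
    idx = np.arange(n_vec)[:, None] + tau * np.arange(m)[None, :]
    # Ordinal pattern = stable argsort of each row (index order breaks ties).
    patterns = np.argsort(x[idx], axis=1, kind="stable")
    # Histogram of patterns -> empirical distribution p(pi).
    _, counts = np.unique(patterns, axis=0, return_counts=True)
    p = counts / n_vec
    h = -np.sum(p * np.log(p))
    return h / math.log(math.factorial(m)) if normalize else h

# Invariance check: a strictly increasing transform leaves every ordinal
# pattern, and hence PE, unchanged.
rng = np.random.default_rng(0)
x = rng.standard_normal(5000)
assert np.isclose(permutation_entropy(x, m=4),
                  permutation_entropy(np.exp(x), m=4))
```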

2. Parameter Selection: Embedding Dimension and Delay

PE depends critically on the choice of $(m,\tau)$.

Parameter selection is often application-specific, but automatic methods now exist and are recommended for objectivity and reproducibility; a delay-selection sketch follows the table below.

| Parameter | Typical range | Selection methods |
|---|---|---|
| Embedding dimension $m$ | 3–7 | FNN, MPE, pattern-use statistics, maximum of $h'_n$ |
| Delay $\tau$ | 1 up to the dominant system time scale | First minimum of auto-mutual information, frequency-domain methods, TDA |
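
A sketch of the first-minimum-of-auto-mutual-information heuristic for choosing $\tau$, using a simple binned histogram estimator of mutual information; the function names and the bin count are illustrative assumptions:

```python
import numpy as np

def auto_mutual_information(x, lag, bins=16):
    """Histogram estimate of I(x_t; x_{t+lag}) in nats."""
    a, b = x[:-lag], x[lag:]
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x_t
    py = pxy.sum(axis=0, keepdims=True)   # marginal of x_{t+lag}
    nz = pxy > 0
    return np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz]))

def first_minimum_delay(x, max_lag=50, bins=16):
    """Return the first lag at which the auto-MI stops decreasing."""
    mi = [auto_mutual_information(x, lag, bins) for lag in range(1, max_lag + 1)]
    for k in range(1, len(mi)):
        if mi[k] > mi[k - 1]:
            return k  # mi[k-1] corresponds to lag k, the first local minimum
    return max_lag    # no minimum found within max_lag
```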

3. Statistical Estimation and Uncertainty Quantification

The standard "plug-in" estimator using empirical p(π)p(\pi) is asymptotically consistent but can be biased for moderate NN, especially when N≫̸m!N \not\gg m!. Two rigorous approaches address accuracy and uncertainty:

  • Bayesian Multinomial–Dirichlet Model: Assume pattern counts are drawn from a multinomial with unknown probabilities $P_i$, place a conjugate Dirichlet prior, and update to obtain a Dirichlet posterior. The posterior over $H$ is well-approximated by a Beta distribution matching the analytically derived posterior mean and variance (a Monte Carlo sketch follows below):

$$H \sim \mathrm{Beta}(a,b), \qquad a = \mu\left[\frac{\mu(1-\mu)}{\sigma^2} - 1\right], \qquad b = (1-\mu)\left[\frac{\mu(1-\mu)}{\sigma^2} - 1\right],$$

where the mean $\mu$ and variance $\sigma^2$ are computable from polygamma functions of the updated counts (Little et al., 2021).

  • Bootstrap/MCMC Approaches: Model the ordinal pattern sequence as a Markov chain (possibly i.i.d.), simulate symbolic surrogates, and recompute PE for each. Confidence intervals and hypothesis testing are supported by the empirical distribution of PE across surrogates (Traversaro et al., 2017).

Bayesian methods naturally incorporate uncertainty and prior information, allow finite-sample inference even when many patterns have count zero, and generalize frequentist bias corrections (Little et al., 2021).
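
A minimal sketch of the Dirichlet-posterior approach. For brevity it estimates $\mu$ and $\sigma^2$ by Monte Carlo rather than by the analytic polygamma formulas of Little et al.; the function name and the flat prior are assumptions:

```python
import math
import numpy as np

def bayesian_pe_posterior(pattern_counts, prior=1.0, n_draws=10_000, seed=0):
    """Monte Carlo sketch of the multinomial-Dirichlet posterior over
    normalized PE, with a moment-matched Beta(a, b) approximation.
    `pattern_counts` must cover all m! patterns, zeros included."""
    rng = np.random.default_rng(seed)
    alpha = np.asarray(pattern_counts, dtype=float) + prior  # Dirichlet posterior
    samples = rng.dirichlet(alpha, size=n_draws)             # draws of p(pi)
    h = -np.sum(samples * np.log(samples), axis=1) / math.log(len(alpha))
    mu, var = h.mean(), h.var()
    common = mu * (1.0 - mu) / var - 1.0                     # moment matching
    return mu, var, mu * common, (1.0 - mu) * common         # mean, var, a, b
```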

4. Variants, Extensions, and Generalizations

Dimension and Domain Extensions

  • Graph Signals (PEG): For data on general graphs, permutation entropy is defined by aggregating local neighborhoods (via adjacency powers), constructing an $m$-dimensional embedding at each node, and applying ordinal pattern statistics. This generalization retains invariance and reduces to standard PE on linear structures (Fabila-Carrasco et al., 2021, Fabila-Carrasco et al., 2023).
  • Images/2D Lattices: Extension via rectangular window embeddings applied across pixel grids, with ordinal statistics (Fabila-Carrasco et al., 2021).
  • Vertex-level Granularity: Graph-based permutation patterns at each node enable region-specific contrasts in fMRI and DTI brain network analysis (Fabila-Carrasco et al., 2023).
  • Global Permutation Entropy (GPE): Scans all $k$-tuples in the signal (all $\binom{n}{k}$ index subsets), not just consecutive/delayed ones, thus providing a finer permutation profile that detects global structure. Efficient algorithms now allow practical computation up to $k=6$ (Avhale et al., 27 Aug 2025); a brute-force sketch follows this list.
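
A deliberately naive GPE sketch, enumerating all $\binom{n}{k}$ subsets (feasible only for short series; the published algorithms are far more efficient). The function name and the $\log(k!)$ normalization are our assumptions:

```python
import math
from collections import Counter
from itertools import combinations

import numpy as np

def global_permutation_entropy(x, k=3):
    """Brute-force GPE: ordinal patterns over all C(n, k) index subsets."""
    counts = Counter()
    for idx in combinations(range(len(x)), k):
        pattern = tuple(np.argsort([x[i] for i in idx], kind="stable"))
        counts[pattern] += 1
    total = sum(counts.values())
    p = np.array([c / total for c in counts.values()])
    return -np.sum(p * np.log(p)) / math.log(math.factorial(k))
```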

Theoretical Generalizations

  • Z-entropy and Permutation Z-entropy: For fully random processes, the classical $H_{PE}$ diverges with increasing $m$. The permutation Z-entropy, based on group entropies and the Lambert $W$ function, attains extensivity and normalized discriminability for both deterministic and random regimes,

$$Z^*_\alpha(n) = \exp\!\left[W\big(R_\alpha(p)\big)\right] - 1,$$

where $R_\alpha$ is the Rényi entropy of order $\alpha$ (Amigó et al., 2021, Amigó et al., 2020).
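
A direct transcription of this formula, assuming the ordinal-pattern distribution $p$ from Section 1 and the principal branch of the Lambert $W$ (supplied by `scipy.special.lambertw`); the function names are ours:

```python
import numpy as np
from scipy.special import lambertw

def renyi_entropy(p, alpha):
    """Rényi entropy of order alpha (alpha != 1) of a distribution p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(np.log(np.sum(p ** alpha)) / (1.0 - alpha))

def permutation_z_entropy(p, alpha=2.0):
    """Z*_alpha(p) = exp[W(R_alpha(p))] - 1, principal branch of W."""
    return float(np.real(np.exp(lambertw(renyi_entropy(p, alpha))))) - 1.0
```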

Weighted and Modified PE

  • Modified PE (mPE): Handles repeated values ("ties") by assigning equal rank/symbol, changing the effective pattern space (Mitroi-Symeonidis et al., 2019).
  • Weighted PE (WPE): Weights pattern contributions by local amplitude variance to capture not just the ordinal structure but also the signal amplitude variation (Mitroi-Symeonidis et al., 2019).
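
A sketch of WPE under one common weighting choice, the variance of each embedding vector; the helper name and this particular weight are assumptions rather than the only published variant:

```python
import math
import numpy as np

def weighted_permutation_entropy(x, m=3, tau=1):
    """WPE sketch: each ordinal pattern occurrence contributes the
    variance of its embedding vector instead of a unit count."""
    x = np.asarray(x, dtype=float)
    n_vec = len(x) - (m - 1) * tau
    idx = np.arange(n_vec)[:, None] + tau * np.arange(m)[None, :]
    vecs = x[idx]
    patterns = np.argsort(vecs, axis=1, kind="stable")
    weights = vecs.var(axis=1)          # local amplitude information
    keys = np.unique(patterns, axis=0)  # distinct patterns observed
    w = np.array([weights[(patterns == k).all(axis=1)].sum() for k in keys])
    p = w / w.sum()
    p = p[p > 0]
    return -np.sum(p * np.log(p)) / math.log(math.factorial(m))
```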

5. Applications

Permutation entropy is broadly applied in diverse scientific domains.

  • Dynamics and Bifurcations: PE detects sharp changes in complexity at dynamical bifurcations, outperforming classical indicators (autocorrelation, variance) in identifying early warning signals and nonlinear transitions (e.g., laser threshold) (Gancio et al., 1 Aug 2025).
  • Biomedical Signal Analysis: PE provides robust markers for sleep staging (decreasing value in deeper sleep) (Edthofer et al., 2023), epileptic state discrimination (Traversaro et al., 2017), and cognitive decline via region-specific complexity (Fabila-Carrasco et al., 2023).
  • Fire Dynamics: Applied to thermocouple measurements in fire tests, PE reveals temporal structure and turbulent phases (Mitroi-Symeonidis et al., 2019).
  • Communications: Multi-scale permutation entropy features enable accurate classification of noisy radio-frequency signals, outperforming or matching deep neural network approaches trained on raw data (Kay et al., 2023).
  • Complexity-Entropy Plane: Joint use of normalized PE and statistical complexity (e.g., Jensen–Shannon disequilibrium) distinguishes stochastic, periodic, and chaotic regimes, as in the analysis of the Hamiltonian Mean-Field model (Fuentealba et al., 23 Oct 2024); see the sketch after this list.
  • Discriminating Chaos vs. Stochasticity: PE, combined with machine learning models (e.g., neural networks trained on ordinal pattern histograms), allows for robust classification of chaotic versus high-order stochastic time series, exploiting the distributional difference in permutation profiles (Boaretto et al., 2021).
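
A sketch of the complexity-entropy plane coordinates, using one standard convention: $C = Q_{JS} \cdot \widehat{H}$, where $Q_{JS}$ is the Jensen–Shannon divergence between $p(\pi)$ and the uniform distribution, normalized by its maximum. Function names are illustrative:

```python
import math
import numpy as np

def shannon(p):
    """Shannon entropy in nats, ignoring zero-probability entries."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def complexity_entropy_point(p):
    """(normalized PE, Jensen-Shannon statistical complexity)."""
    p = np.asarray(p, dtype=float)
    n = len(p)                      # pattern-space size, m! for PE
    u = np.full(n, 1.0 / n)         # uniform reference distribution
    h_norm = shannon(p) / math.log(n)
    js = shannon(0.5 * (p + u)) - 0.5 * shannon(p) - 0.5 * shannon(u)
    # Q_0 normalizes JS by its maximum value against the uniform distribution.
    q0 = -2.0 / ((n + 1) / n * math.log(n + 1)
                 - 2 * math.log(2 * n) + math.log(n))
    return h_norm, q0 * js * h_norm
```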

6. Practical Considerations, Limitations, and Best Practices

Data Requirements and Reliability

  • Sample size $N$ must significantly exceed $m!$ for reliable estimation; practical use is typically restricted to $m = 3,\ldots,7$, with $N \gg m!$ (Mitroi-Symeonidis et al., 2019).
  • Undersampled patterns (many zeros) bias entropy estimates downward; Bayesian or bootstrap methods mitigate this (Little et al., 2021, Traversaro et al., 2017).

Ties and Quantization Artifacts

  • Amplitude quantization (common in digitized signals) may induce a high proportion of ties. Use mPE variants or inject infinitesimal noise if physically justified (Mitroi-Symeonidis et al., 2019).
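
A small illustration of mPE-style tie handling, assuming `scipy.stats.rankdata`; the helper name is ours. Tied values receive equal ranks, so the symbol alphabet grows beyond the $m!$ tie-free patterns:

```python
from scipy.stats import rankdata

def ordinal_symbol_with_ties(v):
    """mPE-style symbol: equal values map to equal ranks."""
    return tuple(int(r) - 1 for r in rankdata(v, method="min"))

print(ordinal_symbol_with_ties([2.0, 2.0, 1.0]))  # (1, 1, 0), a tied pattern
```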

Parameter Selection

  • Heuristic or trial-and-error choices of $(m,\tau)$ can produce misleading results; adopt data-driven or frequency-domain approaches for reproducibility (Myers et al., 2019).
  • For multiscale or cross-domain comparability, consistent parameterization is essential.

Computational Complexity

  • Standard PE is computationally efficient: $O(N\,m\log m)$.
  • GPE and high-order extensions are costlier, though recent advances permit feasible profiling for $k \leq 6$ (Avhale et al., 27 Aug 2025).

Strengths and Limitations

  • Strengths: nonparametric, amplitude-agnostic, efficient, preserves temporal information, robust to noise, widely interpretable.
  • Limitations: sensitivity to parameter choices, requirement of sufficient data, difficulties associated with ties and nonstationarity, loss of information on absolute amplitude.

7. Theoretical Foundations and Current Research Directions

  • Connection to Kolmogorov–Sinai entropy: For deterministic dynamical systems, per-symbol PE converges to the KS entropy in the limit of long patterns (Amigó et al., 2020, Amigó et al., 2021).
  • Complexity Classes and Forbidden Patterns: Classification of processes into exponential, subfactorial, factorial by the growth rate of visible permutations informs application of generalized entropies and helps distinguish deterministic and stochastic behaviors (Amigó et al., 2021).
  • Multivariate and Graph Extensions: PEG and graph-based permutation patterns enable analysis of data on complex domains such as brain connectivity graphs, sensor networks, and images (Fabila-Carrasco et al., 2021, Fabila-Carrasco et al., 2023).
  • Open Issues: Theoretical lower bounds for PE with repeated ordinal components, multiscale extensions, inference of effective pattern space dimension, and the design of joint and conditional PE for multidimensional systems remain active areas (Mitroi-Symeonidis et al., 2019, Kay et al., 2023).

Permutation entropy thus occupies a central place in contemporary time-series complexity analysis, continuing to evolve in methodology, theoretical grounding, and domain-specific variants as documented across recent research literature.
