Permutation Entropy in Time Series
- Permutation Entropy is a complexity measure that quantifies the diversity and structure of ordinal patterns in time series, independent of amplitude.
- It employs a computationally efficient sliding-window methodology, crucial for analyzing nonlinear, stochastic, and multichannel signals.
- Extensions such as weighted, multivariate, and global variants expand its applications in neuroscience, engineering, and climate studies by capturing multi-scale dynamics.
Permutation entropy (PE) is a complexity measure for time series and multidimensional data based on the probability distribution of ordinal patterns (permutations) extracted from subsequences of the data. Originally introduced by Bandt and Pompe (2002), PE quantifies the diversity and structure of temporal or spatial ordering without reference to amplitude, making it robust, computationally efficient, and broadly applicable to nonlinear, stochastic, and multichannel signals (Pessa et al., 2021). Over two decades, PE has spawned a range of methodological advances—including graph and multivariate extensions, weighted generalizations, multiscale analysis, and complexity–entropy planes—that have been widely adopted in data science, neuroscience, physics, and engineering.
1. Mathematical Definition and Algorithmic Procedure
Permutation entropy is defined for a real-valued sequence $\{x_t\}_{t=1}^{N}$ by considering the distribution of ordinal patterns formed from the delay-embedding vectors
$$\mathbf{x}_i = \left(x_i,\, x_{i+\tau},\, \ldots,\, x_{i+(m-1)\tau}\right),$$
where $m$ is the embedding dimension (pattern length) and $\tau$ is the delay. Each $\mathbf{x}_i$ is mapped to the permutation $\pi$ of $(0, 1, \ldots, m-1)$ that describes the ascending rank order of its components (ties are broken by time index). The empirical frequency $p(\pi)$ of each of the $m!$ possible patterns is counted over all windows (Pessa et al., 2021, Kay et al., 2023).
The (Shannon) permutation entropy is
$$H_m = -\sum_{\pi} p(\pi)\, \log p(\pi),$$
with normalized form
$$h_m = \frac{H_m}{\log m!} \in [0, 1].$$
Algorithmically, PE is computed by scanning the series with a sliding window (possibly on a windowed data segment), extracting ordinal patterns, tallying frequencies, and calculating the entropy of the resulting distribution. The effective sample size must greatly exceed $m!$ for statistical reliability (Pessa et al., 2021).
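The procedure above can be sketched in a few lines of Python (a minimal illustration, not the ordpy implementation; the function name and defaults are ours):

```python
import math
import numpy as np

def permutation_entropy(x, m=3, tau=1, normalize=True):
    """Shannon permutation entropy of a 1-D series (Bandt-Pompe scheme)."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (m - 1) * tau  # number of embedding vectors
    counts = {}
    for i in range(n):
        window = x[i : i + (m - 1) * tau + 1 : tau]
        # Stable argsort encodes the ascending rank order; ties are
        # broken by time index, matching the standard convention.
        pattern = tuple(np.argsort(window, kind="stable"))
        counts[pattern] = counts.get(pattern, 0) + 1
    p = np.array(list(counts.values()), dtype=float) / n
    h = -np.sum(p * np.log(p))
    return h / math.log(math.factorial(m)) if normalize else h

# A strictly increasing series has a single ordinal pattern (entropy ~0),
# while white noise uses all patterns nearly equally (normalized PE near 1).
h_trend = permutation_entropy(np.arange(100.0))
h_noise = permutation_entropy(np.random.default_rng(42).standard_normal(20000))
```

Note that amplitude never enters: only the rank order within each window matters, which is what makes PE invariant under monotone transformations.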
2. Parameter Selection: Embedding Dimension and Delay
Proper selection of the embedding dimension $m$ and delay $\tau$ is critical. Several automated methods for delay selection are available:
- Frequency-domain: Uses spectral estimates to set $\tau$ from the highest significant frequency, identified via least-median-of-squares regression on the Fourier modulus (Myers et al., 2019).
- Permutation Auto-Mutual Information (PAMI): Finds the first minimum of the auto-mutual information of the ordinal pattern process (Myers et al., 2019).
- Multiscale PE (MPE): Examines the normalized PE as a function of $\tau$, choosing $\tau$ at key maxima or resonances (Myers et al., 2019).
- Topological Data Analysis (TDA): Applies persistent homology to point clouds formed from embedding vectors to select the $\tau$ that maximizes topological signatures (Myers et al., 2019).
The embedding dimension $m$ can be chosen by:
- False Nearest Neighbours (FNN): The smallest $m$ at which the fraction of false neighbours drops below a threshold (Myers et al., 2019).
- Permutation entropy maximization: Plotting the normalized PE against $m$ and selecting the value at the peak (Myers et al., 2019).
Practical guidelines recommend moderate defaults (commonly $3 \le m \le 7$ with $\tau = 1$), together with data-driven checks for system-specific tuning (Myers et al., 2019).
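The MPE-style delay scan can be sketched as follows (a self-contained illustration on an artificial sine signal; the helper function, signal, and parameter choices are ours). For a sine sampled at roughly 100 points per period, tiny delays see mostly monotone windows (low entropy), while delays near a quarter period expose a richer set of ordinal patterns:

```python
import math
import numpy as np

def permutation_entropy(x, m=3, tau=1):
    """Normalized permutation entropy (defined inline for self-containment)."""
    n = len(x) - (m - 1) * tau
    counts = {}
    for i in range(n):
        pat = tuple(np.argsort(x[i : i + (m - 1) * tau + 1 : tau], kind="stable"))
        counts[pat] = counts.get(pat, 0) + 1
    p = np.array(list(counts.values()), dtype=float) / n
    return -np.sum(p * np.log(p)) / math.log(math.factorial(m))

# Scan h(tau) for tau = 1..50; a plot of this curve is what the MPE
# heuristic inspects when choosing the delay.
t = np.arange(4000)
x = np.sin(2 * np.pi * t / 100)
h = [permutation_entropy(x, m=3, tau=tau) for tau in range(1, 51)]
```

The resulting curve rises from a low value at $\tau = 1$ toward its structure-revealing maxima, which is the behavior the MPE criterion exploits.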
3. Extensions: Multivariate, Weighted, Graph, and Global PE
Weighted and Generalized Weighted PE
Weighted permutation entropy (WPE) enhances classical PE by weighting patterns according to fluctuation amplitude (e.g., the local variance of each window) (Stosic et al., 2022). The generalized weighted version introduces a scaling exponent $q$,
$$p_q(\pi) = \frac{\sum_{i:\, \operatorname{pat}(\mathbf{x}_i) = \pi} w_i^{\,q}}{\sum_{i} w_i^{\,q}},$$
with $w_i$ the weight of window $i$, allowing continuous emphasis on large ($q > 0$) or small ($q < 0$) fluctuations. Setting $q = 0$ recovers standard PE; $q = 1$ is WPE. Scanning $q$ yields the complexity–entropy–scale causality box (Stosic et al., 2022).
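A minimal sketch of this weighting scheme, assuming window variance as the weight and the convention that $q = 0$ recovers standard PE (the function name and defaults are ours, not the paper's implementation):

```python
import math
import numpy as np

def weighted_permutation_entropy(x, m=3, tau=1, q=1.0):
    """Generalized weighted PE: patterns weighted by (window variance)**q.

    q = 0 reduces to standard PE (all weights equal 1), q = 1 gives the
    usual WPE. Illustrative sketch only.
    """
    x = np.asarray(x, dtype=float)
    n = len(x) - (m - 1) * tau
    weights = {}
    for i in range(n):
        w = x[i : i + (m - 1) * tau + 1 : tau]
        pat = tuple(np.argsort(w, kind="stable"))
        # Accumulate the q-scaled weight instead of a raw count.
        weights[pat] = weights.get(pat, 0.0) + np.var(w) ** q
    p = np.array(list(weights.values())) / sum(weights.values())
    return -np.sum(p * np.log(p)) / math.log(math.factorial(m))
```

For i.i.d. noise the weighting changes little (amplitude and ordinal pattern are nearly independent); the exponent matters for signals whose large excursions carry distinctive ordinal structure.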
Multivariate and Graph Extensions
Multivariate PE encodes both temporal structure and cross-channel dependencies by building a graph whose vertices encapsulate all time-channel pairs and applying permutation entropy over graph-neighbor-based embeddings (Fabila-Carrasco et al., 2022). This approach outperforms marginal and channel-wise PEs by capturing true multichannel interactions.
Graph-based PE, and its continuous variants, extend pattern extraction to generic, possibly non-regular, domains such as sensor networks or fMRI brain graphs. Local neighborhood-averages replace time-delay embedding, and ordinal activations enable amplitude-sensitive nonlinearities (Fabila-Carrasco et al., 2021, Roy et al., 10 Jul 2024). The “ordinal deep learning” paradigm uses monotonic nonlinearities and trainable statistics for further flexibility (Roy et al., 10 Jul 2024).
Global Permutation Entropy (GPE)
Global permutation entropy computes the ordinal-pattern distribution over all $\binom{N}{m}$ subsequences of $m$ (not necessarily consecutive) values,
$$H^{\mathrm{g}}_m = -\sum_{\pi} \tilde{p}(\pi)\, \log \tilde{p}(\pi), \qquad \tilde{p}(\pi) = \frac{\#\left\{ i_1 < \cdots < i_m \,:\, \operatorname{pat}(x_{i_1}, \ldots, x_{i_m}) = \pi \right\}}{\binom{N}{m}},$$
which provides sensitivity to long-range and multi-scale dependencies. This requires advanced combinatorial counting algorithms but converges more rapidly and requires less parameter tuning than classical PE (Avhale et al., 27 Aug 2025).
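For small series the global distribution can be obtained by brute-force enumeration (an illustrative sketch only; it scales as $\binom{N}{m}$ and is no substitute for the combinatorial counting algorithms referenced above):

```python
import math
from itertools import combinations
import numpy as np

def global_permutation_entropy(x, m=3):
    """Normalized global PE over all (not necessarily consecutive)
    length-m subsequences. O(C(N, m)) time: small N only."""
    x = np.asarray(x, dtype=float)
    counts = {}
    total = 0
    for idx in combinations(range(len(x)), m):
        pat = tuple(np.argsort(x[list(idx)], kind="stable"))
        counts[pat] = counts.get(pat, 0) + 1
        total += 1
    p = np.array(list(counts.values()), dtype=float) / total
    return -np.sum(p * np.log(p)) / math.log(math.factorial(m))
```

A monotone series still yields a single pattern (entropy zero), while for exchangeable i.i.d. data every subsequence ordering is equally likely, pushing the normalized value toward 1.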
4. Statistical Complexity and the Complexity–Entropy Plane
Permutation entropy is frequently paired with statistical complexity measures, especially the Jensen–Shannon complexity
$$C_{JS} = Q_J[P, U]\; h,$$
where $P$ is the observed pattern distribution, $U$ is the uniform distribution over the $m!$ patterns, $h$ is the normalized permutation entropy, and $Q_J$ is the Jensen–Shannon divergence between $P$ and $U$ normalized by its maximum value. Plotting $(h, C_{JS})$ in the complexity–entropy plane locates a process on the continuum from order (periodic) through deterministic chaos to stochasticity (Fuentealba et al., 23 Oct 2024, Pessa et al., 2021, Kilpua et al., 25 Mar 2024). This analysis distinguishes noise (high entropy, low complexity), periodicity (low entropy, low complexity), and chaos/intermittency (intermediate entropy, high complexity).
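A sketch of this construction (assuming the usual normalization constant $Q_0$ for the Jensen–Shannon divergence against the uniform distribution over $N = m!$ outcomes; function names are ours):

```python
import math
from itertools import permutations
import numpy as np

def ordinal_distribution(x, m=3, tau=1):
    """Probability of each of the m! ordinal patterns (zero-padded)."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (m - 1) * tau
    index = {p: k for k, p in enumerate(permutations(range(m)))}
    counts = np.zeros(math.factorial(m))
    for i in range(n):
        pat = tuple(np.argsort(x[i : i + (m - 1) * tau + 1 : tau], kind="stable"))
        counts[index[pat]] += 1
    return counts / n

def shannon(p):
    """Shannon entropy in nats, ignoring zero entries."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def complexity_entropy(x, m=3, tau=1):
    """Return (h, C_JS): normalized PE and Jensen-Shannon complexity."""
    p = ordinal_distribution(x, m, tau)
    N = len(p)
    u = np.full(N, 1.0 / N)
    h = shannon(p) / math.log(N)
    # Jensen-Shannon divergence between observed and uniform distributions,
    # normalized by its maximum value over all distributions on N outcomes.
    jsd = shannon((p + u) / 2) - 0.5 * shannon(p) - 0.5 * shannon(u)
    q0 = -2.0 / (((N + 1) / N) * math.log(N + 1)
                 - 2.0 * math.log(2 * N) + math.log(N))
    return h, q0 * jsd * h
```

White noise then lands in the high-entropy, low-complexity corner of the plane, while chaotic signals occupy intermediate entropy with elevated complexity.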
Extensions such as Tsallis or Rényi PE allow construction of families of complexity–entropy curves, each parameterized by the corresponding entropy index, further refining dynamical regime discrimination (Pessa et al., 2021).
5. Theoretical Properties and Generalizations
- Invariance: PE is invariant under strictly monotonic transformations and amplitude rescalings (Pessa et al., 2021, Kay et al., 2023). Thus, it is robust to many forms of observational noise, but does not detect amplitude modulation.
- Relation to Dynamical Invariants: For deterministic, piecewise-monotone maps, the permutation-entropy rate converges (as the pattern length $m \to \infty$) to the Kolmogorov–Sinai entropy (metric entropy) (Amigó et al., 2020, Watt et al., 2018).
- Complexity Classes: The number of observed patterns grows in exponential, sub-factorial, or factorial regimes, separating deterministic systems (at most exponential pattern growth) from fully stochastic processes (full factorial growth) (Amigó et al., 2021).
- Normalization and Divergence: For white noise, $h_m \to 1$ as all $m!$ patterns appear equally likely. For random processes, the per-symbol entropy of “classical” PE diverges as $m \to \infty$; generalized permutation entropies (e.g., group-entropy variants) maintain finite normalization, sharply distinguishing disorder from deterministic chaos (Amigó et al., 2020, Amigó et al., 2021).
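The convergence of the permutation-entropy rate toward the Kolmogorov–Sinai entropy noted above can be probed numerically on the fully chaotic logistic map, whose KS entropy is $\ln 2 \approx 0.693$ (a rough sketch; convergence in $m$ is slow, so the estimate is only indicative):

```python
import math
import numpy as np

def pattern_entropy(x, m, tau=1):
    """Un-normalized Shannon entropy of ordinal patterns (natural log)."""
    n = len(x) - (m - 1) * tau
    counts = {}
    for i in range(n):
        pat = tuple(np.argsort(x[i : i + (m - 1) * tau + 1 : tau], kind="stable"))
        counts[pat] = counts.get(pat, 0) + 1
    p = np.array(list(counts.values()), dtype=float) / n
    return -np.sum(p * np.log(p))

# Fully chaotic logistic map x -> 4x(1 - x), KS entropy = ln 2.
x = np.empty(60000)
x[0] = 0.4
for i in range(1, len(x)):
    x[i] = 4.0 * x[i - 1] * (1.0 - x[i - 1])

# Entropy-rate estimate via the successive difference H_{m+1} - H_m.
rate = pattern_entropy(x, 7) - pattern_entropy(x, 6)
```

Because the map's forbidden patterns keep the admissible-pattern count well below $m!$, the finite-$m$ estimate already sits in the vicinity of $\ln 2$ rather than near the white-noise ceiling $\ln(m!)$.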
6. Practical Applications Across Domains
Permutation entropy and its variants have been widely adopted:
- Neuroscience: Quantification of EEG complexity and sleep stage transitions; diagnosis of insomnia and tracking of anesthesia depth (Edthofer et al., 2023).
- Engineering: Fault detection in electromechanical systems such as wind turbines via complexity differentials across phases (Ram et al., 2016).
- Communications: Radio-frequency signal classification under noise by exploiting the amplitude-agnostic, structure-sensitive nature of PE-derived features (Kay et al., 2023).
- Space Physics: Characterization of solar wind regimes by complexity-entropy analysis, discriminating turbulence from coherent structures (Kilpua et al., 25 Mar 2024).
- Climate and Environmental Data: Detection of local mixing and instrument oversampling scales via reversal metrics applied to the delay parameter, enabling time-scale aware signal retention and analysis (Neuder et al., 2020).
A strong trend is the use of PE in conjunction with the complexity–entropy plane for regime classification, multi-scale/multivariate structure detection, and time-varying complexity monitoring.
7. Available Software and Computational Remarks
Numerous efficient implementations now exist. The ordpy Python package is a comprehensive toolbox for PE, generalized entropies, complexity measures, and ordinal networks applicable to time series and two-dimensional images (Pessa et al., 2021). Julia-based and other high-performance libraries are available for global permutation entropy and for advanced graph and multivariate extensions (Avhale et al., 27 Aug 2025).
Computational cost is dominated by the sort of each length-$m$ window ($O(m \log m)$ per window); factorial growth in the number of patterns restricts feasible embedding dimensions to roughly $m \le 7$ unless massive sample sizes are available (Pessa et al., 2021). Advanced graph and global algorithms exploit combinatorial data structures to remain tractable for practical problems (Avhale et al., 27 Aug 2025, Fabila-Carrasco et al., 2022).
Permutation entropy and its modern extensions comprise a versatile, theoretically rigorous, and application-proven framework for symbolic time series analysis, complexity quantification, and nonlinear signal discrimination across physics, engineering, and neuroscience. Continuous advances in multiscale, multivariate, graph-based, and information-theoretic methodologies have made PE central to contemporary nonlinear data analytics (Pessa et al., 2021, Stosic et al., 2022, Fabila-Carrasco et al., 2022, Amigó et al., 2021, Fuentealba et al., 23 Oct 2024, Neuder et al., 2020, Kilpua et al., 25 Mar 2024, Avhale et al., 27 Aug 2025).