Multidimensional Multi-scale Entropy (MMSE)
- Multidimensional Multi-scale Entropy (MMSE) is an information-theoretic metric that extends single-dimensional entropy to assess joint complexity across multiple scales and channels.
- It employs normalization, coarse-graining, embedding, and Chebyshev norm match counting to produce a robust entropy scalar that mitigates noise and captures cross-metric interactions.
- MMSE finds practical use in software reliability, physiological signal analysis, and emotion recognition, providing actionable insights into system aging and regime shifts.
Multidimensional Multi-scale Entropy (MMSE) is an information-theoretic metric developed to quantify the joint temporal complexity of multidimensional time series over multiple scales. Originally introduced as an indicator of software aging, MMSE extends classical single-dimensional sample entropy to simultaneously handle multivariate signals and their scale-dependent irregularities. The core motivation is to construct robust, noise-tolerant indicators of system state or complexity that account for cross-metric interactions and are stable across a range of real-world fluctuation regimes. MMSE is widely applicable in domains where multichannel or multimodal observations are fundamental, including software reliability engineering, physiological signal analysis, and emotion recognition.
1. Mathematical Formulation and Algorithmic Definition
Let $X = \{x_{i,j}\}$, $i = 1, \dots, N$, $j = 1, \dots, p$, denote a window of $p$ concurrent time series of length $N$. The MMSE computation proceeds as follows (Chen et al., 2015, Xiao et al., 2021, Tung et al., 2018):
- Normalization: Each metric (column) is linearly rescaled to the interval $[0, 1]$:
$$\tilde{x}_{i,j} = \frac{x_{i,j} - \min_i x_{i,j}}{\max_i x_{i,j} - \min_i x_{i,j}}$$
for $i = 1, \dots, N$, $j = 1, \dots, p$.
- Similarity Tolerance ($r$): Compute the covariance of the normalized series and set the matching tolerance $r$ as a fixed fraction of its overall variability; values around $0.15$ times the standard deviation are conventional in the sample-entropy literature for high-dimensional comparisons.
- Coarse-graining: For each scale $\tau \geq 1$, the series is coarse-grained:
$$y_{k,j}^{(\tau)} = \frac{1}{\tau} \sum_{i=(k-1)\tau + 1}^{k\tau} \tilde{x}_{i,j}$$
for $k = 1, \dots, \lfloor N/\tau \rfloor$, $j = 1, \dots, p$.
- Embedding: For a fixed embedding dimension $m$, form overlapping composite vectors
$$Y_k^{(\tau)} = \left( y_{k,1}^{(\tau)}, \dots, y_{k,p}^{(\tau)}, \; y_{k+1,1}^{(\tau)}, \dots, y_{k+m-1,p}^{(\tau)} \right)$$
for $k = 1, \dots, n_\tau - m + 1$, where $n_\tau = \lfloor N/\tau \rfloor$.
- Counting Matches: Using the $L_\infty$ (Chebyshev) norm, compute for each template
$$B_k^{m}(r) = \frac{1}{n_\tau - m - 1} \, \#\left\{ l \ne k : \left\| Y_k^{(\tau)} - Y_l^{(\tau)} \right\|_\infty \le r \right\}$$
and average:
$$B^{m}(r) = \frac{1}{n_\tau - m} \sum_{k=1}^{n_\tau - m} B_k^{m}(r),$$
with $A^{m}(r)$ defined analogously from the $(m+1)$-sample embeddings.
- Sample Entropy per Scale:
$$\mathrm{SampEn}(\tau) = -\ln \frac{A^{m}(r)}{B^{m}(r)}$$
- MMSE Scalar (Composed Entropy):
$$\mathrm{MMSE} = \sqrt{\sum_{\tau = 1}^{s} \mathrm{SampEn}(\tau)^{2}}$$
This pipeline yields a single scalar per window that reflects the overall multiscale, multidimensional entropy of the input window.
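The steps above can be sketched end-to-end in NumPy. This is a minimal illustration under simplifying assumptions, not the reference implementation of the cited papers; the defaults ($m = 2$, a fixed tolerance $r = 0.15$, scales 1–3) are illustrative choices:

```python
import numpy as np

def mmse(X, m=2, r=0.15, scales=(1, 2, 3)):
    """Sketch of Multidimensional Multi-scale Entropy for a window X of
    shape (N, p): N time points, p channels. Defaults are illustrative."""
    X = np.asarray(X, dtype=float)
    # Step 1: rescale each channel (column) to [0, 1].
    lo, hi = X.min(axis=0), X.max(axis=0)
    Xn = (X - lo) / np.where(hi > lo, hi - lo, 1.0)

    per_scale = []
    for tau in scales:
        # Step 3: coarse-grain by non-overlapping block averages of length tau.
        n = Xn.shape[0] // tau
        Y = Xn[:n * tau].reshape(n, tau, -1).mean(axis=1)
        per_scale.append(_sampen_multivariate(Y, m, r))
    # Step 7: compose per-scale entropies with the Euclidean norm.
    return float(np.sqrt(np.sum(np.square(per_scale))))

def _sampen_multivariate(Y, m, r):
    """Multivariate sample entropy with Chebyshev (L-infinity) matching."""
    n = Y.shape[0]
    K = n - m                      # same template count at lengths m and m+1
    if K < 2:
        return 0.0
    def match_fraction(length):
        # Steps 4-5: embed `length` consecutive multichannel rows per vector,
        # then count pairs within Chebyshev distance r (self-matches excluded).
        V = np.stack([Y[k:k + length].ravel() for k in range(K)])
        d = np.abs(V[:, None, :] - V[None, :, :]).max(axis=2)
        return (np.sum(d <= r) - K) / (K * (K - 1))
    B, A = match_fraction(m), match_fraction(m + 1)
    # Step 6: per-scale sample entropy, guarded against empty match sets.
    return -np.log(A / B) if A > 0 and B > 0 else 0.0
```

For production use, the dense pairwise distance matrix would be replaced by a blocked or indexed computation, since it allocates quadratic memory per scale.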
2. Rationale and Distinction from Related Metrics
The necessity for “multidimensional” and “multi-scale” features arises from empirical limitations of single-metric or single-scale entropy approaches. Single-channel entropy may miss complex cross-metric dependencies, while traditional entropy at a single timescale is sensitive to high-frequency noise or transient patterns. By integrating over coarse-grained scales $\tau$, MMSE filters noise and captures slow, global trends. By embedding all $p$ metrics as state-vectors, MMSE probes joint state irregularity, thus being highly sensitive to subtle system-wide changes due to aging, coordination loss, or multi-source heterogeneity (Chen et al., 2015, Xiao et al., 2021).
MMSE generalizes univariate multi-scale entropy (MSE) and multivariate sample entropy (MSampEn), as confirmed by parallel definitions in multichannel physiological settings (Xiao et al., 2021, Tung et al., 2018). The distance measure (Chebyshev/$L_\infty$ norm) and tolerance $r$ ensure robust detection of high-dimensional divergence.
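A small numeric illustration of why the Chebyshev criterion is sensitive to divergence in any single channel; the vector values and tolerance below are illustrative, not drawn from the cited studies:

```python
import numpy as np

# Two joint embedded state vectors (m = 2, p = 2; values illustrative):
# they agree in three coordinates but diverge in one channel.
a = np.array([0.20, 0.40, 0.21, 0.41])
b = np.array([0.20, 0.40, 0.21, 0.95])

r = 0.15                            # matching tolerance
cheb = np.max(np.abs(a - b))        # Chebyshev norm: worst single coordinate
match = cheb <= r
# The pair is rejected: one coordinate deviates by ~0.54 > r, and agreement
# in the remaining coordinates cannot average the divergence away. This is
# what makes the L-infinity criterion sensitive to divergence in any single
# channel of the joint state.
print(cheb, match)
```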
3. Algorithmic Implementation and Computational Complexity
The implementation follows the seven steps of Section 1 directly: normalize each channel, fix the tolerance $r$, and then, for each scale, coarse-grain, embed, count Chebyshev matches at dimensions $m$ and $m + 1$, compute the per-scale sample entropy, and finally compose the scales via the Euclidean norm (Chen et al., 2015, Xiao et al., 2021, Tung et al., 2018).
The dominant computational cost is the $O(n_\tau^2)$ pairwise match counting at each scale, as each embedded vector is compared against every other within its window, with each comparison costing $O(mp)$ scalar operations; coarse-graining is only $O(Np)$. The number of distance computations per window therefore grows quadratically with window length (Chen et al., 2015, Xiao et al., 2021), motivating optimizations for large datasets (e.g., windowing, indexing, or down-sampling).
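The quadratic cost can be made concrete by counting the scalar comparisons implied by the embedding and match-counting steps; the sizes below are illustrative, not configurations from the cited studies:

```python
# Scalar comparisons per window of N points: at scale tau the coarse-grained
# series has N // tau points, yielding roughly (N // tau - m) template vectors
# and k * (k - 1) ordered pairs, each costing about m * p coordinate
# comparisons under the Chebyshev norm.
def comparisons_per_window(N, m, p, scales):
    total = 0
    for tau in scales:
        k = N // tau - m          # template vectors at this scale
        total += k * (k - 1) * m * p
    return total

# Illustrative sizes: a 1000-point window, m = 2, p = 5, scales 1..5.
print(comparisons_per_window(N=1000, m=2, p=5, scales=range(1, 6)))
```

Even at these modest sizes the count runs into the tens of millions, which is why windowing, indexing, or down-sampling pays off quickly.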
4. Theoretical Properties
Three core properties are established and theoretically justified (Chen et al., 2015):
- Monotonicity: Under a monotonically increasing failure probability, the underlying entropy proxy grows monotonically over the relevant range. MMSE empirically inherits such monotonicity under mild stationarity.
- Stability: Summing sample entropy squared across scales (Euclidean norm) filters out local spikes and increases robustness against transient or high-frequency noise.
- Integration: Embedding all $p$ metrics into a joint vector space ensures sensitivity to cross-channel interactions and hidden joint irregularities.
These properties distinguish MMSE as an indicator with provable discrimination capability vis-à-vis system health and aging, justifying its use in both anomaly detection and complexity quantification.
5. Tuning, Parameter Selection, and Practical Guidelines
Key parameters and practical recommendations synthesized from empirical studies (Chen et al., 2015, Xiao et al., 2021, Tung et al., 2018):
- Embedding dimension ($m$): $m = 2$ or $m = 3$ suffices for most engineering and physiological time series.
- Number of scales ($s$): a modest number of scales is common, bounded by the window length since scale $\tau$ shortens the series to $\lfloor N/\tau \rfloor$ points; larger $s$ can be used to separate physiological regimes when $N$ is large.
- Tolerance ($r$): set proportional to the variability of the normalized data, with the proportionality constant tuned per domain (software metrics versus physiological data) for stable separation; fractions of roughly $0.1$–$0.25$ of the standard deviation are conventional.
- Metric selection ($p$): In software, reduce 70+ counters to a handful by PCA and variable selection; in physiological signals, group multichannel data into functional regions.
- Window size ($N$): moderately sized windows are effective for engineering time series; considerably shorter windows suffice for simple AR models with short embedding.
Parameter sensitivity is dominated by the requirement that the coarse-grained length $\lfloor N/\tau \rfloor$ remain large relative to the embedded dimension $m \cdot p$; for high embedding dimensions or many channels, either long records or channel-selection methods are necessary to avoid estimator degeneracy.
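The PCA-driven channel reduction mentioned above can be sketched with a plain SVD; the function name, the target of five components, and the 70-counter example are illustrative choices, not prescribed values:

```python
import numpy as np

def reduce_channels(X, k=5):
    """Project an (N, p) metric window onto its top-k principal components,
    a simple stand-in for the PCA-based counter selection described above."""
    Xc = X - X.mean(axis=0)                        # center each metric
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                           # (N, k) reduced window

rng = np.random.default_rng(1)
wide = rng.random((500, 70))                       # e.g. 70 monitored counters
narrow = reduce_channels(wide, k=5)
print(narrow.shape)  # (500, 5)
```

The reduced window can then be fed to the MMSE pipeline, shrinking the embedded dimension from $m \cdot 70$ to $m \cdot 5$.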
6. Comparative Evaluation and Empirical Performance
MMSE has been extensively compared to scalar multi-scale entropy (MSE) and to more recent generalizations such as Variational Embedding Multiscale Sample Entropy (VEMSE) (Xiao et al., 2021):
- Synthetic signals: MMSE separates autoregressive (AR) model classes given sufficiently long data windows; VEMSE achieves similar separation at shorter data lengths and higher embedding dimensions.
- Noise robustness: MMSE performance degrades under strong noise, but to a lesser extent than scalar MSE.
- Computational efficiency: MMSE requires quadratic time in window length per scale, while VEMSE is 20–50% faster in benchmarks.
- Real-world data: In software aging detection (Helix-Server, AntVision), MMSE in the CHAOS framework delivered 5-fold higher detection precision and 3 orders of magnitude improvement in ahead-time-to-failure vs. single-metric or Hölder-based alternatives (Chen et al., 2015).
- Physiological analysis: In wind and heart rate data, MMSE correctly ordered regime complexity, although with overlapping error bars at large scales. For EEG emotion recognition, MMSE features did not achieve statistical significance for arousal or valence separation, whereas permutation-entropy variants (e.g., MMPE) did (Tung et al., 2018).
A synthesized comparison is given below:
| Domain / Task | MMSE Efficacy | Quantitative Notes |
|---|---|---|
| Software Aging (Chen et al., 2015) | High | ≈5× precision; orders-of-magnitude ATTF improvement |
| Wind/Physio (Xiao et al., 2021) | Moderate | Good regime ordering, error bar overlap |
| Emotion-EEG (Tung et al., 2018) | Not significant | MMSE features not selected (p > 0.1); other entropy variants outperformed |
7. Extensions, Variants, and Domain-Specific Adaptations
The literature presents multiple extensions of MMSE to address limitations in short data settings, high embedding dimensions, and channel heterogeneity:
- Variational Embedding MMSE (VEMSE): Assigns varying embedding dimensions per channel, improving effectiveness for small $N$, high $m$, mixed signal quality, or significant data heterogeneity (Xiao et al., 2021).
- Composite/Refined Coarse-Graining: Averaging across block offsets to avoid undefined logs and increase estimator stability in nonstationary signals (Tung et al., 2018).
- Parameter Adaptations: Channel grouping and PCA-driven selection are standard to manage computational and estimator complexity, especially for large $p$.
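The composite/refined coarse-graining idea can be sketched as follows; this is a minimal illustration of the offset-averaging scheme, not the exact procedure of Tung et al. (2018):

```python
import numpy as np

def composite_coarse_grain(x, tau):
    """All tau offset variants of the scale-tau coarse-graining of a 1-D
    series; composite/refined variants feed each offset series into the
    match-counting step and average the resulting counts or entropies,
    instead of relying on the single offset-0 series."""
    x = np.asarray(x, dtype=float)
    grains = []
    for k in range(tau):                  # one coarse-grained series per offset
        n = (len(x) - k) // tau
        grains.append(x[k:k + n * tau].reshape(n, tau).mean(axis=1))
    return grains

g = composite_coarse_grain(np.arange(12.0), tau=3)
print([list(s) for s in g])
# [[1.0, 4.0, 7.0, 10.0], [2.0, 5.0, 8.0], [3.0, 6.0, 9.0]]
```

Averaging over offsets uses all the data at each scale, which stabilizes the estimator and avoids undefined logarithms when a single offset yields no matches.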
A plausible implication is that, while MMSE is robust for moderate $m$ and $p$, cutting-edge research focuses on reducing its sensitivity to curse-of-dimensionality effects and adapting embedding choices dynamically per channel or signal condition.
In summary, Multidimensional Multi-scale Entropy provides a proven, theoretically sound, and practically tunable framework for measuring joint multivariate complexity over multiple scales. Its adoption in critical software monitoring, physiological regime analysis, and hybrid multimodal recognition tasks demonstrates both broad flexibility and the importance of careful parameterization and empirical validation (Chen et al., 2015, Xiao et al., 2021, Tung et al., 2018).