
Entropy-Based Criteria Overview

Updated 26 June 2026
  • Entropy-Based Criteria are information-theoretic measures that quantify uncertainty using metrics like Shannon, Rényi, and Tsallis entropy.
  • They provide robust frameworks for multi-criteria decision making, model selection, and system characterization across physics, machine learning, and finance.
  • Their practical applications include optimizing neural network pruning, quantum entanglement detection, and risk assessment in financial models.

Entropy-based criteria are a diverse class of information-theoretic measures applied across statistical physics, machine learning, optimization, signal processing, dynamical systems, and quantum information science. They exploit the foundational concept of entropy as a quantifier of uncertainty, diversity, or information content to define selection, evaluation, or admissibility rules that are robust to noise, model uncertainty, or nonlinear interactions. This article surveys the principal entropy-based criteria, rigorous mathematical formulations, and the breadth of technical domains where such measures establish benchmarks for inference, decision-making, detection, or system characterization.

1. Core Definitions and Mathematical Formalism

Entropy quantifies the spread or uncertainty of a probability distribution or dataset. The central variants are:

  • Shannon Entropy: For a discrete vector $p = (p_1,\dots,p_n)$,

$$H(p) = -\sum_{i=1}^n p_i \log p_i$$

For continuous densities $f(x)$, the differential entropy is $H(f) = -\int f(x)\log f(x)\,dx$.

  • Rényi Entropy: For order $\alpha > 0, \alpha \neq 1$,

$$S_\alpha(p) = \frac{1}{1-\alpha} \log \left(\sum_{i=1}^n p_i^\alpha\right)$$

As $\alpha \rightarrow 1$, $S_1(p) = H(p)$; as $\alpha\rightarrow\infty$, $S_{\infty}(p) = -\log \max_i p_i$.

  • Tsallis Entropy: For order $q > 0, q \neq 1$,

$$T_q(p) = \frac{1}{q-1}\left(1 - \sum_{i=1}^n p_i^q\right)$$

This forms the basis for generalized impurity measures.
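As a minimal sketch of the three definitions above, the entropies can be computed in plain Python. The function names are illustrative, natural logarithms are assumed, and the Tsallis form used is the standard $(1 - \sum_i p_i^q)/(q-1)$.

```python
import math

def shannon_entropy(p):
    """H(p) = -sum_i p_i log p_i, using the convention 0 log 0 = 0."""
    return -sum(x * math.log(x) for x in p if x > 0)

def renyi_entropy(p, alpha):
    """log(sum_i p_i^alpha) / (1 - alpha), for alpha > 0 and alpha != 1."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    return math.log(sum(x ** alpha for x in p if x > 0)) / (1.0 - alpha)

def tsallis_entropy(p, q):
    """(1 - sum_i p_i^q) / (q - 1), for q > 0 and q != 1."""
    if q <= 0 or q == 1:
        raise ValueError("q must be positive and different from 1")
    return (1.0 - sum(x ** q for x in p if x > 0)) / (q - 1.0)

def min_entropy(p):
    """Limit of the Renyi entropy as alpha grows without bound."""
    return -math.log(max(p))

p = [0.5, 0.25, 0.25]
print(shannon_entropy(p), renyi_entropy(p, 2.0), tsallis_entropy(p, 2.0))
```

Evaluating `renyi_entropy` at an order very close to 1 approaches `shannon_entropy`, and a large order approaches `min_entropy`, which mirrors the two limits stated above.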

Entropy-based criteria typically involve inequalities, optimization objectives, or information gain calculations based on these functionals. Their specific form depends on the problem structure, be it weighting in multi-criteria optimization, pruning in neural networks, entanglement detection in quantum systems, or regularization in statistical inference.

2. Entropy-Based Criteria in Multi-Criteria Decision Making

In multi-criteria decision analysis (MCDA), entropy quantifies the informativeness or discrimination power of each criterion. The Shannon Entropy Weighting Method (EWM) constructs a normalized $m \times n$ decision matrix $X = (x_{ij})$ to compute probabilities $p_{ij}$ and entropies $E_j$ per criterion, leading to normalized weights $w_j$: $w_j = \frac{1 - E_j}{\sum_{k=1}^n (1 - E_k)}$ where $p_{ij} = x_{ij} / \sum_{k=1}^m x_{kj}$, $E_j = -\frac{1}{\ln m} \sum_{i=1}^m p_{ij} \ln p_{ij}$. A parallel approach is the Dispersion-based Weighting Method (DWM), which employs the coefficient of variation $CV_j = \sigma_j / \mu_j$ (ratio of standard deviation to mean). DWM weights are

$$w_j = \frac{CV_j}{\sum_{k=1}^n CV_k}$$

dispensing with the normalization and non-negativity constraints present in EWM. Statistical tests on case studies show nearly identical rankings from both methods, as measured by Pearson correlation, with DWM imposing minimal computational burden and applying directly to negative data (Babaei et al., 28 Apr 2025).
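The two weighting schemes can be sketched as follows. This is an illustration under stated assumptions rather than the cited authors' implementation: the entropy weights follow the common textbook form (column-wise proportions, Shannon entropy normalized by $\ln m$, weights proportional to one minus the entropy), while the dispersion weights normalize the coefficient of variation, using the population standard deviation and the absolute value of the mean so that negative data still produce positive weights. Both function names are hypothetical.

```python
import math

def entropy_weights(matrix):
    """Entropy weights per column (criterion) of a non-negative m x n matrix."""
    m, n = len(matrix), len(matrix[0])
    divergences = []
    for j in range(n):
        column = [row[j] for row in matrix]
        total = sum(column)
        probs = [x / total for x in column]
        # Shannon entropy normalised to [0, 1]; 0 log 0 is taken as 0.
        e_j = -sum(p * math.log(p) for p in probs if p > 0) / math.log(m)
        divergences.append(1.0 - e_j)
    norm = sum(divergences)  # zero only if every column is constant
    return [d / norm for d in divergences]

def dispersion_weights(matrix):
    """Weights proportional to each column's coefficient of variation."""
    m, n = len(matrix), len(matrix[0])
    cvs = []
    for j in range(n):
        column = [row[j] for row in matrix]
        mean = sum(column) / m
        std = math.sqrt(sum((x - mean) ** 2 for x in column) / m)
        # Assumption: abs(mean) keeps weights positive for negative data.
        cvs.append(std / abs(mean))
    norm = sum(cvs)
    return [c / norm for c in cvs]

# Rows are alternatives, columns are criteria (made-up numbers).
scores = [[7.0, 120.0, 0.3],
          [8.0, 150.0, 0.9],
          [7.5, 90.0, 0.1]]
print(entropy_weights(scores))
print(dispersion_weights(scores))
```

In both schemes a criterion whose values barely differ across alternatives receives a weight near zero, since it does little to discriminate between them.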

Another entropy-based approach evaluates decision consistency in pairwise comparison matrices (PCMs). The entropy production rate $e_p$, constructed from the stationary distribution $\pi$ and transition probabilities $p_{ij}$ of a Markov chain induced by the PCM, detects inconsistency via

$$e_p = \frac{1}{2} \sum_{i,j} \left(\pi_i p_{ij} - \pi_j p_{ji}\right) \log \frac{\pi_i p_{ij}}{\pi_j p_{ji}} \geq 0$$

with $e_p = 0$ characterizing perfectly consistent PCMs (Dixit, 2018).
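A rough sketch of this consistency check follows, using the standard entropy production rate of a Markov chain, $\frac{1}{2}\sum_{i,j}(\pi_i p_{ij} - \pi_j p_{ji})\log\frac{\pi_i p_{ij}}{\pi_j p_{ji}}$. How the chain is induced from the PCM is an assumption here: the sketch simply row-normalizes the matrix, which may differ from the construction in the cited work. Under that assumption a consistent PCM (entries $w_i / w_j$) yields identical rows, hence a reversible chain and zero entropy production.

```python
import math

def row_normalize(pcm):
    """Hypothetical chain construction: divide each row by its sum."""
    return [[a / sum(row) for a in row] for row in pcm]

def stationary_distribution(P, iterations=10_000, tol=1e-14):
    """Power iteration for pi = pi P (assumes an ergodic chain)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi

def entropy_production_rate(P):
    """0.5 * sum_ij (J_ij - J_ji) * log(J_ij / J_ji), with J_ij = pi_i P_ij."""
    n = len(P)
    pi = stationary_distribution(P)
    total = 0.0
    for i in range(n):
        for j in range(n):
            forward = pi[i] * P[i][j]
            backward = pi[j] * P[j][i]
            if forward > 0 and backward > 0:
                total += 0.5 * (forward - backward) * math.log(forward / backward)
            elif forward != backward:
                return math.inf  # one-way transition: unbounded production
    return total

w = [1.0, 2.0, 4.0]
consistent = [[wi / wj for wj in w] for wi in w]
cyclic = [[1.0, 2.0, 0.25],
          [0.5, 1.0, 2.0],
          [4.0, 0.5, 1.0]]  # reciprocal but intransitive
print(entropy_production_rate(row_normalize(consistent)))
print(entropy_production_rate(row_normalize(cyclic)))
```

The first value is zero up to rounding, while the intransitive matrix gives a strictly positive rate.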

3. Entropy Criteria in Statistical Learning and Model Selection

Entropy's role in machine learning is exemplified in decision-tree induction, where classical split criteria (Shannon entropy for ID3, Gini index for CART) are unified by Tsallis entropy: $T_q(p) = \frac{1}{q-1}\left(1 - \sum_{i=1}^n p_i^q\right)$ with $q \rightarrow 1$ yielding Shannon entropy, and $q = 2$ yielding the Gini impurity. The Tsallis Entropy Criterion (TEC) uses this functional as a tunable split impurity, grid-searching $q$ and selecting it via cross-validation, and is reported to outperform the classical criteria (Wang et al., 2015).
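To make the unification concrete, here is a small sketch of a Tsallis-based split gain. It assumes the standard Tsallis form $(1 - \sum_i p_i^q)/(q-1)$ and treats $q = 1$ as the Shannon limit. The helper names are illustrative, and the cross-validated search over $q$ is not shown.

```python
import math
from collections import Counter

def tsallis_impurity(labels, q):
    """Tsallis impurity of a list of class labels; q = 1 gives Shannon entropy."""
    n = len(labels)
    probs = [count / n for count in Counter(labels).values()]
    if abs(q - 1.0) < 1e-12:
        return -sum(p * math.log(p) for p in probs)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)

def split_gain(parent, left, right, q):
    """Impurity reduction achieved by splitting parent into left and right."""
    n = len(parent)
    children = (len(left) / n) * tsallis_impurity(left, q) \
             + (len(right) / n) * tsallis_impurity(right, q)
    return tsallis_impurity(parent, q) - children

parent = ["a", "a", "b", "b"]
for q in (1.0, 1.5, 2.0):
    print(q, split_gain(parent, ["a", "a"], ["b", "b"], q))
```

With `q = 2.0` the impurity coincides with the Gini impurity, so a tree learner could swap criteria by changing a single parameter.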

In model-based RL, Maximum Entropy Model Rollouts (MEMR) maximize the entropy of state-action pairs in imaginary rollouts, using sampling priorities to steer the agent systematically toward unexplored, informative regions, which minimizes compounding model error and improves policy sample efficiency (Zhang et al., 2020).

Compressing CNNs via entropy-based pruning retains only filters with high-entropy activation distributions, directly targeting filters contributing maximal information diversity. Iterative pruning by Shannon entropy followed by staged retraining yields state-of-the-art compression and runtime reduction (Luo et al., 2017).
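The filter-selection step can be sketched as below. The details are assumptions made for illustration: each filter is summarized by one pooled activation value per input image, those values are binned into a fixed number of equal-width bins over the filter's own range, and the filters with the highest histogram entropy are kept. The function names and the `n_keep` parameter are hypothetical, and retraining is omitted.

```python
import math

def histogram_entropy(values, bins=10):
    """Shannon entropy of an equal-width histogram of the values."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant activation carries no information
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)

def select_filters(activations, n_keep, bins=10):
    """activations maps filter id -> pooled activations, one per image."""
    scores = {f: histogram_entropy(v, bins) for f, v in activations.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return sorted(ranked[:n_keep]), scores

pooled = {
    0: [0.0] * 8,                                  # dead filter
    1: [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],   # spread-out responses
    2: [0.0] * 7 + [1.0],                          # fires on one image only
}
kept, scores = select_filters(pooled, n_keep=2, bins=4)
print(kept, scores)
```

In an iterative scheme, this selection would be applied layer by layer, with fine-tuning between rounds.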

In the domain of latent variable models, the entropy of the encoding means across a dataset serves as a regime-detection criterion: dimensions with high entropy are classified as active, those with low entropy as passive. This criterion is theoretically linked to the KL divergence for VAEs via Shannon's entropy-variance bounds and is applied in practice by thresholding the empirical entropy histogram to recover polarization in $\beta$-VAEs, LV-AEs, and iVAEs (Clapham et al., 15 May 2026).

4. Entropy-Based Criteria in Quantum Information and Dynamical Systems

Quantum entanglement detection relies on entropy monotonicity under partial trace. For a separable bipartite state $\rho_{AB}$ with reduced states $\rho_A$ and $\rho_B$,

$$S_\alpha(\rho_{AB}) \geq S_\alpha(\rho_A), \qquad S_\alpha(\rho_{AB}) \geq S_\alpha(\rho_B)$$

for all orders $\alpha$. Violation signals entanglement. Yet numerical studies show such Rényi entropy criteria are extremely weak for typical quantum states in all but the smallest Hilbert space dimensions, with the positive partial transpose (PPT) criterion being exponentially stricter (Sauer et al., 2022). For continuous-variable systems, Rényi- and Tsallis-based entropic witnesses surpass second-order moment inequalities in sensitivity, especially for strongly non-Gaussian states (Saboia et al., 2010). For fermionic systems, majorization and Rényi entropy relations between a state and its one-particle reduced density matrices anchor a family of necessary conditions whose strength increases with the Rényi index, saturating in the limiting case the optimal threshold of entanglement in relevant state families (Zander et al., 2011).
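As a small worked illustration of the entropic test (entanglement is flagged when the Rényi entropy of the global state falls below that of a reduced state), consider two-qubit Werner states $p\,|\psi^-\rangle\langle\psi^-| + (1-p)\,I/4$. Their spectra are known in closed form, so no diagonalization is needed, and they are entangled exactly when $p > 1/3$. The sketch below works directly on eigenvalue lists and fixes the order at $\alpha = 2$. The function names and the choice of example are illustrative.

```python
import math

def renyi_entropy(spectrum, alpha):
    """Renyi entropy of a density matrix, computed from its eigenvalues."""
    return math.log(sum(x ** alpha for x in spectrum if x > 0)) / (1.0 - alpha)

def werner_spectrum(p):
    """Eigenvalues of p |psi-><psi-| + (1 - p) I/4 on two qubits."""
    return [(1 + 3 * p) / 4] + [(1 - p) / 4] * 3

def violates_renyi_criterion(global_spectrum, reduced_spectrum, alpha=2.0):
    """True if S_alpha(global) < S_alpha(reduced), i.e. entanglement is flagged."""
    s_global = renyi_entropy(global_spectrum, alpha)
    s_reduced = renyi_entropy(reduced_spectrum, alpha)
    return s_global < s_reduced - 1e-12

reduced = [0.5, 0.5]  # each qubit of a Werner state is maximally mixed
for p in (0.2, 0.5, 0.8):
    print(p, violates_renyi_criterion(werner_spectrum(p), reduced))
```

At $\alpha = 2$ the test fires only for $p > 1/\sqrt{3} \approx 0.577$, so the state with $p = 0.5$ is entangled but goes undetected, a simple instance of the criterion being weaker than PPT.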

In ergodic theory and dynamical systems, entropy-based criteria control the density of ergodic measures or the genericity of statistical properties. The equivalence between the density of the entropy graph on ergodic states, the Gâteaux differentiability of the pressure function on a dense subspace, and the strict convexity of finite-dimensional Legendre–Fenchel transforms provides a comprehensive analytic–probabilistic bridge (Comman, 2015).

5. Entropy-Based Risk Measures and Inverse Problems

In financial mathematics and stochastic optimization, Entropic Value-at-Risk (EVaR) and its Rényi-entropy generalizations subsume classical risk measures under an explicit entropy constraint on the dual probability density, with the constraint expressed through the Rényi entropy dual. This family nests Average Value-at-Risk (AVaR) and the essential supremum as special cases, interpolating risk tolerance via the allowed information divergence from a reference model (Pichler et al., 2018).

In portfolio optimization, continuous entropy serves as a model-free, non-moment-based risk measure, matching or exceeding the explanatory power and predictive stability of standard deviation or CAPM beta over multidecade data (Ormos et al., 2015).
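One simple way to estimate such an entropy from a return series is a histogram estimator, sketched below. The estimator choice, the bin count, and the function name are assumptions for illustration and need not match the cited study. The estimate is the discrete entropy of the bin frequencies plus the logarithm of the bin width.

```python
import math

def histogram_differential_entropy(values, bins=10):
    """Histogram estimate of -integral f log f for a sample of real values."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("need at least two distinct values")
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    n = len(values)
    # Density in bin k is c_k / (n * width); average -log density over the sample.
    return -sum((c / n) * math.log(c / (n * width)) for c in counts if c > 0)

returns = [-0.03, -0.02, -0.01, 0.0, 0.005, 0.01, 0.02, 0.03]  # made-up data
leveraged = [2 * r for r in returns]
print(histogram_differential_entropy(returns, bins=4))
print(histogram_differential_entropy(leveraged, bins=4))
```

Doubling every return raises the estimate by $\log 2$, the scaling behavior expected of differential entropy, which is why it can serve as a dispersion-type risk measure without reference to moments.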

For inverse problems, entropy-based regularization enforces diversity among solution coefficients to counteract overfitting. Pareto weighting or scalarization of terms such as generalized collage distance, negative entropy, and sparsity enables systematic navigation of the accuracy-diversity-simplicity trade-off (Kunze et al., 2019).

Relative entropy (Kullback–Leibler divergence) as an information criterion enables simultaneous selection of linear system delay, order, and noise variance by minimizing the divergence between the observed and candidate model distribution, with efficient batch and recursive algorithms and theoretically justified stopping rules (Shamsi et al., 2022).

6. Model Selection, Independence Testing, and Information Criteria

In quantum state estimation, quantum relative entropy as a rate function enables the construction of Akaike- and Watanabe-type model selection criteria, paralleling classical information-theoretic approaches and inheriting asymptotic unbiasedness and large-deviation accuracy guarantees (Okamura, 2012).

Entropy-regularized optimal transport independence criteria serve as robust, scalable alternatives to kernel and energy-based dependence tests. The Sinkhorn divergence between empirical joint and product measures achieves controlled statistical power, admits random-feature acceleration, and is fully differentiable for integration into deep learning pipelines (Liu et al., 2021).

Rényi entropy-based absolute criteria efficiently characterize families of functional solutions (e.g., parton distribution functions) in high-dimensional inverse problems by mapping them to entropy vectors and extracting maximally diverse representatives with Pareto-front analysis, dispensing with the combinatorial overhead of metric-based clustering (Courtoy et al., 10 Nov 2025).
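The mapping-then-filtering idea can be sketched as follows, under loud assumptions: each candidate solution is taken to be a normalized non-negative vector, it is mapped to a tuple of Rényi entropies at a few chosen orders, and the non-dominated set is extracted with larger values treated as better in every component. The orders, the dominance direction, and all names are illustrative and are not taken from the cited work.

```python
import math

def renyi_entropy(p, alpha):
    """Renyi entropy of a normalised non-negative vector (alpha != 1)."""
    return math.log(sum(x ** alpha for x in p if x > 0)) / (1.0 - alpha)

def entropy_vector(p, orders=(0.5, 2.0)):
    """Map one candidate solution to a tuple of Renyi entropies."""
    return tuple(renyi_entropy(p, a) for a in orders)

def dominates(u, v):
    """u dominates v if it is no worse everywhere and better somewhere."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def pareto_front(vectors):
    """Indices of vectors not dominated by any other vector."""
    return [i for i, u in enumerate(vectors)
            if not any(dominates(v, u) for j, v in enumerate(vectors) if j != i)]

candidates = [[0.7, 0.1, 0.1, 0.1],
              [0.4, 0.4, 0.1, 0.1],
              [0.25, 0.25, 0.25, 0.25]]
vectors = [entropy_vector(c) for c in candidates]
print(pareto_front(vectors))
```

The pairwise dominance scan costs a number of comparisons quadratic in the number of candidates, but it needs no distance metric between solutions, only their entropy vectors.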

7. Entropy as a Criterion for Self-Organization

In open, nonequilibrium systems, bounds on normalized Shannon entropy relative to Rényi entropy define the emergence of self-organization. Empirical validation on hierarchical fractal structures confirms these theoretical thresholds as universal markers of self-organized, scale-invariant complexity (Zhanabaev et al., 2016).


In summary, entropy-based criteria provide a flexible, information-theoretic toolkit for quantifying diversity, uncertainty, or structure in models, systems, and data. Their mathematical tractability, foundational justification, and broad adaptability undergird their pivotal role in multi-criteria optimization, machine learning, quantum information, risk theory, and beyond. Current research continually extends their reach, tuning sensitivity via entropy parameters, decomposing complex systems via entropy-based partitions, and harnessing efficient computational paradigms for large-scale applications.
