
Mutual Information-Based Criterion

Updated 23 November 2025
  • Mutual Information-Based Criterion is a quantitative measure that assesses dependency between random variables, guiding feature selection and dimensionality reduction.
  • It leverages invariance and self-consistency properties to enhance robust estimation and algorithmic performance in statistical and signal processing applications.
  • The criterion underpins practical applications such as stopping rules in decoding, neural representation learning, and secure coding via explicit MI constraints.

A mutual information-based criterion refers to any principle, objective function, or selection rule that leverages mutual information (MI)—a fundamental scalar statistic measuring the dependence between random variables—as a core quantitative assessment. MI-based criteria support both classical and contemporary statistical, signal processing, learning, and information-theoretic systems by providing an operational, tractable formulation for quantifying association, relevance, extractability, or uncertainty reduction in diverse structured environments.

1. Mathematical Foundations of Mutual Information Criteria

Mutual information between random variables $X$ and $Y$ is defined as

$$I(X;Y) = \int p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)} \, dx \, dy$$

and quantifies the reduction in uncertainty of one variable given knowledge of the other. It is symmetric ($I(X;Y) = I(Y;X)$), non-negative, and equals zero if and only if $X$ and $Y$ are independent.
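
As a minimal numerical check of this definition, the following sketch evaluates the discrete sum form $\sum_{x,y} p(x,y)\log\frac{p(x,y)}{p(x)p(y)}$ for a hypothetical 2×2 joint table; the values are illustrative only, and units are nats.

```python
import numpy as np

# Hypothetical 2x2 joint distribution p(x, y); rows index X, columns index Y.
p_xy = np.array([[0.30, 0.10],
                 [0.10, 0.50]])

p_x = p_xy.sum(axis=1, keepdims=True)   # marginal p(x)
p_y = p_xy.sum(axis=0, keepdims=True)   # marginal p(y)

# I(X;Y) = sum_{x,y} p(x,y) log[ p(x,y) / (p(x) p(y)) ], in nats.
mi = np.sum(p_xy * np.log(p_xy / (p_x * p_y)))
print(f"I(X;Y) = {mi:.4f} nats")        # 0 iff the table factorizes as p(x)p(y)
```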

Formal MI criteria are used to:

  • Maximize relevance between representations and targets, e.g., $I(GX;Y)$ in dimension reduction (Razeghi et al., 2019).
  • Minimize redundancy or enforce conditional independence through chain-rule or higher-order MI (Vergara et al., 2015, Venkateswara et al., 2017).
  • Serve as stopping rules or thresholds: stopping iterative decoding when estimated MI exceeds a threshold linked to BER (Wu et al., 2013), or terminating feature addition when residual CMI is small (Yu et al., 2018).
  • Define security in distributed cryptosystems via leakage constraints $I(C;X) \leq \delta$ (Oohama et al., 17 Jul 2025).

These criteria map directly to operational or information-theoretic guarantees: minimal Bayes error (via Fano's inequality), security leakage, statistical significance, or task-aligned representational fidelity.
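
The Bayes-error link can be made explicit through Fano's inequality: for any estimator $\hat{X}(Y)$ of a discrete $X$ over alphabet $\mathcal{X}$,

$$H(X \mid Y) \le h(P_e) + P_e \log(|\mathcal{X}| - 1), \qquad P_e = \Pr[\hat{X}(Y) \neq X],$$

and since $H(X \mid Y) = H(X) - I(X;Y)$, bounding $h(P_e) \le \log 2$ and $\log(|\mathcal{X}|-1) \le \log|\mathcal{X}|$ gives $P_e \ge \bigl(H(X) - I(X;Y) - \log 2\bigr)/\log|\mathcal{X}|$, so increasing $I(X;Y)$ lowers the attainable error floor.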

2. Equitability, Invariance, and Self-Consistency

A central property of MI-based criteria is self-equitability, tightly linked to the data processing inequality (DPI): for any deterministic function $f$, if $X \to f(X) \to Y$ is a Markov chain,

$$I(X;Y) = I(f(X);Y)$$

meaning that MI is invariant under all invertible transformations and reduction to sufficient statistics (Kinney et al., 2013).

Unlike alternative measures (e.g., the maximal information coefficient, MIC), MI satisfies:

  • Full invariance under arbitrary invertible transforms,
  • The DPI (monotonicity under post-processing of variables),
  • Independence from the particular parametric form of the association, guaranteeing conceptual naturality and generality (Kinney et al., 2013).

The MI criterion thus quantifies all statistical dependence and assigns equal values to relationships of equal noisiness, regardless of functional form, in contrast with grid-based or monotonic-invariant methods.
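
This invariance is easy to verify empirically for discrete variables: relabeling $X$ with any bijection leaves a plug-in MI estimate exactly unchanged, because the estimate depends only on the joint label counts. A minimal sketch using scikit-learn's `mutual_info_score` on synthetic labels (chosen purely for illustration):

```python
import numpy as np
from sklearn.metrics import mutual_info_score  # plug-in MI for discrete labels, in nats

rng = np.random.default_rng(0)
x = rng.integers(0, 4, size=5000)              # X uniform over {0, 1, 2, 3}
y = (x + rng.integers(0, 2, size=5000)) % 4    # Y: a noisy copy of X

# f(X) = perm[X] is a deterministic bijection, so I(f(X); Y) = I(X; Y);
# the plug-in estimate reproduces this exactly, not just asymptotically.
perm = rng.permutation(4)
fx = perm[x]

print(mutual_info_score(x, y), mutual_info_score(fx, y))
```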

3. Algorithmic Realizations across Research Domains

Mutual information-based criteria are implemented in numerous algorithmic frameworks across disciplines:

A. Feature Selection

  • Maximal Relevance: Selecting features with maximal $I(X_j; Y)$ (Schnapp et al., 2020, Vergara et al., 2015).
  • mRMR (min-redundancy-maximal-relevance): Maximizing $I(X_j;Y) - \lambda \sum_{i \in S} I(X_j; X_i)$ (Liu et al., 2022, Vergara et al., 2015); a greedy sketch follows this list.
  • Unique Relevance (BUR): Augmenting relevance with the unique MI $I(X_j; Y \mid S \setminus \{X_j\})$ for redundancy control (Liu et al., 2022).
  • Global subset selection (BQP): Expressing $I(X_S;Y)$ as a quadratic form under conditional independence and solving with approximations like TPower and LowRank (Venkateswara et al., 2017).
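
As a rough illustration of the mRMR rule above (not the exact procedure of any cited work), the following greedy sketch scores candidates with scikit-learn's KNN-based MI estimators; the dataset, `lam`, and the number of selected features are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)
n_select, lam = 5, 1.0                                   # lam is the redundancy weight lambda

relevance = mutual_info_classif(X, y, random_state=0)    # estimated I(X_j; Y) for each feature
selected, remaining = [], list(range(X.shape[1]))

for _ in range(n_select):
    scores = []
    for j in remaining:
        # Redundancy term: sum of estimated I(X_j; X_i) over already-selected features.
        red = sum(mutual_info_regression(X[:, [i]], X[:, j], random_state=0)[0]
                  for i in selected)
        scores.append(relevance[j] - lam * red)
    best = remaining[int(np.argmax(scores))]
    selected.append(best)
    remaining.remove(best)

print("selected features:", selected)
```

In practice the weight `lam` is often set to $1/|S|$, so the redundancy term becomes an average over the selected set rather than a sum.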

B. Dimensionality Reduction

  • Greedy subspace selection: Rank directions by $I(g^\top X; Y)$, maximize $I(GX;Y)$, and construct projections for discriminativity under MI (Razeghi et al., 2019, Ozdenizci et al., 2021); a random-search sketch follows this list.
  • Stochastic MI-gradient neural dimensionality reduction (MMINet): Learn nonlinear mappings maximizing $I(\phi(X);Y)$ end-to-end without distributional assumptions (Ozdenizci et al., 2021).
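
A crude stand-in for the greedy subspace selection above is to sample random unit directions $g$, score each by the estimated $I(g^\top X; Y)$ of its 1-D projection, and keep the top-scoring directions as rows of $G$; the cited methods use gradient-based or greedy optimization rather than this random search, so the sketch is illustrative only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=600, n_features=10, n_informative=3, random_state=1)

rng = np.random.default_rng(1)
candidates = rng.normal(size=(100, X.shape[1]))           # random candidate directions g
candidates /= np.linalg.norm(candidates, axis=1, keepdims=True)

# Score each direction by the estimated MI of its scalar projection with the target.
scores = [mutual_info_classif((X @ g).reshape(-1, 1), y, random_state=0)[0]
          for g in candidates]

top = candidates[np.argsort(scores)[::-1][:2]]            # two best directions form G
Z = X @ top.T                                             # reduced representation GX
print("best projection MI scores:", sorted(scores, reverse=True)[:2])
```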

C. Encoding, Decoding, and Representation Learning

  • Iterative decoder stopping: Monitor MI between bits and LLR output and set direct thresholds on $I$ for decoding termination (Wu et al., 2013); a toy sketch follows this list.
  • Self-supervised learning (SSL): Maximize $I(\text{representation}_1; \text{representation}_2)$, reducing the loss to log-determinant forms under distributional homeomorphism for efficient SSL objectives (Chang et al., 7 Sep 2024).
  • Communication and security: Define reliability or secrecy as explicit MI constraints, e.g., $I(\text{ciphertext}; \text{plaintext}) \le \delta$ in distributed encryption (Oohama et al., 17 Jul 2025).
  • Neural decoders via discriminative MI objectives: Train discriminators to realize $\arg\max_x p_{X|Y}(x \mid y)$, maximizing $I(X;Y)$ for robust MAP decoding (Tonello et al., 2022).
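
The stopping rule can be sketched with the common blind time-average approximation $I \approx 1 - \mathbb{E}[\log_2(1 + e^{-|L|})]$, which assumes symmetric (consistent) LLRs and mostly-correct hard decisions; `run_iteration`, the threshold value, and the toy LLR generator below are hypothetical placeholders rather than the scheme of the cited paper.

```python
import numpy as np

def estimated_mi_from_llrs(llrs):
    """Blind estimate of I(bit; LLR) in bits, assuming consistent LLRs and
    mostly-correct hard decisions (a standard time-average approximation)."""
    return 1.0 - np.mean(np.log2(1.0 + np.exp(-np.abs(llrs))))

def decode_with_mi_stopping(run_iteration, max_iters=20, mi_threshold=0.99):
    """Run decoder iterations until the estimated MI exceeds a threshold.
    `run_iteration` is a hypothetical callback returning the current LLR vector."""
    mi = 0.0
    for it in range(1, max_iters + 1):
        llrs = run_iteration()
        mi = estimated_mi_from_llrs(llrs)
        if mi >= mi_threshold:          # threshold chosen to match a target BER
            return it, mi
    return max_iters, mi

# Toy usage: LLR magnitudes grow as iterations proceed, so the MI estimate approaches 1.
rng = np.random.default_rng(0)
state = {"scale": 0.5}
def fake_iteration():
    state["scale"] *= 1.8
    return state["scale"] * rng.normal(loc=2.0, scale=1.0, size=1024)

print(decode_with_mi_stopping(fake_iteration))
```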

D. Clustering, Hashing, and Compression

  • Cluster evaluation by average normalized MI (ANMI) with attribute-based references [0511013]; see the sketch after this list.
  • Online hashing: Drive updates and function learning by MI between Hamming distances and neighborhood indicators (Cakir et al., 2017).
  • Layerwise neural network pruning: Compute conditional geometric MI between filters for dependency-aware compression (Ganesh et al., 2020).
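
The ANMI criterion above can be written down directly as the average normalized MI between a candidate clustering and each reference partition (e.g., the partitions induced by individual attributes). A minimal sketch using scikit-learn's `normalized_mutual_info_score`, with made-up labelings:

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

def anmi(candidate_labels, reference_partitions):
    """Average Normalized MI of one clustering against a set of reference partitions."""
    return np.mean([normalized_mutual_info_score(candidate_labels, ref)
                    for ref in reference_partitions])

# Toy usage with hypothetical labelings over 8 items.
candidate = [0, 0, 1, 1, 2, 2, 2, 0]
references = [[0, 0, 1, 1, 1, 2, 2, 0],
              [1, 1, 0, 0, 2, 2, 2, 1]]
print(f"ANMI = {anmi(candidate, references):.3f}")
```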

4. Estimation Methodologies and Practical Challenges

Estimation of MI in high-dimensional, continuous, or complex discrete domains remains a central theme:

  • Histogram-based estimators: Tractable but limited by the curse of dimensionality and sensitive to binning (Papana et al., 2009).
  • k-nearest neighbor (KNN, Kraskov) estimators: Consistency and stability for low to moderate dimensions; the $k$ parameter controls smoothness/bias (Papana et al., 2009, Kinney et al., 2013); see the comparison sketch at the end of this section.
  • Kernel and graph-based estimators: Kernel density for continuous variables; geometric estimators for structural dependencies (Papana et al., 2009, Ganesh et al., 2020).
  • Matrix-based Renyi entropy estimators: Direct RKHS functionals for joint and conditional MI without PDF estimation, scalable for stopping criteria (Yu et al., 2018).
  • Neural/variational estimators: Learnable discriminators for MI or density ratio approximations, especially under unknown or implicit distributions (Tonello et al., 2022, Chang et al., 7 Sep 2024).

Bias-variance tradeoffs, computational costs (bins, neighborhoods, spectral decompositions), convergence under finite samples, and robustness to distributional shifts are recurring concerns (Papana et al., 2009, Yu et al., 2018).
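
The sensitivity to binning is easy to reproduce on a bivariate Gaussian, where the true MI is $-\tfrac{1}{2}\log(1-\rho^2)$ nats. The sketch below compares a plug-in histogram estimate at two bin counts with scikit-learn's KNN-based estimator; the sample size and $\rho$ are illustrative.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
rho, n = 0.8, 5000
x = rng.normal(size=n)
y = rho * x + np.sqrt(1 - rho**2) * rng.normal(size=n)
true_mi = -0.5 * np.log(1 - rho**2)            # exact MI for a bivariate Gaussian, in nats

def histogram_mi(x, y, bins):
    """Plug-in MI from a 2-D histogram; the result depends strongly on `bins`."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz]))

knn_mi = mutual_info_regression(x.reshape(-1, 1), y, n_neighbors=3, random_state=0)[0]
print(f"true {true_mi:.3f} | hist(10) {histogram_mi(x, y, 10):.3f} | "
      f"hist(100) {histogram_mi(x, y, 100):.3f} | kNN {knn_mi:.3f}")
```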

5. Extensions and Alternative Dependence Criteria

While canonical MI is defined via Kullback–Leibler divergence, several generalizations accommodate continuous, heavy-tailed, or privacy-sensitive regimes:

| Criterion | Divergence/Metric | Key Properties |
|---|---|---|
| $I_{KL}(X;Y)$ (classical) | Kullback–Leibler | Unbounded, possibly infinite; sensitive to support mismatches (Kuskonmaz et al., 2022) |
| $I_{JS}(X;Y)$ (Jensen–Shannon) | Jensen–Shannon | Symmetric, bounded in $[0, \log 2]$; metric under $\sqrt{\cdot}$ (Kuskonmaz et al., 2022) |
| $I_{TV}(X;Y)$ (total variation) | Total variation distance | True metric; coarse for small differences (Kuskonmaz et al., 2022) |
| $I_W(X;Y)$ (Wasserstein MI) | Wasserstein | Geometric, robust; higher computational cost (Kuskonmaz et al., 2022) |

These alternatives are used when classical MI is ill-posed or numerically unstable, and admit plug-in, kNN, kernel, or Sinkhorn estimators depending on sample size and dimension (Kuskonmaz et al., 2022).
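
As one concrete instance, the Jensen–Shannon variant replaces the KL divergence in the definition of MI with the JS divergence between the joint and the product of its marginals; a minimal plug-in sketch for a discrete joint table (illustrative values) is shown below.

```python
import numpy as np

def js_mutual_information(p_xy):
    """Jensen-Shannon analogue of MI: JS divergence between the joint p(x,y)
    and the product of its marginals, in nats (bounded above by log 2)."""
    p_xy = p_xy / p_xy.sum()
    q = p_xy.sum(axis=1, keepdims=True) * p_xy.sum(axis=0, keepdims=True)  # p(x) p(y)
    m = 0.5 * (p_xy + q)

    def kl(a, b):
        nz = a > 0
        return np.sum(a[nz] * np.log(a[nz] / b[nz]))

    return 0.5 * kl(p_xy, m) + 0.5 * kl(q, m)

p_xy = np.array([[0.30, 0.10],
                 [0.10, 0.50]])
print(f"I_JS = {js_mutual_information(p_xy):.4f} nats")   # 0 iff X and Y are independent
```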

6. Theoretical Impact and Open Issues

MI-based criteria have foundational significance: they translate statistical dependence into operational guarantees (Bayes-error bounds via Fano's inequality, explicit secrecy-leakage limits, task-aligned representational fidelity), while reliable MI estimation in high-dimensional, finite-sample regimes remains the principal open issue.

7. Empirical Performance and Synthesis

Across domains, MI-based criteria serve as relevance scores, stopping rules, training objectives, and security constraints. Their operational success is grounded in rigorous invariance, self-consistency, and direct empirical links to task objectives.

