Normalized Mutual Information (NMI)

Updated 23 November 2025
  • Normalized Mutual Information (NMI) is a normalized metric that quantifies the dependence between variables by scaling mutual information with entropy-based functions.
  • It is widely used in clustering evaluation, feature selection, multimodal fusion, and financial time-series analysis to assess partition agreement and detect regime shifts.
  • Practical implementations of NMI use various normalization strategies, with bias corrections like AMI and rNMI enhancing its robustness across diverse domains.

Normalized Mutual Information (NMI) is a bounded, unitless measure of statistical dependence between two random variables or partitions, widely adopted for evaluating feature relevance, measuring clustering agreement, guiding fusion in multimodal systems, quantifying associations in high-dimensional data, and diagnosing structural shifts in complex stochastic systems. The core principle underlying NMI is the normalization of mutual information—a general symmetric measure of variable association—by a function of the marginal entropies, yielding a dimensionless score in $[0,1]$ that can be meaningfully interpreted and compared across diverse domains and problem sizes.

1. Mathematical Definition and Normalization Strategies

Let $X$ and $Y$ denote random variables (discrete or continuous, possibly multidimensional) with joint and marginal probability densities $p(x,y)$, $p(x)$, and $p(y)$. The mutual information,

$$I(X;Y) = \int p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}\,dx\,dy = H(X) + H(Y) - H(X, Y),$$

measures the expected reduction in uncertainty (entropy) of $X$ given knowledge of $Y$ (and vice versa), but it has no universal upper bound: its scale is tied directly to the entropies of the marginals, which makes raw values hard to compare across problems.

To map $I(X;Y)$ onto a universal scale, NMI normalizes by a symmetric function of $H(X)$ and $H(Y)$. Dominant variants include:

  • Arithmetic-mean ("sum") normalization:

$$\mathrm{NMI}_{2}(X, Y) = \frac{2\,I(X; Y)}{H(X) + H(Y)}$$

  • Geometric-mean normalization:

$$\mathrm{NMI}_{\sqrt{}}(X, Y) = \frac{I(X; Y)}{\sqrt{H(X)\,H(Y)}}$$

  • Max normalization:

$$\mathrm{NMI}_{\max}(X, Y) = \frac{I(X; Y)}{\max\{H(X), H(Y)\}}$$

  • Joint-entropy normalization (less common):

$$\mathrm{NMI}_{\mathrm{joint}}(X, Y) = \frac{I(X; Y)}{H(X, Y)}$$

All of these ensure $0 \leq \mathrm{NMI} \leq 1$. The geometric-mean and arithmetic-mean forms are especially prevalent in clustering evaluation, feature selection, and information-theoretic modeling due to their symmetry and interpretability (Alonso, 20 Nov 2025, Jerdee et al., 2023, McDaid et al., 2011). A minimal plug-in computation of these variants is sketched below.
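
As an illustration of the definitions above, the following sketch computes all four variants from an empirical contingency table via plug-in (histogram) estimates; `nmi_variants` is a hypothetical helper name, and the sketch assumes both marginal entropies are strictly positive.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a probability vector, ignoring zero cells."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def nmi_variants(counts):
    """Plug-in NMI under the four normalizations, from a joint contingency table.

    counts: 2-D array of empirical joint counts n_ij for (X, Y).
    Returns a dict with the sum, sqrt, max, and joint normalizations.
    """
    pxy = counts / counts.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    hx, hy, hxy = entropy(px), entropy(py), entropy(pxy.ravel())
    mi = hx + hy - hxy                      # I(X;Y) = H(X) + H(Y) - H(X,Y)
    return {
        "sum":   2.0 * mi / (hx + hy),
        "sqrt":  mi / np.sqrt(hx * hy),
        "max":   mi / max(hx, hy),
        "joint": mi / hxy,
    }

# Example: a noisy diagonal association between two 3-state variables.
counts = np.array([[30, 2, 1],
                   [3, 25, 4],
                   [1, 5, 29]])
print(nmi_variants(counts))
```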

2. Theoretical Properties and Interpretation

The key theoretical properties, thoroughly derived in (Alonso, 20 Nov 2025, Nagel et al., 8 May 2024), are as follows:

  • Identity of indiscernibles: $\mathrm{NMI}(X, Y) = 1$ if and only if $X$ and $Y$ are perfectly coupled (related by a bijective, noiseless mapping).
  • Zero baseline: $\mathrm{NMI}(X, Y) = 0$ if and only if $X$ and $Y$ are statistically independent.
  • Symmetry: $\mathrm{NMI}(X, Y) = \mathrm{NMI}(Y, X)$, by construction.
  • Boundedness: for the most common normalizations, $0 \leq \mathrm{NMI}(X, Y) \leq 1$ for all $X, Y$ with well-defined (finite) entropy.
  • Invariance: for continuous variables, mutual information is invariant under smooth invertible transformations, while NMI is mildly sensitive to rescaling of the marginals because the marginal entropies shift; when the marginal entropies are large, these effects remain minor (Alonso, 20 Nov 2025, Nagel et al., 8 May 2024).

Importantly, NMI provides a scale-free summary of association strength and is robust to marginal entropy disparities, making it valuable for comparing variable relationships across heterogeneous systems.
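
The first three properties can be checked empirically with scikit-learn's discrete NMI implementation; note that the independent case is only approximately zero in finite samples, which anticipates the bias issues discussed in Section 5.

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

rng = np.random.default_rng(0)
x = rng.integers(0, 4, size=10_000)          # a discrete variable with 4 states
y_copy = x.copy()                            # perfectly coupled partner
y_indep = rng.integers(0, 4, size=10_000)    # independent partner

print(normalized_mutual_info_score(x, y_copy))    # 1.0 (identity of indiscernibles)
print(normalized_mutual_info_score(x, y_indep))   # ~0.0, up to finite-sample bias
print(np.isclose(normalized_mutual_info_score(x, y_indep),
                 normalized_mutual_info_score(y_indep, x)))  # True (symmetry)
```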

3. Algorithmic Estimation and High-Dimensional Implementations

Discrete Data: For discrete variables, NMI is estimated via plug-in frequency histograms. Given empirical joint counts $n_{ij}$, the entropies and MI are computed by direct substitution into the formulas above (Sarhrouni et al., 2022, Jiang et al., 2020).
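
A minimal way to build the joint counts $n_{ij}$ from two paired label arrays, which can then be fed into plug-in formulas such as the hypothetical `nmi_variants` helper sketched in Section 1:

```python
import numpy as np

def joint_counts(x, y):
    """Empirical contingency table of joint counts n_ij from two equal-length label arrays."""
    _, xc = np.unique(x, return_inverse=True)   # integer-code the observed X labels
    _, yc = np.unique(y, return_inverse=True)   # integer-code the observed Y labels
    counts = np.zeros((xc.max() + 1, yc.max() + 1))
    np.add.at(counts, (xc, yc), 1)              # accumulate joint frequencies
    return counts

# Example: nmi_variants(joint_counts(x_labels, y_labels))["sqrt"]
```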

Continuous/High-Dimensional Data: For continuous and especially multidimensional variables, k-nearest-neighbor estimators—most notably the Kraskov-Stögbauer-Grassberger (KSG) approach—are the gold standard. The practical estimator used by Nagel et al. (Nagel et al., 8 May 2024) and validated in (Tuononen et al., 10 Oct 2024) involves:

  • Computing marginal and joint kNN radii.
  • Estimating relative (coordinate-invariant) differential entropies.
  • Forming the NMI by dividing the kNN MI estimate by the geometric mean of relative marginal entropies, with care to maintain numerical stability (in high dimensions, normalization of the radii is done in the log domain) (Tuononen et al., 10 Oct 2024).
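
A rough one-dimensional sketch of this pipeline, combining scikit-learn's kNN-based mutual information estimator with a Kozachenko-Leonenko entropy estimate, is shown below. It deliberately glosses over the relative-entropy and log-domain details of the cited implementations, and conventions for the entropy estimator vary slightly across the literature, so treat it as illustrative rather than as the published estimator; `kl_entropy` and `knn_nmi` are hypothetical names.

```python
import numpy as np
from scipy.special import digamma, gammaln
from sklearn.feature_selection import mutual_info_regression
from sklearn.neighbors import NearestNeighbors

def kl_entropy(x, k=5):
    """Kozachenko-Leonenko kNN estimate of differential entropy (nats) for (N, d) samples."""
    n, d = x.shape
    nn = NearestNeighbors(n_neighbors=k + 1).fit(x)
    r_k = nn.kneighbors(x)[0][:, -1]                              # distance to the k-th neighbour
    log_unit_ball = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)  # log volume of the unit d-ball
    return digamma(n) - digamma(k) + log_unit_ball + d * np.mean(np.log(r_k))

def knn_nmi(x, y, k=5):
    """Illustrative kNN-based NMI (geometric-mean form) for 1-D continuous samples x, y."""
    mi = mutual_info_regression(x.reshape(-1, 1), y, n_neighbors=k)[0]
    hx = kl_entropy(x.reshape(-1, 1), k)
    hy = kl_entropy(y.reshape(-1, 1), k)
    if hx <= 0 or hy <= 0:
        raise ValueError("non-positive marginal entropy; rescale the data or use a "
                         "relative-entropy normalization as in the cited estimators")
    return mi / np.sqrt(hx * hy)
```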

Empirical Guidelines: For practical computation:

  • Use double precision and tune $k$ to balance bias and variance ($5 \leq k \leq 10$ is typical).
  • For large dimension $d$, apply log-sum-exp tricks to prevent numerical overflow (Tuononen et al., 10 Oct 2024).
  • For imaging, spectral, and high-throughput data, histogram discretization with appropriate binning allows robust NMI estimation and downstream filtering (Sarhrouni et al., 2022).

4. Applications Across Domains

4.1. Community Detection and Clustering Evaluation

NMI has become the standard metric for evaluating agreement between algorithmically discovered partitions and ground-truth labels in network/community detection. The metric is robust to label permutations, accommodates differing cluster sizes, and is applicable in both hard and overlapping community settings (McDaid et al., 2011, Labatut, 2013, Zhang, 2015, McCarthy et al., 2019). Typical usage follows (a short example appears after this list):

  • Compute empirical contingency tables for partition overlaps.
  • Calculate NMI via arithmetic mean (sum) normalization.
  • Adjust for finite-size or sampling bias via rNMI or adjusted mutual information, as detailed in (Zhang, 2015, McCarthy et al., 2019).
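
Using scikit-learn's implementations as one concrete instance of this workflow (the label vectors below are placeholders):

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score, adjusted_mutual_info_score

# Ground-truth community labels and a candidate partition from some algorithm.
truth     = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])
candidate = np.array([1, 1, 0, 0, 2, 2, 2, 3, 3, 3])

# Raw NMI with arithmetic-mean ("sum") normalization.
nmi = normalized_mutual_info_score(truth, candidate, average_method="arithmetic")

# AMI subtracts the expected score of a random partition (chance correction).
ami = adjusted_mutual_info_score(truth, candidate, average_method="arithmetic")

print(f"NMI = {nmi:.3f}, AMI = {ami:.3f}")   # AMI is typically lower on small samples
```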

4.2. Feature Selection and Redundancy Control

In hyperspectral imaging and high-dimensional feature selection, NMI offers a direct metric for ranking features by relevance to a target and for managing redundancy among candidates (Nhaila et al., 2022, Sarhrouni et al., 2022). The typical workflow involves:

  1. Compute NMI between each candidate feature and the target for relevance selection.
  2. In a subsequent pass, control redundancy by minimizing pairwise NMI among selected features, often with a user-specified threshold.
  3. For wrapper methods, pair the NMI-based selection with empirical error probabilities (Fano bounds) to optimize both accuracy and parsimony.
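
A minimal filter-style sketch of steps 1 and 2, assuming integer-coded (e.g., pre-binned) features and using scikit-learn's discrete NMI; `nmi_filter`, `n_select`, and `redundancy_max` are hypothetical names and thresholds, and the Fano-bound wrapper of step 3 is omitted.

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

def nmi_filter(features, target, n_select=10, redundancy_max=0.8):
    """Greedy NMI-based selection: rank by relevance, then skip redundant features.

    features: (N, D) array of integer-coded (pre-binned) features.
    target:   (N,) array of integer class labels.
    """
    d = features.shape[1]
    relevance = [normalized_mutual_info_score(features[:, j], target) for j in range(d)]
    selected = []
    for j in np.argsort(relevance)[::-1]:          # most relevant first
        redundant = any(
            normalized_mutual_info_score(features[:, j], features[:, s]) > redundancy_max
            for s in selected
        )
        if not redundant:
            selected.append(j)
        if len(selected) == n_select:
            break
    return selected
```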

4.3. Multimodal and Uncertainty-Aware Systems

Recent advances in uncertainty quantification and multimodal fusion leverage batch-wise NMI scores between latent representations (e.g., between camera and LiDAR features in a 3D object detector) as dynamic modulators of calibration losses (Stutts et al., 2023). Here, NMI computed between encoder outputs drives the trade-off between interval sharpness and coverage in conformal prediction, with empirical results demonstrating inverse correlation between predictive uncertainty and NMI throughout model training.
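
A crude illustration of the idea, not the cited detector's exact computation: reduce each modality's batch of latent features to a one-dimensional summary, quantile-bin the summaries, and take their discrete NMI as a fusion-agreement signal; `batch_nmi` and the binning scheme are assumptions made for this sketch.

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

def batch_nmi(feat_a, feat_b, n_bins=16):
    """Crude batch-wise NMI proxy between two (B, D) latent batches.

    Summarizes each sample by its mean activation and quantile-bins the
    summaries so the discrete NMI estimator applies; small batches inflate
    the score (finite-sample bias).
    """
    def codes(f):
        proj = f.mean(axis=1)                                   # 1-D summary per sample
        edges = np.quantile(proj, np.linspace(0, 1, n_bins + 1)[1:-1])
        return np.digitize(proj, edges)
    return normalized_mutual_info_score(codes(feat_a), codes(feat_b))

# e.g. weight = 1.0 - batch_nmi(camera_feats, lidar_feats)
```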

4.4. Registration, Localization, and Image Matching

In vision and signal processing, NMI (frequently with geometric mean normalization) underpins similarity measures for registration, localization, and matching under varying illumination and noise (Kunde et al., 20 Dec 2024). Enhanced NMI (ENMI) variants explicitly incorporate noise models into joint histograms, yielding improved performance when noise properties are non-uniform across spatial domains.

4.5. Financial and Statistical Time Series Analysis

Recent work establishes NMI as a leading indicator for regime change and serial dependence in financial time series (Alonso, 20 Nov 2025). Its boundedness, interpretability, and robustness to volatility scaling provide a superior alternative to autocorrelation for market efficiency testing, risk monitoring, and detection of structural breaks.
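
One way such a diagnostic can be computed is with a rolling-window, lag-1 NMI of a return series against its own past; the window length, lag, and quantile binning below are illustrative choices for the sketch, not the cited paper's specification.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import normalized_mutual_info_score

def rolling_lag_nmi(returns, window=250, lag=1, n_bins=8):
    """Rolling-window NMI between a return series and its own lag.

    Quantile-bins each window so the discrete NMI estimator applies;
    a sustained rise suggests emerging serial dependence (possible regime shift).
    """
    r = pd.Series(returns).dropna().to_numpy()
    out = np.full(r.size, np.nan)
    for t in range(window + lag, r.size):
        x, y = r[t - window:t], r[t - window - lag:t - lag]
        qx = np.digitize(x, np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1]))
        qy = np.digitize(y, np.quantile(y, np.linspace(0, 1, n_bins + 1)[1:-1]))
        out[t] = normalized_mutual_info_score(qx, qy)
    return pd.Series(out)
```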

5. Interpretive Cautions and Bias Corrections

Though widely adopted, standard NMI exhibits notable pathologies:

  • Finite-size and high-resolution bias: In network analysis, NMI is sensitive to cluster counts and finite-sample fluctuations; it spuriously favors over-partitioned clusterings (“singleton” or trivially fine partitions score well even when uninformative) (Zhang, 2015).
  • Symmetric normalization induced bias: Standard NMI formulae can introduce dependence on algorithm output complexity, shifting rankings of methods irrespective of ground-truth similarity (Jerdee et al., 2023).
  • Reverse and non-proportionality bias: Correction attempts such as rNMI and cNMI, while addressing baseline elevation, may introduce reverse bias or violate proportionality with fraction of label noise (Liu et al., 2018).

Principal remedies include:

  • Use of one-sided adjusted mutual information (AMI), subtracting the expected score of a random partition (“no free lunch correction”), and renormalizing so perfect matches always yield 1 (McCarthy et al., 2019).
  • Asymmetric normalization by ground-truth entropy alone to decouple candidate complexity from score (Jerdee et al., 2023).
  • Baseline subtraction (rNMI), which yields $\mathrm{rNMI} = 0$ when no true association exists between candidate and reference.
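
For reference, one common form of the chance-corrected AMI listed above (written with the arithmetic-mean normalizer; the expectation is over random partitions with the same group sizes):

$$\mathrm{AMI}(X, Y) = \frac{I(X; Y) - \mathbb{E}\!\left[I(X; Y)\right]}{\tfrac{1}{2}\bigl(H(X) + H(Y)\bigr) - \mathbb{E}\!\left[I(X; Y)\right]}$$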

Table: NMI Bias Correction Strategies and Their Properties

| Metric | Bias Correction Approach | Handles Finite-Size Bias? | Enforces Zero Baseline? | Enforces Unit Ceiling? |
| --- | --- | --- | --- | --- |
| NMI | Symmetric normalization | No | No | Yes |
| rNMI | Baseline subtraction | Yes | Yes | No |
| cNMI | rNMI + scaling | Yes | Yes | Yes, but loses proportionality |
| AMI (one-sided) | Expectation subtraction + scaling | Yes | Yes | Yes |
| $\mathrm{NMI}^{(A)}$ | Asymmetric normalization | Yes | Yes | Yes |

Bootstrapping, permutation testing, and application-specific weighting (e.g., core nodes in networks (Labatut, 2013)) further refine NMI's utility and reliability.

6. Practical Guidelines and Domain-Specific Considerations

  • Community Detection: Prefer AMI or rNMI over raw NMI for unbiased comparison of algorithm performance, especially with varying partition granularity (McCarthy et al., 2019, Zhang, 2015).
  • High-Dimensional Data: Use kNN-based estimators with log-domain normalization to avoid computational and numeric issues (Nagel et al., 8 May 2024, Tuononen et al., 10 Oct 2024).
  • Feature Selection: Integrate NMI with relevance thresholds and Fano-type error probability bounds for robust, redundancy-aware feature reduction (Sarhrouni et al., 2022, Nhaila et al., 2022).
  • Uncertainty Quantification: Batch-wise NMI enables adaptive loss calibration coherent with cross-modal information integration (Stutts et al., 2023).
  • Time Series and Financial Regime Detection: Rolling-window NMI estimation identifies temporal dependence and regime shifts beyond autocorrelation or variance-based diagnostics (Alonso, 20 Nov 2025).
  • Applications with Overlapping or Hierarchical Structures: Employ $\mathrm{NMI}_{\max}$ or structure-weighted extensions to ensure discriminatory power and topological consistency (McDaid et al., 2011, Labatut, 2013).

7. Extensions, Limitations, and Recommendations

While NMI is essential for quantifying dependence free from entropy-scale effects, careful attention to estimator implementation, normalization choice, finite-sample correction, and application context is imperative. Ameliorations such as AMI, rNMI, cNMI, or asymmetric normalizations should be reported alongside classical NMI, especially when comparing across algorithms, datasets, or in settings prone to baseline inflation. For tasks emphasizing topological or functional roles—infrastructure networks, molecular dynamics—complement NMI with structure-sensitive measures or domain-specific weighting as warranted (Labatut, 2013, Nagel et al., 8 May 2024).

In summary, NMI delivers a theoretically rigorous, empirically robust, and algorithmically tractable index of shared information, subject to interpretive discipline and context-specific correction (Nagel et al., 8 May 2024, Alonso, 20 Nov 2025, Jerdee et al., 2023, Zhang, 2015). Its continued evolution—addressing bias, scale, and domain-structure—remains central to quantitative information theory, high-dimensional statistics, and data-driven model selection.
