
Matthews Correlation Coefficient (MCC)

Updated 29 December 2025
  • MCC is a scalar metric that computes the Pearson correlation between observed and predicted binary labels, ranging from -1 to 1 with strict symmetry properties.
  • It effectively evaluates classifier performance in imbalanced settings by incorporating all four confusion matrix entries and penalizing both types of errors equally.
  • Recent extensions include multiclass generalizations, weighted variants, and a differentiable loss function for deep learning, enhancing applications in bioinformatics, medical imaging, and cyber-security.

The Matthews Correlation Coefficient (MCC) is a scalar metric of confusion matrix performance, offering a balanced assessment of classification accuracy that remains informative under class imbalance. Originally formulated for binary classification and equivalent to the Pearson correlation coefficient between predicted and true binary labels, MCC is now a standard evaluation measure in machine learning, bioinformatics, medical imaging, and cyber-security research. Its rigorous symmetry properties, penalties for both types of misclassifications, and robust theoretical foundations have led to numerous generalizations and methodological advances, especially for multiclass and imbalanced settings.

1. Formal Definition, Correlation Interpretation, and Range

Given a binary confusion matrix with entries:

  • True Positives (TP)
  • True Negatives (TN)
  • False Positives (FP)
  • False Negatives (FN)

the Matthews Correlation Coefficient is

\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}

This metric takes values in $[-1, 1]$, where $1$ indicates perfect correlation (all predictions correct), $0$ indicates performance no better than random, and $-1$ represents total disagreement (systematic reversals between prediction and ground truth) (Abhishek et al., 2020, Stoica et al., 2023, Yao et al., 2020, Ferrer, 2022).
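
As a concrete sketch, the formula can be evaluated directly from the four counts (plain Python, standard library only); the two calls illustrate the endpoints of the range:

```python
import math

def mcc(tp, tn, fp, fn):
    """Binary MCC straight from the four confusion-matrix counts."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den

print(mcc(tp=50, tn=50, fp=0, fn=0))   # 1.0  (all predictions correct)
print(mcc(tp=0, tn=0, fp=50, fn=50))   # -1.0 (systematic reversal)
```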

MCC is the sample Pearson correlation coefficient between the indicator vectors for ground truth and predictions, i.e.,

\mathrm{corr}(Y, \hat{Y}) = \frac{\operatorname{Cov}(Y, \hat{Y})}{\sqrt{\operatorname{Var}(Y)\,\operatorname{Var}(\hat{Y})}}

This means MCC captures both the tendency toward correct labeling and the symmetry between positive and negative classes (Ferrer, 2022).
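
The equivalence can be checked numerically: treating the labels as 0/1 indicator vectors, the sample Pearson correlation matches the confusion-matrix formula. A small pure-Python sketch (the example labels are chosen arbitrarily):

```python
import math

def pearson(x, y):
    """Sample Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def mcc_from_counts(tp, tn, fp, fn):
    return (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

# Arbitrary example labels (0/1 indicator vectors)
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
print(pearson(y_true, y_pred))          # 7/15 ≈ 0.4667
print(mcc_from_counts(tp, tn, fp, fn))  # identical value
```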

Its symmetry properties include invariance under swapping the positive and negative labels and under exchanging the roles of predicted and true labels.

2. Class Imbalance, Decision-Theoretic Properties, and Limitations

MCC remains well-calibrated even as class frequencies become highly skewed, in contrast to accuracy or F1-score, which can be drastically inflated by the majority class (Yao et al., 2020, Thiyagarajan et al., 22 Dec 2025, Ferrer, 2022). Because it incorporates all four confusion-matrix entries, MCC penalizes classifiers that perform well only on the majority class but are ineffective on the minority class.

However, MCC is “Type IV symmetric” in the sense of cost-behavior analysis: it penalizes misclassification of the rare and the prevalent class equally; both misclassification costs scale as $1/[p_2(1 - p_2)]$, where $p_2$ is the minority-class prevalence (Hu et al., 2014). This means that MCC does not enforce the property that errors on the rare class carry a higher cost, which is often desired in imbalanced learning. Alternative metrics, such as the balanced error rate (BER), do so by imposing asymmetric, prevalence-aware penalties.

In extremely negatively skewed regimes (as $TN \to \infty$), MCC converges to the Fowlkes–Mallows index (FM):

\lim_{TN \to \infty} \mathrm{MCC} = \sqrt{\mathrm{precision} \times \mathrm{recall}}

Thus, in object detection or open-world tasks with intractable true-negative counts, FM can be reported as a stand-in for MCC (Crall, 2023).
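
The limit is easy to verify numerically. In this sketch (the counts are arbitrary), MCC approaches the geometric mean of precision and recall as the true-negative count grows:

```python
import math

def mcc(tp, tn, fp, fn):
    return (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

tp, fp, fn = 80, 10, 20                  # arbitrary fixed counts
precision, recall = tp / (tp + fp), tp / (tp + fn)
fm = math.sqrt(precision * recall)       # Fowlkes–Mallows index ≈ 0.8433
for tn in (100, 10_000, 100_000_000):
    print(tn, mcc(tp, tn, fp, fn))       # approaches fm as tn grows
```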

3. Extensions: Multiclass, Weighted, and Robust Variants

Multiclass Generalizations

Several multiclass extensions of MCC exist:

  • Gorodkin’s $R_K$: Formulated via a multi-dimensional Pearson correlation of one-hot label matrices, reducing to the binary MCC for $K=2$, but can be overly optimistic for “hollow” confusion matrices (complete misclassification) (Stoica et al., 2023).
  • Determinant-Based Formulation: For confusion matrix $C$, the multi-class MCC is

\mathrm{MCC}_K = \frac{\det(C)}{\sqrt{\prod_{i=1}^K t_i \prod_{j=1}^K p_j}}

where $t_i$, $p_j$ are row and column sums, respectively (Itai et al., 2022).

  • Macro- and Micro-Averaged MCC: Compute a binary MCC for each class versus the rest (macro), or pool confusion-matrix cells and apply the binary MCC formula (micro). The “miM$^*$” approach ensures the range $[-1,1]$ and a proper correlation structure (Tamura et al., 9 Mar 2025).
  • Enhanced Metrics: Enhanced versions ($\mathrm{ER}_K$, $\mathrm{EMPC}_1$, $\mathrm{EMCC}$) are designed to reach $-1$ under maximally poor (hollow) confusion matrices and to handle class imbalance more sensitively (Stoica et al., 2023).
  • Weighted MCC: Incorporates distinct observation weights for binary and multiclass settings, maintaining the same range and providing stability under small perturbations in observed weights (Cortez et al., 23 Dec 2025).
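
The determinant-based formulation can be sketched in a few lines of Python. The helper names and example labels below are illustrative, not taken from the cited papers; the determinant is expanded recursively, which is fine for small class counts:

```python
import math

def det(m):
    """Determinant by Laplace expansion (adequate for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def multiclass_mcc(y_true, y_pred, k):
    # Confusion matrix: rows = true class, columns = predicted class
    c = [[0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        c[t][p] += 1
    t_sums = [sum(row) for row in c]                             # row sums t_i
    p_sums = [sum(c[i][j] for i in range(k)) for j in range(k)]  # column sums p_j
    return det(c) / math.sqrt(math.prod(t_sums) * math.prod(p_sums))

# Arbitrary 3-class example
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(multiclass_mcc(y_true, y_pred, 3))  # ≈ 0.2887
```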

Robustified MCC for Imbalanced Problems

The standard MCC is not robust against class imbalance in the sense that, as the minority-class proportion $\pi \to 0$, the Bayes-optimal classifier can essentially ignore the minority class, with true-positive rates tending to zero (Holzmann et al., 11 Apr 2024). To address this, Holzmann and Klar introduce a robustified MCC:

\mathrm{MCC}_{\mathrm{rb}} = \frac{\pi_{11}\,\pi_{00} - \pi_{01}\,\pi_{10}}{\sqrt{\pi(1-\pi)\,\bigl(d + \gamma(1-\gamma)\bigr)}} \left(\frac{d}{\pi(1-\pi)} + 1\right)^{1/2}

for small $d > 0$, which ensures the true-positive rate is bounded away from zero under any class imbalance (Holzmann et al., 11 Apr 2024).

4. Statistical Inference: Confidence Intervals and Variance Estimation

The sampling variability of MCC is nontrivial, particularly under class imbalance. The asymptotic distribution can be derived by the delta method, using a multinomial model for the confusion-matrix cell probabilities:

\sqrt{n}\,(\widehat{\mathrm{MCC}} - \mathrm{MCC}) \xrightarrow{d} N(0, \sigma^2)

where $\sigma^2$ involves the gradient of MCC with respect to the cell probabilities and their empirical covariance (Itaya et al., 21 May 2024, Tamura et al., 9 Mar 2025). Explicit expressions for the partial derivatives are available, facilitating computation of standard errors.

Fisher’s z-transformation, $z = \frac{1}{2} \ln \frac{1 + \mathrm{MCC}}{1 - \mathrm{MCC}}$, reduces skewness and improves the coverage of confidence intervals, especially for small samples or under strong imbalance.
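
A sketch of a Fisher-z interval, assuming a delta-method standard error for MCC is already available (the function name and the numbers in the usage line are hypothetical):

```python
import math

def mcc_ci_fisher(mcc_hat, se_mcc, z_crit=1.959964):
    """95% CI for MCC via Fisher's z-transformation.

    Assumes se_mcc is an externally computed (e.g. delta-method) standard
    error of the MCC estimate; a second delta-method step gives the standard
    error of z = atanh(MCC) as se_mcc / (1 - MCC^2).
    """
    z = math.atanh(mcc_hat)
    se_z = se_mcc / (1.0 - mcc_hat ** 2)
    return math.tanh(z - z_crit * se_z), math.tanh(z + z_crit * se_z)

# Hypothetical estimate: MCC = 0.6 with delta-method SE 0.08
lo, hi = mcc_ci_fisher(0.6, 0.08)
print(lo, hi)
```

Because the endpoints are mapped back through $\tanh$, the interval always stays inside $(-1, 1)$, unlike a naive Wald interval on the raw MCC scale.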

For paired study designs—comparing two classifiers on the same subjects—delta-method-based and transformation-based CIs are available for the difference in MCCs (Itaya et al., 21 May 2024, Tamura et al., 9 Mar 2025).

5. Empirical Motivation, Applications, and Best Practices

MCC addresses reproducibility failures and misleading conclusions in classification research, especially where F1 or accuracy can be arbitrary or misrepresentative. Yao & Shepperd demonstrate that classifier rankings by F1 and MCC disagree in 23% of defect-prediction experiments; the disagreement is more pronounced when F1 values are similar (Yao et al., 2020).

MCC is especially advocated in imbalanced domains such as software defect prediction, low-frequency intrusion detection, and medical image segmentation. Under severe imbalance, accuracy may be misleadingly high despite total failure to detect the minority class; MCC collapses to zero in such cases, reflecting a genuine lack of discrimination (Thiyagarajan et al., 22 Dec 2025).
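
A minimal illustration of this failure mode, using the common library convention (followed, e.g., by scikit-learn) of returning 0 when the MCC denominator vanishes:

```python
import math

def safe_mcc(tp, tn, fp, fn):
    # Common library convention: return 0 when the denominator vanishes,
    # i.e. some class is never predicted or never occurs.
    den = (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    return 0.0 if den == 0 else (tp * tn - fp * fn) / math.sqrt(den)

# 1% positive prevalence; the classifier always predicts the majority class
n_pos, n_neg = 10, 990
tp, fn = 0, n_pos              # every positive is missed
fp, tn = 0, n_neg              # every negative is trivially "correct"
accuracy = (tp + tn) / (n_pos + n_neg)
print(accuracy)                  # 0.99: looks excellent
print(safe_mcc(tp, tn, fp, fn))  # 0.0: no discriminative power
```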

Best practices:

  • Report the full confusion matrix, permitting recomputation of MCC and related metrics.
  • For binary settings, prefer MCC over F1, accuracy, or AUROC when class sizes are skewed.
  • In multiclass settings, use macro- and micro-averaged or determinant-based MCCs, considering enhanced variants under severe misclassification.
  • Use Fisher’s z-intervals or delta-based CIs for inferential assessments, especially for method comparisons (Tamura et al., 9 Mar 2025, Itaya et al., 21 May 2024).

6. Methodological Advances and Loss Functions in Deep Learning

The differentiable version of MCC enables its direct use as a loss function in deep learning. By expressing confusion-matrix counts as differentiable sums of predicted probabilities and labels, one obtains a continuous “MCC loss,” suitable for gradient-based optimization:

\mathcal{L}_{\mathrm{MCC}} = 1 - \mathrm{MCC}(\{\hat y_i\}, \{y_i\})

where $\hat y_i \in [0,1]$ are soft predictions. This loss back-propagates cleanly, preserving the full class-imbalance sensitivity of MCC (Abhishek et al., 2020). Empirical studies in skin lesion segmentation confirm that the MCC loss outperforms the Dice loss and other imbalance-agnostic losses, yielding consistent improvements in Jaccard, Dice, and accuracy metrics (Abhishek et al., 2020).
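
A plain-Python sketch of this loss, purely to illustrate the arithmetic (in practice the sums would be tensor operations in an autodiff framework, and a small epsilon guards the denominator):

```python
import math

def mcc_loss(y_true, y_prob, eps=1e-8):
    """1 - soft MCC, computed from probabilistic predictions.

    Confusion counts become differentiable sums of probabilities, so the
    same expression can serve as a training loss under autodiff.
    """
    tp = sum(p * y for p, y in zip(y_prob, y_true))
    fp = sum(p * (1 - y) for p, y in zip(y_prob, y_true))
    fn = sum((1 - p) * y for p, y in zip(y_prob, y_true))
    tn = sum((1 - p) * (1 - y) for p, y in zip(y_prob, y_true))
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) + eps
    return 1.0 - num / den

y_true = [1, 1, 0, 0]
print(mcc_loss(y_true, [0.9, 0.8, 0.1, 0.2]))  # ≈ 0.3 (confident, correct)
print(mcc_loss(y_true, [0.1, 0.2, 0.9, 0.8]))  # ≈ 1.7 (confident, inverted)
```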

MCC is contrasted with other commonly-used metrics:

  • F1-score: Ignores true negatives; can result in identical F1 for classifiers with vastly different abilities to reject the majority class (Yao et al., 2020).
  • Precision–Recall, ROC curves: These can be insensitive or misleading under extreme imbalance. The MCC–F1 curve provides a comprehensive visualization by plotting normalized MCC vs. F1 across thresholds and choosing an optimal threshold based on Euclidean distance to the perfect corner; it robustly differentiates classifier performance independent of class skew (Cao et al., 2020).
  • Confusion Entropy (CEN): Strongly (almost monotonically) related to multiclass MCC; for most practical purposes, MCC captures the discriminant power of CEN in a simpler, interpretable form (Jurman et al., 2010).

A limitation of MCC is that it is undefined when any marginal sum in the denominator is zero, that is, when some class is never predicted or never appears (Yao et al., 2020). In multiclass or highly imbalanced regimes, care must be taken with such edge cases.


References:

  • "Matthews Correlation Coefficient Loss for Deep Convolutional Networks: Application to Skin Lesion Segmentation" (Abhishek et al., 2020)
  • "Assessing Software Defection Prediction Performance: Why Using the Matthews Correlation Coefficient Matters" (Yao et al., 2020)
  • "The MCC approaches the geometric mean of precision and recall as true negatives approach infinity" (Crall, 2023)
  • "Analysis and Comparison of Classification Metrics" (Ferrer, 2022)
  • "The MCC-F1 curve: a performance evaluation technique for binary classification" (Cao et al., 2020)
  • "Evaluating MCC for Low-Frequency Cyberattack Detection in Imbalanced Intrusion Detection Data" (Thiyagarajan et al., 22 Dec 2025)
  • "A study on cost behaviors of binary classification measures in class-imbalanced problems" (Hu et al., 2014)
  • "Pearson-Matthews correlation coefficients for binary and multinary classification and hypothesis testing" (Stoica et al., 2023)
  • "A unifying view for performance measures in multi-class prediction" (Jurman et al., 2010)
  • "Weighted MCC: A Robust Measure of Multiclass Classifier Performance for Observations with Individual Weights" (Cortez et al., 23 Dec 2025)
  • "Asymptotic Properties of Matthews Correlation Coefficient" (Itaya et al., 21 May 2024)
  • "Statistical Inference of the Matthews Correlation Coefficient for Multiclass Classification" (Tamura et al., 9 Mar 2025)
  • "Goodness of Fit Metrics for Multi-class Predictor" (Itai et al., 2022)
  • "Robust performance metrics for imbalanced classification problems" (Holzmann et al., 11 Apr 2024)
