
MCC Loss for Imbalanced Segmentation

Updated 28 February 2026
  • MCC Loss is a differentiable, metric-based loss function that leverages the Matthews Correlation Coefficient to optimize segmentation under imbalanced conditions.
  • It computes the Pearson correlation between predicted and ground-truth labels, accounting for true/false positives and negatives alike, thereby preventing trivial all-background predictions.
  • Empirical studies demonstrate improved IoU, sensitivity, and specificity in lesion segmentation compared to traditional losses.

The Matthews Correlation Coefficient (MCC) loss is a metric-based loss function designed for deep learning tasks, particularly effective under severe class imbalance. Rooted in the statistical properties of the MCC, it is employed as a differentiable objective in segmentation and classification settings where conventional loss functions are susceptible to domination by the majority class. MCC loss calculates the Pearson correlation between predicted and ground-truth binary labels over all pixels, explicitly considering true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). This comprehensive accounting enables penalization for both types of misclassification, resulting in improved model performance and robustness—particularly in tasks such as lesion segmentation, where the background class disproportionately dominates the pixel distribution (Abhishek et al., 2020).

1. Definition and Motivation

MCC is classically defined for binary classification as

$$\mathrm{MCC} = \frac{\mathrm{TP} \cdot \mathrm{TN} - \mathrm{FP} \cdot \mathrm{FN}}{\sqrt{(\mathrm{TP} + \mathrm{FP})(\mathrm{TP} + \mathrm{FN})(\mathrm{TN} + \mathrm{FP})(\mathrm{TN} + \mathrm{FN})}}$$

Unlike overlap-based metrics such as Dice or Jaccard, which ignore TN and focus solely on the lesion (foreground), MCC incorporates all four confusion matrix entries. This property ensures statistical significance even in the presence of extreme label imbalance. For pixel-wise segmentation, where non-lesion pixels vastly outnumber lesion pixels, this attribute prevents the network from converging to trivial all-background solutions, a failure mode often observed with cross-entropy and Dice-based losses.

Dice, Jaccard, and extensions (Tversky, focal-Dice, composite losses) seek to balance recall and precision or weigh error types differently, but do not explicitly penalize background misclassifications. As a result, networks optimized under these objectives can achieve low loss values simply by predominantly predicting background, with little incentive to correct missed lesions or spurious foreground predictions (Abhishek et al., 2020).

2. Mathematical Formulation

Discrete MCC

For discrete label assignments, MCC is computed using the standard confusion matrix counts.

Differentiable (Soft) MCC Loss

To enable direct optimization via backpropagation, a continuous, differentiable analog is constructed by substituting per-pixel soft outputs $\hat y_i \in [0,1]$ for the predictions, with ground truth $y_i \in \{0,1\}$, across all $N$ pixels:

$$\mathrm{TP} = \sum_i \hat y_i\, y_i, \qquad \mathrm{TN} = \sum_i (1 - \hat y_i)(1 - y_i),$$

$$\mathrm{FP} = \sum_i \hat y_i (1 - y_i), \qquad \mathrm{FN} = \sum_i (1 - \hat y_i)\, y_i.$$

These are substituted into the general MCC expression, producing a differentiable scalar:

$$\mathrm{MCC}(\hat y, y) = \frac{\mathrm{TP} \cdot \mathrm{TN} - \mathrm{FP} \cdot \mathrm{FN}}{\sqrt{(\mathrm{TP}+\mathrm{FP})(\mathrm{TP}+\mathrm{FN})(\mathrm{TN}+\mathrm{FP})(\mathrm{TN}+\mathrm{FN})}}$$

The associated loss is $L_{\mathrm{MCC}} = 1 - \mathrm{MCC}(\hat y, y)$. For numerical stability, a small $\epsilon$ (e.g., $10^{-6}$) is added to each term in the denominator, avoiding indeterminate forms in highly skewed batches.
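A minimal NumPy sketch of the soft MCC loss follows (variable names are illustrative, not the reference implementation; in PyTorch or TensorFlow the same arithmetic on tensors is differentiable end-to-end):

```python
import numpy as np

def soft_mcc_loss(y_hat, y, eps=1e-6):
    """1 - soft MCC for one image; y_hat in [0,1], y in {0,1}."""
    y_hat = y_hat.ravel().astype(float)
    y = y.ravel().astype(float)
    tp = np.sum(y_hat * y)
    tn = np.sum((1.0 - y_hat) * (1.0 - y))
    fp = np.sum(y_hat * (1.0 - y))
    fn = np.sum((1.0 - y_hat) * y)
    num = tp * tn - fp * fn
    den = np.sqrt((tp + fp + eps) * (tp + fn + eps)
                  * (tn + fp + eps) * (tn + fn + eps))
    return 1.0 - num / den

y = np.array([1, 1, 0, 0, 0, 0, 0, 0])
perfect = soft_mcc_loss(np.array([1., 1., 0., 0., 0., 0., 0., 0.]), y)  # ~0
inverted = soft_mcc_loss(np.array([0., 0., 1., 1., 1., 1., 1., 1.]), y)  # ~2
```

The loss ranges over $[0, 2]$: a perfect prediction gives $\mathrm{MCC} = 1$ (loss near 0), and a perfectly anti-correlated one gives $\mathrm{MCC} = -1$ (loss near 2).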

Algebraic transformations provide computationally efficient variants. With $S = \sum_i y_i$ and $P = \sum_i \hat y_i$, the numerator can be written as $N\,\mathrm{TP} - S P$ and the denominator as $\sqrt{S P\,(N - S)(N - P)}$; explicit forms are detailed in Equations (4–5) of the reference.
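The equivalence of the direct and simplified forms — numerator $N\,\mathrm{TP} - S P$ and denominator $\sqrt{S P\,(N-S)(N-P)}$, with $S = \sum_i y_i$ and $P = \sum_i \hat y_i$ — follows from the identities $\mathrm{TP}+\mathrm{FP}=P$, $\mathrm{TP}+\mathrm{FN}=S$, $\mathrm{TN}+\mathrm{FP}=N-S$, $\mathrm{TN}+\mathrm{FN}=N-P$, and can be checked numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
y = (rng.random(N) < 0.05).astype(float)   # imbalanced ground truth (~5% lesion)
y_hat = rng.random(N)                      # arbitrary soft predictions

tp = np.sum(y_hat * y)
tn = np.sum((1 - y_hat) * (1 - y))
fp = np.sum(y_hat * (1 - y))
fn = np.sum((1 - y_hat) * y)
S, P = np.sum(y), np.sum(y_hat)

num_direct = tp * tn - fp * fn
num_simplified = N * tp - S * P            # same value, fewer reductions
den_direct = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
den_simplified = np.sqrt(S * P * (N - S) * (N - P))
```

The simplified form needs only three sums ($\mathrm{TP}$, $S$, $P$) instead of four soft confusion-matrix reductions.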

Modern frameworks (PyTorch, TensorFlow) enable end-to-end differentiability: gradients of $L_{\mathrm{MCC}}$ with respect to $\hat y_i$ (given explicitly in Eqn. 6 of the reference) are computed by automatic differentiation (Abhishek et al., 2020).

3. Integration with Deep Convolutional Architectures

MCC loss is compatible with standard encoder-decoder segmentation networks. Implementation steps include:

  • Use a vanilla U-Net architecture with skip connections.
  • The output layer computes a single-channel soft mask $\hat y = \sigma(z)$, where $\sigma$ denotes elementwise sigmoid activation applied to the final-layer logits $z$.
  • $L_{\mathrm{MCC}}$ is evaluated per image, loss gradients are propagated through $\hat y$ into the network by backpropagation, and the parameters $\theta$ are updated to minimize the batch-aggregated MCC loss.

To ensure computational safety, $\epsilon$ regularization is included in denominator computations. Well-behaved gradients are preserved even under vanishing TP/TN scenarios resulting from rare-class or degenerate batch compositions (Abhishek et al., 2020).
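A small sketch of why the $\epsilon$ term matters: on a degenerate batch with no lesion pixels and a (correct) all-background prediction, the unregularized denominator is exactly zero. Illustrative NumPy code, not the reference implementation:

```python
import numpy as np

def soft_mcc(y_hat, y, eps=0.0):
    """Soft MCC with optional epsilon regularization of the denominator."""
    tp = np.sum(y_hat * y)
    tn = np.sum((1 - y_hat) * (1 - y))
    fp = np.sum(y_hat * (1 - y))
    fn = np.sum((1 - y_hat) * y)
    den = np.sqrt((tp + fp + eps) * (tp + fn + eps)
                  * (tn + fp + eps) * (tn + fn + eps))
    with np.errstate(invalid="ignore"):
        return (tp * tn - fp * fn) / den

y = np.zeros(64)        # degenerate batch: no lesion pixels at all
y_hat = np.zeros(64)    # network predicts pure background
unstable = soft_mcc(y_hat, y)             # 0/0 -> nan, training would diverge
stable = soft_mcc(y_hat, y, eps=1e-6)     # finite (0.0), gradients stay bounded
```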

4. Experimental Protocol and Comparative Results

Three benchmark datasets are utilized for empirical validation:

  • ISIC 2017: 2,000 train, 150 validation, 600 test dermoscopic images (benign nevi, melanoma, seborrheic keratosis).
  • DermoFit: 1,300 clinical images split as 780/130/390 (train/val/test).
  • PH2: 200 dermoscopic images split as 120/20/60.

All images are resized to 128×128 pixels, with training performed on U-Net architectures using batch size 40, learning rate 1e-3, SGD optimizer, and on-the-fly augmentations (flips, ±45° rotations).

MCC-trained models are compared with identical networks trained using Dice loss. The mean Jaccard index (IoU) demonstrates statistically significant gains for MCC loss, as detailed below:

| Dataset | Dice Loss (IoU) | MCC Loss (IoU) | Relative Gain | Significance |
|---|---|---|---|---|
| ISIC 2017 | 0.6758 | 0.7518 | +11.25% | p < 0.001 |
| DermoFit | 0.7418 | 0.7779 | +4.87% | p < 0.001 |
| PH2 | 0.8051 | 0.8112 | +0.76% | p < 0.05 |
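
The gain figures above are relative improvements in mean Jaccard index, reproducible from the two absolute scores:

```python
def relative_gain(dice_iou, mcc_iou):
    """Relative improvement (%) of MCC loss over Dice loss in mean IoU."""
    return round((mcc_iou - dice_iou) / dice_iou * 100, 2)

gains = [
    relative_gain(0.6758, 0.7518),  # ISIC 2017
    relative_gain(0.7418, 0.7779),  # DermoFit
    relative_gain(0.8051, 0.8112),  # PH2
]
```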

Improvements in pixel-accuracy, Dice, sensitivity, and specificity metrics are also observed. Qualitative inspections show that MCC loss yields sharper boundaries and reduces both false positives and false negatives relative to Dice-optimized networks (Abhishek et al., 2020).

5. Analysis under Class Imbalance and Broader Applicability

The MCC loss’s inclusion of TN in its optimization objective explicitly penalizes improper background predictions and ameliorates the tendency toward degenerate all-background models, which afflict overlap-centric loss functions in the class-imbalanced regime. This facilitates balanced improvements in both sensitivity (lesion recall) and specificity (background discrimination).

Extensions of MCC loss are feasible for multi-class segmentation by generalizing the binary confusion matrix to the multi-class setting ($K \times K$ for $K$ classes) and constructing multi-class MCC terms. Furthermore, for rare-event and highly imbalanced classification (binary or multi-class), $L_{\mathrm{MCC}}$ can replace cross-entropy to yield more equitable optimization. Joint loss formulations, combining $L_{\mathrm{MCC}}$ with pixel-wise cross-entropy targets, are suggested as a means to stabilize early-stage learning (Abhishek et al., 2020).
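A hedged sketch of such a joint formulation (the weight `alpha` and the helper names are illustrative choices, not from the reference):

```python
import numpy as np

def bce_loss(y_hat, y, eps=1e-7):
    """Pixel-wise binary cross-entropy."""
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def mcc_loss(y_hat, y, eps=1e-6):
    """1 - soft MCC, as defined in Section 2."""
    tp = np.sum(y_hat * y)
    tn = np.sum((1 - y_hat) * (1 - y))
    fp = np.sum(y_hat * (1 - y))
    fn = np.sum((1 - y_hat) * y)
    den = np.sqrt((tp + fp + eps) * (tp + fn + eps)
                  * (tn + fp + eps) * (tn + fn + eps))
    return 1 - (tp * tn - fp * fn) / den

def joint_loss(y_hat, y, alpha=0.5):
    """Convex combination: cross-entropy stabilizes early training,
    the MCC term drives imbalance-aware refinement."""
    return alpha * mcc_loss(y_hat, y) + (1 - alpha) * bce_loss(y_hat, y)
```

Annealing `alpha` from 0 toward 1 over training would be one plausible schedule, letting cross-entropy dominate early and MCC dominate late.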

6. Impact and Generalization Potential

Empirical results across three distinct lesion segmentation benchmarks confirm the efficacy of MCC loss in improving mean Jaccard index and secondary segmentation metrics. The approach directly translates the desirable properties of the Matthews correlation coefficient—principally its invariance to class imbalance and exhaustive utilization of the confusion matrix—into an end-to-end learnable setting for deep neural networks.

MCC loss constitutes a robust alternative to overlap-based and cross-entropy loss functions especially in scenarios where the prevalence of the target class is substantially lower than the background. A plausible implication is that similar performance gains could be realized in other domains suffering imbalance, provided the loss is properly extended or adapted to task-specific structures (Abhishek et al., 2020).
