MCC Loss for Imbalanced Segmentation
- MCC Loss is a differentiable, metric-based loss function that leverages the Matthews Correlation Coefficient to optimize segmentation under imbalanced conditions.
- It computes the Pearson correlation between predicted and ground-truth labels from true/false positives and negatives, thereby preventing trivial all-background predictions.
- Empirical studies demonstrate improved IoU, sensitivity, and specificity in lesion segmentation compared to traditional losses.
The Matthews Correlation Coefficient (MCC) loss is a metric-based loss function designed for deep learning tasks, particularly effective under severe class imbalance. Rooted in the statistical properties of the MCC, it is employed as a differentiable objective in segmentation and classification settings where conventional loss functions are susceptible to domination by the majority class. MCC loss calculates the Pearson correlation between predicted and ground-truth binary labels over all pixels, explicitly considering true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). This comprehensive accounting enables penalization for both types of misclassification, resulting in improved model performance and robustness—particularly in tasks such as lesion segmentation, where the background class disproportionately dominates the pixel distribution (Abhishek et al., 2020).
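The Pearson-correlation interpretation above can be verified numerically. The following sketch (an illustration with made-up masks, not code from the reference) computes MCC from the confusion-matrix counts and compares it against NumPy's Pearson correlation of the same binary pixel vectors:

```python
import numpy as np

# Flattened binary ground-truth and predicted masks (illustrative values).
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 0, 0, 1])
y_pred = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0, 1])

# Confusion-matrix counts.
tp = np.sum(y_pred * y_true)
tn = np.sum((1 - y_pred) * (1 - y_true))
fp = np.sum(y_pred * (1 - y_true))
fn = np.sum((1 - y_pred) * y_true)

# MCC from the confusion matrix.
mcc = (tp * tn - fp * fn) / np.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

# Pearson correlation of the two binary vectors: identical to MCC.
pearson = np.corrcoef(y_pred, y_true)[0, 1]
print(mcc, pearson)  # both 0.5833...
```

For binary vectors, MCC coincides with the phi coefficient, i.e. the Pearson correlation of the label vectors, which is exactly what the check above exercises.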
1. Definition and Motivation
MCC is classically defined for binary classification as

MCC = (TP · TN − FP · FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN)),

where TP, TN, FP, and FN denote the entries of the binary confusion matrix.
Unlike overlap-based metrics such as Dice or Jaccard, which ignore TN and focus solely on the lesion (foreground), MCC incorporates all four confusion matrix entries. This property ensures statistical significance even in the presence of extreme label imbalance. For pixel-wise segmentation, where non-lesion pixels vastly outnumber lesion pixels, this attribute prevents the network from converging to trivial all-background solutions, a failure mode often observed with cross-entropy and Dice-based losses.
Dice, Jaccard, and their extensions (Tversky, focal-Dice, composite losses) seek to balance recall and precision or weigh error types differently, but none includes explicit penalization for background misclassifications. As a result, networks optimized under these objectives can achieve deceptively favorable loss values by predominantly predicting background, with little incentive to correct missed lesions or spurious predictions (Abhishek et al., 2020).
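The "ignores TN" property is easy to demonstrate. In the sketch below (illustrative counts, not from the reference), two scenarios share identical foreground errors but differ enormously in background size: the Dice score is unchanged, while MCC responds to the background:

```python
import math

def dice_score(tp, fp, fn):
    # Dice depends only on TP, FP, FN -- TN never appears in the formula.
    return 2 * tp / (2 * tp + fp + fn)

def mcc_score(tp, tn, fp, fn):
    # MCC uses all four confusion-matrix entries.
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den

# Same foreground errors, two very different background sizes.
small_bg = dict(tp=5, tn=10, fp=2, fn=3)
large_bg = dict(tp=5, tn=1000, fp=2, fn=3)

print(dice_score(5, 2, 3))                            # identical in both cases
print(mcc_score(**small_bg), mcc_score(**large_bg))   # clearly different
```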
2. Mathematical Formulation
Discrete MCC
For discrete label assignments, MCC is computed using the standard confusion matrix counts.
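To make the all-background failure mode concrete, the sketch below (illustrative counts, not from the reference) evaluates a trivial background-only predictor on a 95%-background image: pixel accuracy looks strong, while MCC is zero:

```python
import math

def mcc_from_counts(tp, tn, fp, fn, eps=1e-8):
    # eps guards against a zero denominator when a row or column of the
    # confusion matrix is empty (e.g., no positive predictions at all).
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp + eps) * (tp + fn + eps) *
                    (tn + fp + eps) * (tn + fn + eps))
    return num / den

# 100-pixel image: 5 lesion pixels, 95 background pixels,
# and a network that predicts background everywhere.
tp, tn, fp, fn = 0, 95, 0, 5

accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)                          # 0.95 -- looks deceptively good
print(mcc_from_counts(tp, tn, fp, fn))   # 0.0 -- no correlation at all
```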
Differentiable (Soft) MCC Loss
To enable direct optimization via backpropagation, a continuous, differentiable analog is constructed by substituting per-pixel soft outputs ŷᵢ ∈ [0, 1] for predictions and binary ground-truth labels yᵢ ∈ {0, 1}, summed across all n pixels:

TP = Σᵢ ŷᵢ yᵢ,  TN = Σᵢ (1 − ŷᵢ)(1 − yᵢ),  FP = Σᵢ ŷᵢ (1 − yᵢ),  FN = Σᵢ (1 − ŷᵢ) yᵢ.

These soft counts are substituted into the general MCC expression, producing a differentiable scalar MCC_soft. The associated loss is L_MCC = 1 − MCC_soft. For numerical stability, a small constant ε is added to each term in the denominator, avoiding indeterminate forms in highly skewed batches.
Algebraic transformations provide computationally efficient variants. Writing S_ŷ = Σᵢ ŷᵢ and S_y = Σᵢ yᵢ, the numerator TP · TN − FP · FN simplifies to n Σᵢ ŷᵢ yᵢ − S_ŷ S_y, and the denominator to √(S_ŷ S_y (n − S_ŷ)(n − S_y)); explicit forms are detailed in Equations (4–5) of the reference.
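As a sanity check on the two algebraically equivalent formulations, the following NumPy sketch (an illustration, not the authors' code) implements the soft MCC loss once from soft confusion counts and once from the three sums Σ ŷy, Σ ŷ, Σ y, and confirms they agree on a synthetic imbalanced batch:

```python
import numpy as np

def soft_mcc_loss(y_pred, y_true, eps=1e-8):
    """1 - soft MCC from per-pixel soft confusion counts."""
    y_pred = y_pred.ravel().astype(np.float64)
    y_true = y_true.ravel().astype(np.float64)
    tp = np.sum(y_pred * y_true)
    tn = np.sum((1 - y_pred) * (1 - y_true))
    fp = np.sum(y_pred * (1 - y_true))
    fn = np.sum((1 - y_pred) * y_true)
    num = tp * tn - fp * fn
    den = np.sqrt((tp + fp + eps) * (tp + fn + eps) *
                  (tn + fp + eps) * (tn + fn + eps))
    return 1.0 - num / den

def soft_mcc_loss_sums(y_pred, y_true, eps=1e-8):
    """Equivalent sum-based variant using only three sums."""
    y_pred = y_pred.ravel().astype(np.float64)
    y_true = y_true.ravel().astype(np.float64)
    n = y_pred.size
    s_p, s_t = y_pred.sum(), y_true.sum()
    num = n * np.sum(y_pred * y_true) - s_p * s_t
    den = np.sqrt(s_p * s_t * (n - s_p) * (n - s_t)) + eps
    return 1.0 - num / den

# Synthetic imbalanced batch: ~5% lesion pixels, noisy near-correct outputs.
rng = np.random.default_rng(0)
y_true = (rng.random((4, 32, 32)) < 0.05).astype(np.float64)
y_pred = np.clip(y_true * 0.9 + rng.random((4, 32, 32)) * 0.1, 0.0, 1.0)
print(soft_mcc_loss(y_pred, y_true), soft_mcc_loss_sums(y_pred, y_true))
```

Both functions return the same value up to the ε placement, and a perfect prediction drives the loss to (numerically) zero.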
Modern frameworks (PyTorch, TensorFlow) enable end-to-end differentiability; the explicit gradients with respect to the predicted probabilities ŷ (provided in Eqn. 6 of the reference) are in practice computed by automatic differentiation (Abhishek et al., 2020).
3. Integration with Deep Convolutional Architectures
MCC loss is compatible with standard encoder-decoder segmentation networks. Implementation steps include:
- Utilizing a vanilla U-Net architecture with skip connections.
- The output layer computes a single-channel mask ŷ = σ(z), where σ denotes elementwise sigmoid activation applied to the network's logits z.
- The loss L_MCC is evaluated per image, loss gradients are propagated via backpropagation through the network, and the parameters θ are updated to minimize the batch-aggregated MCC loss.
To ensure computational safety, ε regularization is included in denominator computations. Well-behaved gradients are preserved even under vanishing TP/TN scenarios resulting from rare-class or degenerate batch compositions (Abhishek et al., 2020).
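The ε safeguard can be checked directly. The NumPy sketch below (illustrative, not the reference implementation) evaluates the loss and a finite-difference gradient on a degenerate all-background batch; with the guard both stay finite, while setting ε = 0 produces a 0/0 indeterminate form:

```python
import numpy as np

def mcc_loss(y_pred, y_true, eps=1e-8):
    # 1 - soft MCC with eps-stabilized denominator.
    tp = np.sum(y_pred * y_true)
    tn = np.sum((1 - y_pred) * (1 - y_true))
    fp = np.sum(y_pred * (1 - y_true))
    fn = np.sum((1 - y_pred) * y_true)
    num = tp * tn - fp * fn
    den = np.sqrt((tp + fp + eps) * (tp + fn + eps) *
                  (tn + fp + eps) * (tn + fn + eps))
    return 1.0 - num / den

# Degenerate batch: the ground truth contains no lesion pixels at all.
y_true = np.zeros(64)
y_pred = np.full(64, 0.01)   # near-background predictions

loss = mcc_loss(y_pred, y_true)

# Central finite-difference gradient w.r.t. one prediction -- stays finite.
h = 1e-6
bump = np.zeros_like(y_pred)
bump[0] = h
grad0 = (mcc_loss(y_pred + bump, y_true) -
         mcc_loss(y_pred - bump, y_true)) / (2 * h)

print(loss, grad0)                            # finite: 1.0 and 0.0
print(mcc_loss(y_pred, y_true, eps=0.0))      # nan: 0/0 without the guard
```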
4. Experimental Protocol and Comparative Results
Three benchmark datasets are utilized for empirical validation:
- ISIC 2017: 2,000 train, 150 validation, 600 test dermoscopic images (benign nevi, melanoma, seborrheic keratosis).
- DermoFit: 1,300 clinical images split as 780/130/390 (train/val/test).
- PH2: 200 dermoscopic images split as 120/20/60.
All images are resized to 128×128 pixels, with training performed on U-Net architectures using batch size 40, learning rate 1e-3, SGD optimizer, and on-the-fly augmentations (flips, ±45° rotations).
MCC-trained models are compared with identical networks trained using Dice loss. The mean Jaccard index (IoU) demonstrates statistically significant gains for MCC loss, as detailed below:
| Dataset | Dice Loss (mean IoU) | MCC Loss (mean IoU) | Relative Gain | Significance |
|---|---|---|---|---|
| ISIC 2017 | 0.6758 | 0.7518 | +11.25% | p < 0.001 |
| DermoFit | 0.7418 | 0.7779 | +4.87% | p < 0.001 |
| PH2 | 0.8051 | 0.8112 | +0.76% | p < 0.05 |
Improvements in pixel-accuracy, Dice, sensitivity, and specificity metrics are also observed. Qualitative inspections show that MCC loss yields sharper boundaries and reduces both false positives and false negatives relative to Dice-optimized networks (Abhishek et al., 2020).
5. Analysis under Class Imbalance and Broader Applicability
The MCC loss’s inclusion of TN in its optimization objective explicitly penalizes improper background predictions and ameliorates the tendency toward degenerate all-background models that afflicts overlap-centric loss functions in the class-imbalanced regime. This facilitates balanced improvements in both sensitivity (lesion recall) and specificity (background discrimination).
Extensions of MCC loss are feasible for multi-class segmentation by generalizing the confusion matrix to K × K for K classes and constructing multi-class MCC terms. Furthermore, for rare-event and highly imbalanced classification (binary or multi-class), the MCC loss can replace cross-entropy to yield more equitable optimization. Joint loss formulations, combining the MCC loss with pixel-wise cross-entropy targets, are suggested as a means to stabilize early-stage learning (Abhishek et al., 2020).
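A joint formulation of this kind can be sketched as a convex combination of pixel-wise binary cross-entropy and the MCC loss. In the NumPy sketch below, the mixing weight `lam` is a hypothetical choice for illustration, not a value from the reference:

```python
import numpy as np

def bce_loss(y_pred, y_true, eps=1e-7):
    # Pixel-wise binary cross-entropy, clipped for numerical safety.
    p = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def mcc_loss(y_pred, y_true, eps=1e-8):
    tp = np.sum(y_pred * y_true)
    tn = np.sum((1 - y_pred) * (1 - y_true))
    fp = np.sum(y_pred * (1 - y_true))
    fn = np.sum((1 - y_pred) * y_true)
    den = np.sqrt((tp + fp + eps) * (tp + fn + eps) *
                  (tn + fp + eps) * (tn + fn + eps))
    return 1.0 - (tp * tn - fp * fn) / den

def joint_loss(y_pred, y_true, lam=0.5):
    # lam is a hypothetical mixing weight; tuning it per task is expected.
    return lam * bce_loss(y_pred, y_true) + (1 - lam) * mcc_loss(y_pred, y_true)

y_true = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0])
y_pred = np.array([0.1, 0.2, 0.1, 0.8, 0.7, 0.1, 0.3, 0.1])
print(joint_loss(y_pred, y_true))
```

At `lam = 1` this reduces to plain cross-entropy and at `lam = 0` to the pure MCC loss, so the cross-entropy term can dominate early training and be annealed away as suggested.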
6. Impact and Generalization Potential
Empirical results across three distinct lesion segmentation benchmarks confirm the efficacy of MCC loss in improving mean Jaccard index and secondary segmentation metrics. The approach directly translates the desirable properties of the Matthews correlation coefficient—principally its invariance to class imbalance and exhaustive utilization of the confusion matrix—into an end-to-end learnable setting for deep neural networks.
MCC loss constitutes a robust alternative to overlap-based and cross-entropy loss functions especially in scenarios where the prevalence of the target class is substantially lower than the background. A plausible implication is that similar performance gains could be realized in other domains suffering imbalance, provided the loss is properly extended or adapted to task-specific structures (Abhishek et al., 2020).