
Weighted BCE + MCC Loss for Imbalanced Segmentation

Updated 21 September 2025
  • The composite loss integrates weighted BCE and differentiable MCC to tackle class imbalance by combining local error control with global confusion matrix insights.
  • It leverages pixel-level weighting and robust gradient formulations to optimize both foreground and background classifications in dense segmentation tasks.
  • Empirical results in skin lesion and retinal vessel segmentation show significant performance gains, with improvements up to 11.25% in the mean Jaccard index.

Weighted Binary Cross-Entropy (BCE) plus Matthews Correlation Coefficient (MCC) Loss is a composite objective function constructed to address class imbalance and provide balanced, rigorous optimization in binary and semantic segmentation tasks. This approach unites the localized, sample-wise error penalization of weighted BCE with the global, confusion-matrix-aware robustness of MCC. Its adoption is motivated by persistent challenges in domains such as medical image segmentation and imbalanced classification, where traditional loss functions often fail to reward balanced model behavior.

1. Mathematical Formulation and Motivation

Let $y_i \in \{0,1\}$ denote the ground truth and $p_i \in [0,1]$ the predicted probability for the $i$-th sample (or pixel, for dense tasks). The weighted binary cross-entropy (BCE) loss is given by

$$\mathcal{L}_{\text{BCE}} = -\frac{1}{N} \sum_{i=1}^N \left[ w_1\, y_i \log(p_i) + w_0\, (1 - y_i)\log(1-p_i) \right]$$

where $w_1, w_0$ are class weights (usually $w_1 > w_0$ in minority-positive domains).

The differentiable Matthews Correlation Coefficient (MCC) loss is constructed from soft versions of confusion matrix entries:

$$\begin{aligned} \text{TP} &= \sum_i p_i\, y_i, & \text{TN} &= \sum_i (1-p_i)(1-y_i), \\ \text{FP} &= \sum_i p_i\,(1-y_i), & \text{FN} &= \sum_i (1-p_i)\, y_i \end{aligned}$$

The MCC is then

$$\text{MCC} = \frac{\text{TP}\cdot\text{TN} - \text{FP}\cdot\text{FN}}{\sqrt{(\text{TP}+\text{FP})(\text{TP}+\text{FN})(\text{TN}+\text{FP})(\text{TN}+\text{FN})} + \varepsilon}$$

where $\varepsilon$ is a small constant for numerical stability. The corresponding loss is

$$\mathcal{L}_{\text{MCC}} = 1 - \text{MCC}$$

The composite loss takes the form

$$\mathcal{L}_{\text{total}} = \lambda_1 \mathcal{L}_{\text{BCE}} + \lambda_2 \mathcal{L}_{\text{MCC}}$$

with $\lambda_1, \lambda_2 \geq 0$ controlling the trade-off. This construction leverages the pixel-level, class-sensitive adjustments of weighted BCE and the confusion-matrix-wide balancing of MCC.
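
The formulas above can be sketched directly in NumPy. This is a minimal illustration of the forward computation only (in a deep learning framework, the same expressions written with framework tensors are differentiable automatically); the weight and trade-off values are placeholder defaults, not recommendations from the literature.

```python
import numpy as np

def weighted_bce(p, y, w1=2.0, w0=1.0, eps=1e-7):
    """Weighted binary cross-entropy over flattened predictions."""
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    return -np.mean(w1 * y * np.log(p) + w0 * (1 - y) * np.log(1 - p))

def soft_mcc(p, y, eps=1e-7):
    """Differentiable MCC built from soft confusion-matrix entries."""
    tp = np.sum(p * y)
    tn = np.sum((1 - p) * (1 - y))
    fp = np.sum(p * (1 - y))
    fn = np.sum((1 - p) * y)
    denom = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) + eps
    return (tp * tn - fp * fn) / denom

def composite_loss(p, y, lam1=0.5, lam2=0.5, w1=2.0, w0=1.0):
    """L_total = lam1 * L_BCE + lam2 * (1 - MCC)."""
    return lam1 * weighted_bce(p, y, w1, w0) + lam2 * (1.0 - soft_mcc(p, y))
```

A prediction that matches the mask closely should yield a lower composite loss than an inverted one, and a perfect prediction drives the soft MCC to 1.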

2. Addressing Class Imbalance and Limitations of Overlap-Based Losses

Standard BCE and overlap-based losses such as Dice loss or Jaccard index are known to be susceptible to class imbalance, favoring the majority class; Dice in particular ignores true negatives, so it assigns no credit to correctly classified background. MCC, by construction, incorporates all four entries of the confusion matrix (TP, TN, FP, FN) and is thus sensitive to both foreground and background classifications, providing a more nuanced penalization in highly imbalanced scenarios (Abhishek et al., 2020).

Empirical results in tasks such as skin lesion segmentation demonstrate that models trained with MCC-based losses outperform those trained with Dice loss, yielding improvements of up to 11.25% in mean Jaccard index on highly imbalanced datasets (Abhishek et al., 2020). The use of a weighted BCE component further alleviates imbalance by explicitly increasing the loss contribution from the minority class.
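The structural difference between the two metrics can be shown with hand-picked confusion-matrix counts (the counts are illustrative, not from any cited experiment): Dice depends only on TP, FP, and FN, so it is unchanged by the amount of correctly classified background, while MCC shifts as TN grows.

```python
import numpy as np

def dice(tp, fp, fn):
    """Dice coefficient: 2*TP / (2*TP + FP + FN); note TN is absent."""
    return 2 * tp / (2 * tp + fp + fn)

def mcc(tp, tn, fp, fn, eps=1e-7):
    """MCC from hard confusion-matrix counts."""
    num = tp * tn - fp * fn
    den = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) + eps
    return num / den

# Identical TP/FP/FN; only the correctly classified background differs.
small_bg = (dice(8, 2, 2), mcc(8, 88, 2, 2))    # Dice = 0.8, MCC ~ 0.778
large_bg = (dice(8, 2, 2), mcc(8, 988, 2, 2))   # Dice = 0.8, MCC ~ 0.798
```

Dice is identical in both cases, whereas MCC reflects the extra true negatives, which is precisely the sensitivity the composite loss inherits.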

3. Composite Loss Construction and Implementation Considerations

Constructing a weighted BCE plus MCC loss involves several key considerations:

  • Trade-off Parameterization: The $\lambda_1$, $\lambda_2$ hyperparameters are tuned to balance pixel-wise accuracy against global balance. For example, in retinal vessel segmentation, equal weighting ($\lambda_1 = \lambda_2 = 0.5$) proved empirically effective, but domain-specific tuning is recommended (Guo et al., 15 Sep 2025).
  • Numerical Stability: The denominator of the MCC can approach zero if any confusion matrix term vanishes. It is standard to include an additive constant $\varepsilon$ (e.g., $10^{-7}$).
  • Gradient Calculation: Soft confusion matrix entries ensure that $\mathcal{L}_{\text{MCC}}$ is differentiable with respect to $p_i$, enabling backpropagation in modern deep learning frameworks.
  • Weighted BCE: Careful selection of $w_1, w_0$ or more sophisticated dynamic or sample-dependent weighting schemes (e.g., distance maps incorporating spatial context (Davari et al., 2021)) can further tailor the composite loss to the domain.
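
For the last point, one common static heuristic (not prescribed by the cited papers) is inverse-frequency "balanced" weighting, where each class weight is proportional to the reciprocal of its frequency in the ground truth:

```python
import numpy as np

def balanced_class_weights(y):
    """Inverse-frequency weights w_c = N / (2 * N_c), so that the two
    classes contribute equally to the loss in expectation."""
    n = y.size
    n_pos = y.sum()
    n_neg = n - n_pos
    return n / (2 * n_pos), n / (2 * n_neg)  # (w1, w0)
```

With 10% foreground, this yields $w_1 = 5$ and $w_0 \approx 0.56$, amplifying minority-class errors roughly ninefold relative to the background.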

4. Empirical Outcomes and Use Cases

The composite loss has been evaluated in domains with pronounced imbalance:

  • Skin Lesion Segmentation: Training U-Nets with $\mathcal{L}_{\text{MCC}}$ or its combination with weighted BCE produced superior Jaccard index and sensitivity/specificity trade-offs relative to Dice loss training. On ISIC 2017, the MCC loss achieved a mean Jaccard index of $0.7518 \pm 0.0084$ vs. $0.6758 \pm 0.0095$ for Dice loss (Abhishek et al., 2020).
  • Retinal Vessel Segmentation (SA-UNetv2): Employing the weighted BCE plus MCC loss with a cross-scale spatial attention network yields MCC up to 81.27 and state-of-the-art F1/Jaccard scores on the DRIVE/STARE datasets, all in a model occupying 1.2 MB with sub-second CPU inference (Guo et al., 15 Sep 2025). This indicates practical compatibility even in constrained deployable systems.
  • Glacier Calving Front Segmentation: Using MCC as the stopping criterion (and improved distance-weighted BCE for the boundary) offers a 15% improvement in Dice coefficient compared to BCE-based early stopping, underscoring the generalizability of MCC-based criteria for rare-structure segmentation (Davari et al., 2021).

5. Extensions, Statistical Properties, and Stability

Rigorous studies have established the statistical properties of MCC, including asymptotic normality under standard conditions and the provision of asymptotic confidence intervals using the delta method and Fisher's zz transformation (Itaya et al., 21 May 2024, Tamura et al., 9 Mar 2025). This statistical regularity supports its use as a loss term and as an evaluation metric.
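
As a rough illustration of such interval estimates, the classical Fisher $z$ recipe for a correlation coefficient can be sketched as follows. This is a simplification: it assumes the textbook $1/\sqrt{n-3}$ standard error for a Pearson correlation carries over to MCC, whereas the cited works derive the variance rigorously via the delta method.

```python
import numpy as np

def mcc_confidence_interval(mcc_value, n, z_crit=1.96):
    """Approximate CI for MCC via Fisher's z transform, under the
    simplifying assumption SE(z) = 1/sqrt(n - 3) as for Pearson's r."""
    z = np.arctanh(mcc_value)          # Fisher z transform
    se = 1.0 / np.sqrt(n - 3)
    lo = np.tanh(z - z_crit * se)      # back-transform the endpoints
    hi = np.tanh(z + z_crit * se)
    return lo, hi
```

For an observed MCC of 0.8 on 103 samples this gives an interval of roughly (0.72, 0.86); the back-transform keeps the interval inside $(-1, 1)$ and makes it asymmetric around the point estimate, as expected for a bounded statistic.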

Several insights relevant for loss design arise:

  • Composite Loss Stability: While integrating LMCC\mathcal{L}_{\text{MCC}} with BCE, careful tuning is required to prevent instability, particularly under skewed distributions where the variance of MCC increases.
  • Connection to Multiclass and Multilabel Scenarios: Macro- and micro-averaged MCC extensions are available; these can in principle be used alongside weighted BCE variants or other polynomial expansions (such as Asymmetric Polynomial Loss (Huang et al., 2023)) to treat non-binary settings.
  • Loss Function Generalization: The theoretical framework in (Marchetti et al., 2023) formalizes the link between expected weighted confusion matrix entries and custom score-oriented losses, providing a rigorous basis for generalizing the composite loss structure beyond standard BCE and MCC.

6. Comparative Analysis and Deployment Scenarios

The strengths and potential limitations of the BCE plus MCC approach are summarized as follows:

| Aspect | Weighted BCE | MCC Loss | BCE + MCC Composite |
|---|---|---|---|
| Imbalance handling | Direct, via class weights | Inherent, via confusion matrix | Strong (complementary) |
| Penalizes TN errors | No (unless weighted) | Yes | Yes |
| Granularity | Local (sample-wise) | Global (whole-batch statistics) | Both |
| Optimization stability | High | Moderate (careful gradients needed) | Requires tuning |

A plausible implication is that the composite loss is especially effective in scenarios where both local accuracy and global class-balance must be maintained (e.g., dense segmentation of rare-foreground biomedical images, highly-imbalanced event detection, or resource-constrained environments where predictable performance is crucial).

7. Prospective Directions and Theoretical Foundations

Recent theoretical work (Marchetti et al., 2023, Tamura et al., 9 Mar 2025) situates the weighted BCE and MCC combination within a broader class of score-oriented loss functions. This perspective enables extensions such as:

  • Custom weighting schemes informed by spatiotemporal features (e.g., onset/offset or distance maps (Song, 20 Mar 2024, Davari et al., 2021)).
  • Dynamic adaptation of $\lambda_1, \lambda_2$ during training driven by metric feedback or uncertainty quantification.
  • Surrogate or smoothed differentiable approximations of MCC for enhanced gradient stability.
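
As a concrete (hypothetical) instance of the second direction, a simple epoch-driven schedule could emphasize the stable BCE gradients early and shift weight toward the MCC term as training matures; the ramp shape and endpoints below are illustrative choices, not values from the cited works.

```python
def lambda_schedule(epoch, total_epochs):
    """Hypothetical linear schedule for the composite-loss weights:
    lam2 ramps 0 -> 1 (MCC term phased in), lam1 tapers 1.0 -> 0.5."""
    t = epoch / max(total_epochs - 1, 1)  # training progress in [0, 1]
    lam2 = min(t, 1.0)
    lam1 = 1.0 - 0.5 * lam2
    return lam1, lam2
```

Metric-feedback variants would replace the epoch-based progress `t` with, e.g., the validation MCC itself, but the plumbing is identical.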

The composite loss framework thus accommodates both rigorous statistical properties (asymptotic inference for MCC, robust handling of pointwise and aggregate metrics) and pragmatic needs (deployability in constrained settings, empirical enhancement on critical datasets).


Weighted BCE plus MCC loss represents a principled synthesis of local error penalization and global balance, validated both in theory and application across segmentation, detection, and classification settings where class imbalance is acute and full confusion-matrix-aware optimization is required.
