Weighted MCC Variants

Updated 12 April 2026

The article introduces weighted MCC as a generalization of the standard MCC by leveraging instance-specific weights to highlight key sample contributions.
Robust MCC incorporates regularization to address class imbalance, ensuring sustained true-positive rates and balanced sensitivity-specificity trade-offs.
Efficient implementations for binary and multiclass cases are detailed, with applications extending to physical simulations and weighted model counting.

A weighted variant of the Matthews Correlation Coefficient (MCC) is a generalization of the classical MCC and related correlation-based classification metrics. These variants allow for instance-dependent weighting, robust regularization to address class imbalance, or, in the domain of physical simulation, superparticle weighting in Monte Carlo collision algorithms. Weighted MCC variants appear in multiple methodological contexts including robust evaluation in imbalanced learning, sample-weighted or cost-sensitive performance evaluation, and stochastic inference in probabilistic models. This article covers the principal classes of weighted MCC variants, their mathematical formulation, their algorithmic properties, theoretical guarantees, and empirical significance.

1. Weighted MCC in Classification: Formulation and Motivation

Weighted MCC extends the standard Pearson-Matthews correlation to settings where individual observations carry distinct importances, such as variable instance weights from sampling policies, dataset curation, or problem-specific heterogeneity. For $N$ samples with true labels $t_n \in \{0,1\}$, predicted labels $c_n \in \{0,1\}$, and positive weights $S_n > 0$, the weighted MCC (WMCC) in the binary case is computed as:

$\mathrm{WMCC}(t, c; S) = \frac{TP \cdot TN - FP \cdot FN} {\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$

with the weighted sum analogues:

$TP = \sum_{n=1}^N S_n t_n c_n$
$TN = \sum_{n=1}^N S_n (1-t_n)(1-c_n)$
$FP = \sum_{n=1}^N S_n (1-t_n) c_n$
$FN = \sum_{n=1}^N S_n t_n (1-c_n)$

This construction, formulated rigorously in "Weighted MCC: A Robust Measure of Multiclass Classifier Performance for Observations with Individual Weights" (Cortez et al., 23 Dec 2025), ensures that correct (or incorrect) predictions on highly weighted observations more heavily influence the overall metric value. The extension to multiclass problems leverages weighted covariance structures between one-hot encoded label vectors.

Unlike the unweighted MCC, the weighted MCC is sensitive to the distribution of sample-specific weights. It can therefore separate classifiers according to how well they perform on important, rare, or high-impact examples (Cortez et al., 23 Dec 2025).
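The binary definition above can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation from the cited paper: the function name `weighted_mcc` is hypothetical, and returning 0.0 when the denominator vanishes is an assumed convention that the source does not specify.

```python
import math
from typing import Sequence


def weighted_mcc(t: Sequence[int], c: Sequence[int], s: Sequence[float]) -> float:
    """Binary weighted MCC from true labels t, predicted labels c, weights s."""
    if not (len(t) == len(c) == len(s)):
        raise ValueError("t, c and s must have the same length")
    tp = tn = fp = fn = 0.0
    # Single pass: accumulate the four weighted contingency sums.
    for t_n, c_n, s_n in zip(t, c, s):
        if s_n <= 0:
            raise ValueError("weights must be strictly positive")
        tp += s_n * t_n * c_n
        tn += s_n * (1 - t_n) * (1 - c_n)
        fp += s_n * (1 - t_n) * c_n
        fn += s_n * t_n * (1 - c_n)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumption: report 0.0 when a marginal sum is empty (denominator is zero).
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0


# Two classifiers that each miss one positive sample, but not the same one.
t = [1, 1, 0, 0]
c_a = [1, 0, 0, 0]      # misses sample 1
c_b = [0, 1, 0, 0]      # misses sample 0
uniform = [1.0, 1.0, 1.0, 1.0]
heavy = [1.0, 10.0, 1.0, 1.0]   # sample 1 carries ten times the weight

same_unweighted = weighted_mcc(t, c_a, uniform) == weighted_mcc(t, c_b, uniform)
b_beats_a = weighted_mcc(t, c_b, heavy) > weighted_mcc(t, c_a, heavy)
```

With uniform weights the two classifiers receive the same score, while the heavy weight on sample 1 ranks classifier B (which gets that sample right) above classifier A.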

2. Robust MCC for Imbalanced Classification

The standard MCC, like the F-score and the Jaccard index, is sensitive to class imbalance: as the minority-class prevalence $\pi$ tends to zero, it increasingly favors classifiers that ignore the minority class. The robust MCC introduces a regularization parameter into the denominator, which prevents the metric from collapsing when the variance of the classifier's output shrinks under extreme imbalance. The regularized denominator depends on this parameter and on the proportion of predicted positives; the exact expression is given in (Holzmann et al., 2024).

This form ensures that, even as $\pi \to 0$, the Bayes classifier at the optimal threshold for the robust MCC maintains a true-positive rate bounded away from zero. Larger values of the regularization parameter force more detection of minority examples at the expense of more false positives, so the choice of its value gives explicit control over the TPR/FPR trade-off (Holzmann et al., 2024). This robustification directly addresses the principal limitation of the unweighted MCC in highly imbalanced contexts.

3. Algorithmic Implementation and Computational Properties

Weighted MCC and its multiclass extensions are amenable to efficient, single-pass computation. In the binary case, the metric is computed by accumulating the four weighted contingency sums in $O(N)$ time. In the multiclass case, one constructs centered, weighted covariance matrices for the true and predicted one-hot label vectors, then evaluates the Extended Correlation Coefficient (ECC) or one of the two related MPC variants, whose definitions are given in (Cortez et al., 23 Dec 2025).

The cost of the multiclass metrics depends on whether full covariance matrices or only their trace or diagonal forms are evaluated, as detailed in (Cortez et al., 23 Dec 2025). For robust MCC, the regularization only changes the denominator, so it adds no computational cost over the standard MCC (Holzmann et al., 2024).
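As a rough illustration of the covariance-based multiclass computation, the sketch below evaluates a weighted, trace-based correlation between one-hot label vectors in a single pass. It is a weighted analogue of the standard multiclass MCC (Gorodkin's $R_K$), chosen here as an assumption; it is not claimed to match the ECC or MPC definitions of the cited paper. The function name and the zero-denominator convention are likewise hypothetical.

```python
import math
from typing import Sequence


def weighted_multiclass_mcc(
    t: Sequence[int], c: Sequence[int], s: Sequence[float], num_classes: int
) -> float:
    """Trace-form weighted correlation of one-hot true and predicted labels."""
    total = 0.0
    true_mass = [0.0] * num_classes   # weighted mass of each true class
    pred_mass = [0.0] * num_classes   # weighted mass of each predicted class
    agree_mass = 0.0                  # weighted mass of correct predictions
    for t_n, c_n, s_n in zip(t, c, s):
        total += s_n
        true_mass[t_n] += s_n
        pred_mass[c_n] += s_n
        if t_n == c_n:
            agree_mass += s_n
    p = [m / total for m in true_mass]
    q = [m / total for m in pred_mass]
    # Traces of the centered, weighted covariance matrices.
    cov_tc = agree_mass / total - sum(pk * qk for pk, qk in zip(p, q))
    cov_tt = 1.0 - sum(pk * pk for pk in p)
    cov_cc = 1.0 - sum(qk * qk for qk in q)
    denom = math.sqrt(cov_tt * cov_cc)
    # Assumption: report 0.0 when either label vector has zero variance.
    return cov_tc / denom if denom > 0 else 0.0
```

Only per-class weighted masses and the agreement mass are stored, so the pass costs one update per sample plus one sum over classes. For two classes the expression reduces algebraically to the binary WMCC formula of Section 1.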

4. Theoretical Properties: Robustness, Lipschitz Bounds, and Comparison

Weighted MCC variants possess several notable theoretical guarantees:

The binary WMCC is Lipschitz continuous under perturbation of the sample weights: changing any weight $S_n$ by at most $\epsilon$ changes WMCC by at most $C\epsilon$ for some constant $C$ independent of $N$.
The multiclass ECC and the MPC variants are also Lipschitz under weight perturbations.
Robust MCC with denominator regularization provably upper-bounds the optimal classification threshold and establishes a uniform lower bound on the achievable TPR for the minority class, even as its prior probability vanishes. No such guarantees hold for the unweighted MCC or for F-scores (Holzmann et al., 2024).
Weighted MCC reduces exactly to the unweighted score when all $S_n$ are equal, and to class-weighted MCC when the $S_n$ are constant within each class (Cortez et al., 23 Dec 2025).
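The reduction to the unweighted score can be checked directly from the binary definition in Section 1. The short derivation below assumes a common weight $s > 0$; the subscript-0 symbols for the unweighted counts are notation introduced here for illustration only.

```latex
% Assume S_n = s > 0 for every n; TP_0, TN_0, FP_0, FN_0 are the unweighted counts.
TP = s\,TP_0, \quad TN = s\,TN_0, \quad FP = s\,FP_0, \quad FN = s\,FN_0

\mathrm{WMCC}
  = \frac{s^2\,(TP_0 \cdot TN_0 - FP_0 \cdot FN_0)}
         {\sqrt{s^4\,(TP_0 + FP_0)(TP_0 + FN_0)(TN_0 + FP_0)(TN_0 + FN_0)}}
  = \frac{TP_0 \cdot TN_0 - FP_0 \cdot FN_0}
         {\sqrt{(TP_0 + FP_0)(TP_0 + FN_0)(TN_0 + FP_0)(TN_0 + FN_0)}}
```

The factor $s^2$ cancels between numerator and denominator, so the same argument shows that rescaling all weights by a common positive factor leaves the binary WMCC unchanged.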

A comparative summary:

Variant	Robustness to weights	Robustness to imbalance	Minority-class TPR bounded below
Standard MCC	No	No	No
Weighted MCC	Yes (Lipschitz in the weights, binary and multiclass)	No	No
Robust MCC	Yes (to imbalance, via the regularization parameter)	Yes	Yes

5. Empirical Evidence and Illustrative Behavior

Empirical results in (Cortez et al., 23 Dec 2025) demonstrate that, on synthetic datasets with strongly heterogeneous sample weights, weighted MCC sharply distinguishes classifiers that succeed on highly weighted samples, whereas unweighted MCC is unaffected by the weights. In multiclass scenarios, the weighted ECC and MPC metrics exhibit analogous sensitivity.

For class-imbalance scenarios, simulations and real-data analyses in (Holzmann et al., 2024) show that the robust MCC maintains a minority-class TPR substantially above zero, even when the standard MCC (as well as the F-score and the Jaccard index) ends up favoring classifiers that effectively ignore the rare class. Robust MCC also lets the practitioner tune the sensitivity-specificity trade-off through its regularization parameter.

In all cases, the weighted variants retain the range $[-1, 1]$ and continuity properties analogous to those of their unweighted counterparts.

6. Applications in Physical Simulation (PIC/MCC Context)

In particle-based plasma simulations using the Particle-in-Cell with Monte Carlo Collisions (PIC/MCC) framework, the term "weighted MCC" can also denote the weighting of superparticles, each of which represents many physical particles. In such schemes, each superparticle carries a weight $W$, the number of real particles it represents. The per-superparticle collision probability is computed exactly as for a real particle, with the caveat that one superparticle collision counts for $W$ real-particle collisions. The interaction of $W$ with the numerical stability and accuracy conditions (e.g., on the time step $\Delta t$ and the grid spacing $\Delta x$) is nontrivial:

To keep the single-collision-per-step assumption valid, the collision probability per time step must remain small.
Accuracy for physical observables such as the electron density or the energy distribution may require a sufficiently large number of superparticles per Debye length.
An excessively large $W$ (too few superparticles) degrades accuracy and may yield unphysical observables, especially in low-pressure or high-voltage regimes (Vass et al., 2021).

Although MCC here refers to Monte Carlo collisions rather than the Matthews correlation coefficient, the role of the particle weight is analogous: a higher $W$ amplifies the stochastic impact of each superparticle's interaction, and tuning $W$ governs the numerical fidelity and physical realism of the simulation.
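The bookkeeping described above can be sketched as follows. The example uses the standard Monte Carlo collision probability $P = 1 - \exp(-\nu \Delta t)$ and assumes, for simplicity, a single constant collision frequency $\nu$ shared by all superparticles; in an actual PIC/MCC code $\nu$ depends on particle energy, background density, and cross sections. All function and parameter names are hypothetical.

```python
import math
import random
from typing import Tuple


def collision_probability(collision_frequency: float, dt: float) -> float:
    """Probability that one particle collides during a time step of length dt."""
    return 1.0 - math.exp(-collision_frequency * dt)


def count_collisions(
    num_superparticles: int,
    weight: float,
    collision_frequency: float,
    dt: float,
    rng: random.Random,
) -> Tuple[int, float]:
    """Sample collisions for one step; return (superparticle, real-particle) counts."""
    p = collision_probability(collision_frequency, dt)
    # The probability per superparticle is the same as for a real particle.
    superparticle_collisions = sum(
        1 for _ in range(num_superparticles) if rng.random() < p
    )
    # Each superparticle collision stands for `weight` real-particle collisions.
    real_collisions = weight * superparticle_collisions
    return superparticle_collisions, real_collisions


# Illustrative values only: nu = 1e7 1/s and dt = 1e-9 s give nu * dt = 0.01.
p_step = collision_probability(1.0e7, 1.0e-9)
n_super, n_real = count_collisions(1000, 1.0e5, 1.0e7, 1.0e-9, random.Random(0))
```

The weight never enters the collision probability, only the conversion from sampled events to real-particle counts, which is why a larger weight makes each sampled event count for more and increases the statistical noise of the result.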

7. Variants in Weighted Model Counting (WMC)

Weighted model counting (WMC) and related inference tasks in probabilistic AI also feature a "weighted" variant, where the weights assigned to outcomes are random variables reflecting parameter uncertainty. The focus in (Nakamura et al., 7 Jan 2026) is not on an MCC-type correlation, but rather on the computation of variance in the WMC due to stochastic weights. The tractability of variance computation hinges critically on the circuit representation (specifically, structured decomposable deterministic NNFs), and the presence of weighted computations enables parameter-impact analysis for Bayesian network inference. This is a distinct, though methodologically related, use of weighting.
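To make the quantity concrete, the sketch below computes a weighted model count by brute-force enumeration and estimates its variance by sampling the weights. This is deliberately not the knowledge-compilation approach of the cited work, which avoids enumeration; it is only a small illustration under the assumption that each variable has its own pair of literal weights. All names are hypothetical.

```python
import random
from itertools import product
from statistics import pvariance
from typing import Callable, List, Sequence, Tuple

# (weight when the variable is False, weight when it is True)
LiteralWeights = Tuple[float, float]


def weighted_model_count(
    formula: Callable[[Sequence[bool]], bool],
    weights: Sequence[LiteralWeights],
) -> float:
    """Sum, over satisfying assignments, of the product of literal weights."""
    total = 0.0
    for assignment in product((False, True), repeat=len(weights)):
        if formula(assignment):
            term = 1.0
            for (w_false, w_true), value in zip(weights, assignment):
                term *= w_true if value else w_false
            total += term
    return total


def sampled_wmc_variance(
    formula: Callable[[Sequence[bool]], bool],
    sample_weights: Callable[[random.Random], List[LiteralWeights]],
    num_samples: int,
    rng: random.Random,
) -> float:
    """Monte Carlo estimate of the variance of the WMC under random weights."""
    counts = [
        weighted_model_count(formula, sample_weights(rng))
        for _ in range(num_samples)
    ]
    return pvariance(counts)
```

Enumeration is exponential in the number of variables, so this only works for toy formulas; it is useful mainly as a reference against which a circuit-based computation could be checked.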

References

Cortez, B., & Krishnamoorthy, R. "Weighted MCC: A Robust Measure of Multiclass Classifier Performance for Observations with Individual Weights" (Cortez et al., 23 Dec 2025).
Holzmann, H., & Klar, B. "Robust performance metrics for imbalanced classification problems" (Holzmann et al., 2024).
Vass, M., Palla, P., & Hartmann, P. "Revisiting the numerical stability/accuracy conditions of explicit PIC/MCC simulations of low-temperature gas discharges" (Vass et al., 2021).
Iwama, K., Miura, T., & Sato, T. "Variance Computation for Weighted Model Counting with Knowledge Compilation Approach" (Nakamura et al., 7 Jan 2026).
