
Weighted MCC Variants

Updated 12 April 2026
  • The article introduces weighted MCC as a generalization of the standard MCC by leveraging instance-specific weights to highlight key sample contributions.
  • Robust MCC incorporates regularization to address class imbalance, ensuring sustained true-positive rates and balanced sensitivity-specificity trade-offs.
  • Efficient implementations for binary and multiclass cases are detailed, with applications extending to physical simulations and weighted model counting.

A weighted variant of the Matthews Correlation Coefficient (MCC) is a generalization of the classical MCC and related correlation-based classification metrics. These variants allow for instance-dependent weighting, robust regularization to address class imbalance, or, in the domain of physical simulation, superparticle weighting in Monte Carlo collision algorithms. Weighted MCC variants appear in multiple methodological contexts including robust evaluation in imbalanced learning, sample-weighted or cost-sensitive performance evaluation, and stochastic inference in probabilistic models. This article covers the principal classes of weighted MCC variants, their mathematical formulation, their algorithmic properties, theoretical guarantees, and empirical significance.

1. Weighted MCC in Classification: Formulation and Motivation

Weighted MCC extends the standard Pearson-Matthews correlation to settings where individual observations carry distinct importance, such as instance weights arising from sampling policies, dataset curation, or problem-specific heterogeneity. For $N$ samples with true labels $t_n \in \{0,1\}$, predicted labels $c_n \in \{0,1\}$, and positive weights $S_n > 0$, the weighted MCC (WMCC) in the binary case is computed as:

$$\mathrm{WMCC}(t, c; S) = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$

with the weighted sum analogues:

  • $TP = \sum_{n=1}^N S_n t_n c_n$
  • $TN = \sum_{n=1}^N S_n (1-t_n)(1-c_n)$
  • $FP = \sum_{n=1}^N S_n (1-t_n) c_n$
  • $FN = \sum_{n=1}^N S_n t_n (1-c_n)$

This construction, formulated rigorously in "Weighted MCC: A Robust Measure of Multiclass Classifier Performance for Observations with Individual Weights" (Cortez et al., 23 Dec 2025), ensures that correct (or incorrect) predictions on highly weighted observations more heavily influence the overall metric value. The extension to multiclass problems leverages weighted covariance structures between one-hot encoded label vectors.

Weighted MCC is critically distinguished from unweighted MCC by its sensitivity to the distribution of sample-specific weights, permitting finer discrimination between classifiers which succeed on important, rare, or high-impact examples (Cortez et al., 23 Dec 2025).
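The binary definition above can be sketched in a few lines of pure Python. This is a minimal illustration, not the reference implementation from the cited paper: the function name `weighted_mcc` and the convention of returning 0.0 when the denominator vanishes are assumptions made here.

```python
from math import sqrt


def weighted_mcc(t, c, s):
    """Binary weighted MCC from true labels t, predicted labels c, weights s."""
    if not (len(t) == len(c) == len(s)):
        raise ValueError("t, c and s must have the same length")
    tp = tn = fp = fn = 0.0
    # Single pass: accumulate the four weighted contingency sums.
    for t_i, c_i, s_i in zip(t, c, s):
        tp += s_i * t_i * c_i
        tn += s_i * (1 - t_i) * (1 - c_i)
        fp += s_i * (1 - t_i) * c_i
        fn += s_i * t_i * (1 - c_i)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0.0:
        return 0.0  # assumed convention for a degenerate contingency table
    return (tp * tn - fp * fn) / denom


# Two classifiers that each miss exactly one positive sample.
t = [1, 1, 0, 0]
pred_a = [1, 0, 0, 0]  # correct on the first (heavily weighted) positive
pred_b = [0, 1, 0, 0]  # correct on the second (lightly weighted) positive
uniform = [1.0, 1.0, 1.0, 1.0]
heavy_first = [10.0, 1.0, 1.0, 1.0]

print(weighted_mcc(t, pred_a, uniform), weighted_mcc(t, pred_b, uniform))
print(weighted_mcc(t, pred_a, heavy_first), weighted_mcc(t, pred_b, heavy_first))
```

With uniform weights the two classifiers receive the same score, since their contingency tables coincide. With the first sample weighted ten times more heavily, the classifier that gets that sample right scores higher.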

2. Robust MCC for Imbalanced Classification

The standard MCC, like the F-score and the Jaccard index, is sensitive to class imbalance: as the minority-class prevalence $\pi$ tends to zero, it tends to favor classifiers that ignore the minority class. The robust MCC (denoted $\mathrm{MCC}_d$ here) introduces a regularization parameter $d > 0$ into the denominator, preventing the metric from collapsing when the classifier's output variance shrinks under extreme imbalance:

[display equation for $\mathrm{MCC}_d$ missing in source]

where $q$ is the proportion of predicted positives. This form ensures that, even as $\pi \to 0$, the optimal-threshold Bayes classifier for $\mathrm{MCC}_d$ maintains a true-positive rate bounded away from zero. Larger values of $d$ force more detection of minority examples at the expense of more false positives, so the choice of $d$ gives explicit control over the TPR/FPR trade-off (Holzmann et al., 2024). This robustification directly addresses the principal limitation of the unweighted MCC in highly imbalanced contexts.

3. Algorithmic Implementation and Computational Properties

Weighted MCC and its multiclass extensions are amenable to efficient, single-pass computation. For the binary case, the metric is computed by accumulating four weighted contingency sums in $O(N)$ time. For the multiclass case ($K$ classes), one constructs centered, weighted covariance matrices for the true and predicted one-hot label vectors, then evaluates the Extended Correlation Coefficient (ECC) or one of the two related MPC variants:

  • ECC: [formula missing in source]
  • First MPC variant: [formula missing in source]
  • Second MPC variant: [formula missing in source]

Efficient implementation is O([bound missing in source]) for full matrices or O([bound missing in source]) for trace/diagonal-only forms, as detailed in (Cortez et al., 23 Dec 2025). For robust MCC, the regularization parameter only modifies the denominator and adds no computational cost over the standard MCC (Holzmann et al., 2024).
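As a rough illustration of the multiclass computation, the sketch below applies sample weights to the well-known multiclass generalization of MCC (Gorodkin's $R_K$), which is a ratio of covariances between one-hot label vectors. It is an assumed stand-in chosen for illustration; it is not claimed to match the ECC or MPC definitions of Cortez et al. The function name and the zero-denominator convention are hypothetical.

```python
from math import sqrt


def weighted_multiclass_mcc(y_true, y_pred, weights, num_classes):
    """Weighted R_K-style multiclass MCC; labels are ints in [0, num_classes)."""
    k = num_classes
    # Weighted confusion matrix: conf[i][j] = total weight with true i, predicted j.
    conf = [[0.0] * k for _ in range(k)]
    for yt, yp, w in zip(y_true, y_pred, weights):
        conf[yt][yp] += w
    total = sum(sum(row) for row in conf)
    correct = sum(conf[i][i] for i in range(k))
    true_tot = [sum(conf[i]) for i in range(k)]                       # row sums
    pred_tot = [sum(conf[i][j] for i in range(k)) for j in range(k)]  # column sums
    # Covariance terms, each scaled by total**2 (the scaling cancels in the ratio).
    cov_tp = correct * total - sum(a * b for a, b in zip(true_tot, pred_tot))
    cov_tt = total * total - sum(a * a for a in true_tot)
    cov_pp = total * total - sum(b * b for b in pred_tot)
    denom = sqrt(cov_tt * cov_pp)
    return cov_tp / denom if denom > 0.0 else 0.0  # assumed degenerate-case convention


print(weighted_multiclass_mcc([0, 1, 2, 2], [0, 2, 2, 1], [1.0, 5.0, 1.0, 1.0], 3))
```

The loop over samples is a single pass, and the remaining work depends only on the number of classes, so this particular sketch costs $O(N + K^2)$. For two classes it agrees with the binary weighted formula.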

4. Theoretical Properties: Robustness, Lipschitz Bounds, and Comparison

Weighted MCC variants possess several notable theoretical guarantees:

  • The binary WMCC is $L$-Lipschitz in the perturbation of sample weights: changing any $S_n$ by at most $\varepsilon$ changes WMCC by at most $L\varepsilon$ for some constant $L$, independently of $N$.
  • Multiclass ECC and the MPC variants are likewise Lipschitz under weight perturbations.
  • Robust MCC with the denominator regularization parameter $d$ provably upper-bounds the optimal classification threshold and establishes a uniform lower bound on the achievable TPR for the minority class, even as its prior probability vanishes. No such guarantees hold for unweighted MCC or F-scores (Holzmann et al., 2024).
  • Weighted MCC reduces exactly to the unweighted score when all $S_n$ are equal, and to class-weighted MCC when the $S_n$ are constant per class (Cortez et al., 23 Dec 2025).
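The reduction to the unweighted score can be checked directly from the binary definition. The short derivation below assumes every weight equals a common constant $s > 0$; the symbol $s$ and the subscript-0 notation for the unweighted counts are introduced here for illustration.

```latex
\begin{aligned}
TP &= s\,TP_0,\quad TN = s\,TN_0,\quad FP = s\,FP_0,\quad FN = s\,FN_0, \\
\mathrm{WMCC}
  &= \frac{s^2\,(TP_0\,TN_0 - FP_0\,FN_0)}
          {\sqrt{s^4\,(TP_0+FP_0)(TP_0+FN_0)(TN_0+FP_0)(TN_0+FN_0)}} \\
  &= \frac{TP_0\,TN_0 - FP_0\,FN_0}
          {\sqrt{(TP_0+FP_0)(TP_0+FN_0)(TN_0+FP_0)(TN_0+FN_0)}}
   = \mathrm{MCC}.
\end{aligned}
```

The same cancellation shows that the binary WMCC is invariant under a global rescaling of all weights, so only the relative weights matter.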

A comparative summary:

| Variant | Robustness to weights | Insensitivity to imbalance | Achievable TPR bounded |
| --- | --- | --- | --- |
| Standard MCC | No | No | No |
| Weighted MCC | Yes (Lipschitz) | No | No |
| Robust MCC ($\mathrm{MCC}_d$) | Yes (to imbalance, via $d$) | Yes | Yes |

5. Empirical Evidence and Illustrative Behavior

Empirical results in (Cortez et al., 23 Dec 2025) demonstrate that, on synthetic datasets with strongly heterogeneous sample weights, weighted MCC sharply distinguishes classifiers that succeed on highly weighted samples, whereas unweighted MCC does not change. In multiclass scenarios, the weighted ECC and MPC metrics exhibit analogous sensitivity.

For class-imbalance scenarios, simulations and real-data analyses in (Holzmann et al., 2024) show that the robust MCC maintains a TPR for the minority class substantially above zero, even when the standard MCC (as well as the F-score and Jaccard) effectively ignores the rare class. Robust MCC also lets the practitioner tune the sensitivity-specificity trade-off via the regularization parameter $d$.

In all cases, weighted variants maintain the range $[-1, 1]$ and continuity properties analogous to their unweighted counterparts.

6. Applications in Physical Simulation (PIC/MCC Context)

In particle-based plasma simulations using the Particle-in-Cell with Monte Carlo Collisions (PIC/MCC) framework, the term "weighted MCC" can also denote the weighting of superparticles, each of which represents many physical particles. In such schemes, each superparticle carries a weight $W$, and the per-superparticle collision probability is computed identically to the real-particle case, with the caveat that one superparticle collision counts for $W$ real particle collisions. The interaction of the weight $W$ with numerical stability and accuracy conditions (e.g., time step $\Delta t$, grid spacing $\Delta x$) is nontrivial:

  • To maintain single-collision-per-step validity: [condition missing in source].
  • Accuracy for physical observables such as electron density or energy distribution may require superparticle densities (per Debye length) exceeding [threshold missing in source].
  • Excessively large $W$ (few superparticles) degrades accuracy and may yield unphysical observables, especially under low-pressure or high-voltage regimes (Vass et al., 2021).

Although the MCC here refers explicitly to Monte Carlo collisions, the role of particle weight is analogous: higher $W$ amplifies the stochastic impact of each superparticle's interaction, and tuning $W$ mediates numerical fidelity and physical realism.
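The role of the superparticle weight in the collision step can be sketched as follows. This is a simplified illustration, not the scheme of Vass et al.: it uses the standard expression $P = 1 - \exp(-n \sigma v \Delta t)$ for the collision probability of one particle in one time step, and it assumes a constant cross section and a uniform background gas density. All function names and numerical values are hypothetical.

```python
import math
import random


def collision_probability(gas_density, cross_section, speed, dt):
    """P = 1 - exp(-n * sigma * v * dt) for one particle over one time step."""
    nu = gas_density * cross_section * speed  # collision frequency in 1/s
    return 1.0 - math.exp(-nu * dt)


def mcc_step(speeds, weight, gas_density, cross_section, dt, rng):
    """Return (superparticle collisions, real collisions they represent)."""
    n_super = 0
    for v in speeds:
        # The test is the same as for a real particle; the weight does not enter here.
        if rng.random() < collision_probability(gas_density, cross_section, v, dt):
            n_super += 1
    # Each superparticle collision stands for `weight` real-particle collisions.
    return n_super, n_super * weight


rng = random.Random(0)
speeds = [1.0e6] * 1000  # m/s, identical speeds for simplicity
n_super, n_real = mcc_step(speeds, weight=1.0e5, gas_density=3.2e21,
                           cross_section=1.0e-19, dt=1.0e-11, rng=rng)
print(n_super, n_real)
```

In this sketch the product of collision frequency and time step is about 0.0032, so the chance of more than one collision per step is small. Raising the weight leaves the probability unchanged but makes each sampled collision stand for more real events, which is why fewer, heavier superparticles give noisier observables.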

7. Variants in Weighted Model Counting (WMC)

Weighted model counting (WMC) and related inference tasks in probabilistic AI also feature a "weighted" variant, where the weights assigned to outcomes are random variables reflecting parameter uncertainty. The focus in (Nakamura et al., 7 Jan 2026) is not on an MCC-type correlation, but rather on the computation of variance in the WMC due to stochastic weights. The tractability of variance computation hinges critically on the circuit representation (specifically, structured decomposable deterministic NNFs), and the presence of weighted computations enables parameter-impact analysis for Bayesian network inference. This is a distinct, though methodologically related, use of weighting.
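For readers unfamiliar with WMC, the sketch below shows what is being computed and why random weights induce a variance. It is a brute-force enumeration over all assignments with a Monte Carlo variance estimate, which is exponential in the number of variables; it is explicitly not the knowledge-compilation approach of the cited paper. The function names, the example formula, and the weight distribution are assumptions made for illustration.

```python
import random
from itertools import product


def wmc(formula, pos_weights):
    """Brute-force weighted model count.

    formula: function mapping a dict {variable: bool} to bool.
    pos_weights: {variable: weight of the positive literal}; the negative
    literal is assumed to carry weight 1 - (positive weight).
    """
    variables = sorted(pos_weights)
    total = 0.0
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if formula(assignment):
            w = 1.0
            for var, val in assignment.items():
                w *= pos_weights[var] if val else 1.0 - pos_weights[var]
            total += w
    return total


def wmc_variance_mc(formula, sample_weights, n_draws, rng):
    """Sample variance of the WMC when the weights are drawn by sample_weights(rng)."""
    draws = [wmc(formula, sample_weights(rng)) for _ in range(n_draws)]
    mean = sum(draws) / n_draws
    return sum((d - mean) ** 2 for d in draws) / (n_draws - 1)


def a_or_b(assignment):
    return assignment["a"] or assignment["b"]


print(wmc(a_or_b, {"a": 0.3, "b": 0.5}))  # 1 - 0.7 * 0.5 = 0.65, up to rounding

# Uncertain weight for "a", fixed weight for "b".
def draw(rng):
    return {"a": rng.uniform(0.2, 0.4), "b": 0.5}

print(wmc_variance_mc(a_or_b, draw, 2000, random.Random(0)))
```

In this toy case the WMC equals $0.5 + 0.5a$, so its variance is one quarter of the variance of the weight $a$, which gives a quick sanity check on the estimate.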

References

  • Cortez, B., & Krishnamoorthy, R. "Weighted MCC: A Robust Measure of Multiclass Classifier Performance for Observations with Individual Weights" (Cortez et al., 23 Dec 2025).
  • Holzmann, H., & Klar, B. "Robust performance metrics for imbalanced classification problems" (Holzmann et al., 2024).
  • Vass, M., Palla, P., & Hartmann, P. "Revisiting the numerical stability/accuracy conditions of explicit PIC/MCC simulations of low-temperature gas discharges" (Vass et al., 2021).
  • Iwama, K., Miura, T., & Sato, T. "Variance Computation for Weighted Model Counting with Knowledge Compilation Approach" (Nakamura et al., 7 Jan 2026).
