Mean Corruption Error (mCE) Overview

Updated 23 December 2025
  • Mean Corruption Error (mCE) is a normalized metric that computes the average increase in prediction error under various synthetic corruptions compared to a baseline model.
  • It aggregates errors across different corruption types and severity levels, as seen in benchmarks like MangoLeafDB-C and MNIST-C, enabling cross-architecture comparisons.
  • A lower mCE signifies stronger robustness, guiding practitioners in selecting and developing models for real-world environments with uncertain input quality.

Mean Corruption Error (mCE) is a quantitative metric used to assess and compare the robustness of machine learning classifiers—typically deep neural networks—against a standardized set of input corruptions. mCE provides a normalized measure of how much a model’s predictive error increases under various synthetic corruptions, relative to a reference baseline model, facilitating cross-architecture and cross-method evaluation of real-world robustness. Its adoption has become standard in empirical robustness benchmarks, particularly in computer vision, where models are regularly evaluated on datasets with controlled, repeatable perturbations such as noise, blur, and affine distortions (Andrade et al., 15 Dec 2025, Sargolzaei et al., 21 Aug 2024).

1. Formal Definition of Mean Corruption Error (mCE)

Given a set of |C| corruption types (e.g., noise, blur), the Corruption Error (CE^f_c) for a model f on corruption c is defined as the ratio of the model's error on that corruption to the corresponding error of a fixed reference classifier (commonly a ResNet-101 or vanilla CNN). For datasets with multiple severities per corruption, the error is aggregated over all severity levels. mCE is then the arithmetic mean of CE^f_c across all corruptions:

CE^f_c = \frac{\sum_{s=1}^{S} E^f_{s,c}}{\sum_{s=1}^{S} E^{\mathrm{ref}}_{s,c}}

\mathrm{mCE}^f = \frac{1}{|C|} \sum_{c=1}^{|C|} CE^f_c

Where:

  • E^f_{s,c}: error rate of model f on corruption c at severity s
  • E^{\mathrm{ref}}_{s,c}: corresponding error rate of the reference model
  • S: number of severity levels per corruption (e.g., S = 5 for MangoLeafDB-C)

By convention, mCE is rescaled so that the baseline model’s mCE is 100, and other values are interpreted as percentages relative to this baseline. Lower mCE values indicate greater robustness: an mCE of 50 signifies that the model averages only half the error of the reference model when confronted with corruption (Andrade et al., 15 Dec 2025, Sargolzaei et al., 21 Aug 2024).
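The definition above can be sketched directly in code. The following is a minimal illustration, assuming per-corruption, per-severity error rates are available as NumPy arrays of shape (num_corruptions, num_severities); the function names and toy numbers are for illustration only.

```python
import numpy as np

def corruption_error(err_model, err_ref):
    """CE^f_c: summed error over severities for one corruption,
    normalized by the reference model's summed error."""
    return err_model.sum() / err_ref.sum()

def mean_corruption_error(errs_model, errs_ref):
    """mCE: mean of CE^f_c over all corruptions, rescaled so the
    reference model scores 100.

    errs_model, errs_ref: arrays of shape (num_corruptions, num_severities)
    holding error rates E_{s,c}.
    """
    ce = errs_model.sum(axis=1) / errs_ref.sum(axis=1)  # CE per corruption
    return 100.0 * ce.mean()

# Toy example: 3 corruptions, 5 severities; the model halves the
# reference error everywhere, so mCE = 50.
errs_ref = np.full((3, 5), 0.20)
errs_model = np.full((3, 5), 0.10)
print(mean_corruption_error(errs_model, errs_ref))  # 50.0
```

Note that the reference model's own mCE is 100 by construction, since each of its corruption errors normalizes to 1.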

2. Calculation Workflow and Dataset Instantiations

The calculation of mCE is dataset and protocol specific, governed by the properties of the corruption suite:

  • MangoLeafDB-C: Contains 19 corruption types applied at five severities. For each model, errors E^f_{s,c} are recorded on 4,000 test images per corruption and severity. CE^f_c is computed for each corruption, and mCE is taken as the mean over all 19 types. ResNet-101 is used as the reference, with its mCE defined as 100 (Andrade et al., 15 Dec 2025).
  • MNIST-C: Consists of 15 corruption types, each applied to the entire test set. For each corruption, one calculates the baseline error E_c^f and the new model error E_c^g, forms CE_c^g = E_c^g / E_c^f, and averages over all corruptions for mCE. The baseline's own mCE is 100 by construction (Sargolzaei et al., 21 Aug 2024).
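The MNIST-C protocol reduces to the single-severity case, since each corruption contributes one error value. A minimal sketch (function name and numbers are illustrative):

```python
def mnist_c_mce(errs_model, errs_base):
    """MNIST-C-style mCE: one error rate per corruption, no severity sweep.

    errs_model, errs_base: sequences of per-corruption error rates for the
    evaluated model g and the baseline f. Returns mCE with baseline = 100.
    """
    ces = [eg / ef for eg, ef in zip(errs_model, errs_base)]
    return 100.0 * sum(ces) / len(ces)

# Two toy corruptions: the model halves the baseline error on each.
print(mnist_c_mce([0.02, 0.04], [0.04, 0.08]))  # 50.0
```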

A table of the mCE computation process for a dataset with K corruptions and S severity levels:

Metric | Formula | Notes
Corruption Error | CE^f_c = \dfrac{\sum_{s=1}^{S} E^f_{s,c}}{\sum_{s=1}^{S} E^{\mathrm{ref}}_{s,c}} | Error per corruption c, normalized to the baseline
Mean Corruption Error | \mathrm{mCE}^f = \dfrac{1}{K} \sum_{c=1}^{K} CE^f_c | Averaged over all K corruptions

3. Interpretation and Role in Robustness Evaluation

mCE quantifies relative model robustness:

  • mCE < 100: Model performs better (lower error increase under corruption) than the reference.
  • mCE = 100: Equivalent to the reference baseline’s sensitivity to corruption.
  • mCE > 100: Less robust than the reference; higher error increase.

Practically, mCE enables meaningful comparison across architectures and defense strategies under systematic corruptions. For example, in the MangoLeafDB-C benchmark, an LCNN exhibits mCE = 48.9, substantially outperforming larger networks such as ResNet-101 (100) and ResNet-50 (105.3), implying enhanced robustness and suitability for deployment in adverse field conditions with limited computational resources (Andrade et al., 15 Dec 2025).

Empirically, new robustness interventions are validated by demonstrating a statistically significant mCE decrease relative to the baseline, evidencing real gains in corruption resistance as opposed to clean accuracy alone (Sargolzaei et al., 21 Aug 2024).

4. Variants: Relative mCE and Their Significance

The relative mean corruption error (relative mCE) refines mCE by comparing the increase in error under corruption, relative to clean accuracy, for both the evaluated model and the baseline:

\mathrm{RelCE}^f_c = \frac{\sum_{s=1}^{S}\left(E^f_{s,c} - E^f_{\mathrm{clean}}\right)}{\sum_{s=1}^{S}\left(E^{\mathrm{ref}}_{s,c} - E^{\mathrm{ref}}_{\mathrm{clean}}\right)}

\mathrm{Rel\,mCE}^f = \frac{1}{|C|} \sum_{c=1}^{|C|} \mathrm{RelCE}^f_c

Relative mCE captures degradation relative to each model's baseline clean error, enabling more nuanced differentiation between intrinsic clean-data accuracy and robustness to distribution shift (Andrade et al., 15 Dec 2025, Sargolzaei et al., 21 Aug 2024). A model with relative mCE < 100 is less affected by corruption relative to its clean performance than the reference is.
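The relative variant can be sketched analogously to plain mCE; this is an illustrative implementation assuming NumPy error arrays of shape (num_corruptions, num_severities) and scalar clean error rates, with hypothetical names and toy values.

```python
import numpy as np

def relative_mce(errs_model, clean_model, errs_ref, clean_ref):
    """Relative mCE: corruption-induced error increase over clean error,
    normalized by the reference model's increase, scaled so the
    reference scores 100.

    errs_*: shape (num_corruptions, num_severities); clean_*: scalar
    clean-data error rates.
    """
    num = (errs_model - clean_model).sum(axis=1)  # model's error increase
    den = (errs_ref - clean_ref).sum(axis=1)      # reference's increase
    rel_ce = num / den                             # RelCE per corruption
    return 100.0 * rel_ce.mean()

# Toy example: both models have 5% clean error; under corruption the
# model's error rises by 0.10 per cell vs. 0.20 for the reference.
errs_ref = np.full((3, 5), 0.25)
errs_model = np.full((3, 5), 0.15)
print(relative_mce(errs_model, 0.05, errs_ref, 0.05))  # 50.0
```

The design point here is that subtracting the clean error first prevents a model with high clean error from looking spuriously robust just because corruption adds little on top of an already-large error.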

5. Empirical Benchmarks and Model Ranking via mCE

mCE is centrally used to rank corruption robustness across model architectures and enhancement strategies. For MangoLeafDB-C (Andrade et al., 15 Dec 2025):

Architecture | mCE (ResNet-101 = 100)
ResNet-101 | 100
ResNet-50 | 105.3
Xception | 94.5
VGG-16 | 63.4
LCNN | 48.9

LCNN achieves the lowest mCE (48.9), demonstrating that specialized, lightweight networks can surpass larger architectures in corruption robustness despite comparable or lower clean-data accuracy. A similar trend is observed in the MNIST-C setting, where Modern Hopfield Network integration yielded a reduction in mCE from 100 to 42.51 for a vanilla CNN baseline, corresponding to a 57.49 percentage-point improvement (Sargolzaei et al., 21 Aug 2024).
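In practice, model selection from such a benchmark reduces to sorting by mCE. A trivial sketch using the MangoLeafDB-C values reported above (lower mCE ranks first):

```python
# mCE values from the MangoLeafDB-C table (ResNet-101 = 100 baseline).
mce = {
    "ResNet-101": 100.0,
    "ResNet-50": 105.3,
    "Xception": 94.5,
    "VGG-16": 63.4,
    "LCNN": 48.9,
}

# Lower mCE = more robust, so sort ascending by value.
ranking = sorted(mce, key=mce.get)
print(ranking)
# ['LCNN', 'VGG-16', 'Xception', 'ResNet-101', 'ResNet-50']
```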

6. Implications and Utility in Model Development

The mCE metric has become foundational for evaluating and driving research in model robustness. Its normalized, relative formulation enables:

  • Objective assessment of progress in robustness-focused architecture design and training paradigm development.
  • Benchmarking of defense strategies against synthetic and real-world corruptions.
  • Practical model selection for deployment contexts where input quality cannot be guaranteed (e.g., agriculture, medical imaging, resource-constrained edge devices).

A plausible implication is that evaluation using mCE, rather than solely clean-data accuracy, ensures new architectures maintain operational utility under realistic input perturbations. This has direct impact on model recommendations for settings such as disease diagnosis in agricultural images, where corruption is routine (Andrade et al., 15 Dec 2025).

7. Contextualization and Evolving Practices

The adoption of mCE and its variants aligns with a broader shift in computer vision toward systematic robustness assessment. The metric standardizes performance comparison by decoupling robustness from factors such as dataset idiosyncrasies or absolute accuracy, focusing instead on error amplification caused by diverse corruptions. mCE’s continued development, refinement, and application—for example, in conjunction with associative memory augmentations or test-time adaptation techniques—establishes it as a cornerstone in robustness-oriented model evaluation frameworks (Sargolzaei et al., 21 Aug 2024).
