
Class-Normalized Average Accuracy

Updated 8 November 2025
  • Class-Normalized Average Accuracy is a metric that computes the mean per-class recall to ensure every class is weighted equally despite imbalances.
  • It normalizes the confusion matrix rows into probabilities, thereby mitigating the dominance of majority classes in performance evaluation.
  • The metric integrates with ROC analysis and extensions like NormAcc to provide nuanced insights in high-risk, imbalanced classification scenarios.

Class-Normalized Average Accuracy (CA), frequently referred to as macro-accuracy or balanced accuracy depending on context, is a widely adopted performance measure for evaluating classifiers in the presence of class imbalance. Unlike standard accuracy, which can be biased toward majority classes when test set class frequencies are highly skewed, class-normalized average accuracy quantifies performance such that each class contributes equally, regardless of its frequency in the dataset. This approach provides a more faithful estimate of a model's ability to handle each class, making it particularly valuable in fields where minority-class performance is critical, such as medical diagnosis, fraud detection, and rare event modeling.

1. Mathematical Definition and Computation

For a $C$-class classification problem, the confusion matrix $M \in \mathbb{R}^{C \times C}$ consists of entries $M_{ij} = |\{x : Y(x) = i \wedge \hat{Y}(x) = j\}|$, where $Y(x)$ is the true label and $\hat{Y}(x)$ the predicted label of instance $x$. Class-normalized average accuracy is defined as the average of per-class recall values, after normalizing each row of the confusion matrix to account for varying class frequencies (Erbani et al., 5 Sep 2025):

$$M^{\text{row}}_{ij} = \frac{M_{ij}}{\sum_{k=1}^{C} M_{ik}} = \frac{M_{ij}}{M_{i+}}$$

$$\mathrm{CA} = \frac{1}{C} \sum_{i=1}^{C} M^{\text{row}}_{ii} = \frac{1}{C} \sum_{i=1}^{C} \frac{M_{ii}}{M_{i+}}$$

Equivalently, $\mathrm{CA}$ is the mean per-class recall, treating each class equally regardless of its sample size in the data. In probabilistic terms, $\mathrm{CA} = \frac{1}{C} \sum_{i} P(\hat{Y} = i \mid Y = i)$.

A table for reference:

| Symbol | Description |
|--------|-------------|
| $M_{ij}$ | Count of instances with true class $i$ predicted as class $j$ |
| $M_{i+}$ | Row sum: total number of instances in class $i$ |
| $\mathrm{CA}$ | Class-normalized average (macro) accuracy |
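The definitions above can be sketched in a few lines of NumPy. The function name and the example confusion matrix below are ours, chosen to make the contrast with raw accuracy visible:

```python
import numpy as np

def class_normalized_accuracy(conf_mat):
    """Mean per-class recall: average of the row-normalized diagonal."""
    conf_mat = np.asarray(conf_mat, dtype=float)
    row_sums = conf_mat.sum(axis=1)           # M_{i+}: instances per true class
    recalls = np.diag(conf_mat) / row_sums    # M_{ii} / M_{i+}
    return recalls.mean()                     # CA = (1/C) * sum of per-class recalls

# Hypothetical imbalanced 3-class problem: the majority class inflates raw accuracy.
M = np.array([[90,  5,  5],    # class 0: 100 instances, recall 0.90
              [ 4,  5,  1],    # class 1:  10 instances, recall 0.50
              [ 2,  2,  6]])   # class 2:  10 instances, recall 0.60

raw_acc = np.trace(M) / M.sum()      # 101/120 ≈ 0.84, dominated by class 0
ca = class_normalized_accuracy(M)    # (0.90 + 0.50 + 0.60)/3 ≈ 0.67
```

The gap between the two numbers (≈0.84 vs. ≈0.67) is exactly the majority-class bias that CA is designed to remove.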

2. Relation to Balanced Accuracy and Alternative Metrics

In the binary setting, class-normalized average accuracy reduces to balanced accuracy (BACC):

$$\mathrm{BACC} = \frac{1}{2} \left( \mathrm{Recall} + \mathrm{Selectivity} \right)$$

where:

  • $\mathrm{Recall} = \frac{TP}{TP + FN}$ is the true positive rate,
  • $\mathrm{Selectivity} = \frac{TN}{TN + FP}$ is the true negative rate.

This form can be interpreted as averaging the accuracies of the positive and negative classes, achieving class neutrality. For general multiclass settings, $\mathrm{CA}$ is the natural generalization, as it remains the arithmetic mean of per-class recalls (Burduk, 2020).
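A small numerical check of this binary-case equivalence, using hypothetical counts:

```python
import numpy as np

# Hypothetical binary confusion counts: 90 positives, 10 negatives.
TP, FN = 81, 9    # recall (TPR) = 81/90 = 0.9
TN, FP = 6, 4     # selectivity (TNR) = 6/10 = 0.6

bacc = 0.5 * (TP / (TP + FN) + TN / (TN + FP))   # 0.5 * (0.9 + 0.6) = 0.75

# CA computed from the 2x2 confusion matrix (rows = true class) gives the same value.
M = np.array([[TP, FN],
              [FP, TN]], dtype=float)
ca = np.mean(np.diag(M) / M.sum(axis=1))
```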

The harmonic mean metric HMNC, as introduced in (Burduk, 2020), provides an alternative:

$$\mathrm{HMNC} = \frac{TP \, TN \, M}{(TP + TN) \, P \, N}$$

where $P$ and $N$ denote the number of positive and negative instances and $M = P + N$ the total sample count.

It is noteworthy that all these metrics coincide under the condition $\frac{TP}{P} = \frac{TN}{N}$: in that case $\mathrm{HMNC} = \mathrm{ACC} = \mathrm{BACC} = \text{G-mean} = \frac{TP}{P} = \frac{TN}{N}$.
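The coincidence is easy to verify numerically. The sketch below uses hypothetical counts chosen so that $\frac{TP}{P} = \frac{TN}{N}$, and reads $M$ as the total sample count $P + N$:

```python
# Hypothetical counts with equal per-class recall: TP/P == TN/N == 0.8.
P, N = 50, 200          # positive and negative class sizes
TP, TN = 40, 160        # so TP/P = TN/N = 0.8
M = P + N               # total sample count (our reading of M)

hmnc = (TP * TN * M) / ((TP + TN) * P * N)
acc = (TP + TN) / M
bacc = 0.5 * (TP / P + TN / N)
g_mean = ((TP / P) * (TN / N)) ** 0.5
# All four evaluate to 0.8, matching the stated coincidence.
```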

3. Normalization Against Distribution Bias

Raw confusion matrices are dominated by the frequencies of each class in the test set; majority classes exert undue influence, even for a fair classifier. Row normalization, dividing each element in a row by that row's total, removes this source of bias, converting counts into conditional probabilities $P(\hat{Y} = j \mid Y = i)$ and ensuring each class contributes equally to overall metric computation (Erbani et al., 5 Sep 2025).

This normalization is mathematically the I-projection of the empirical confusion matrix onto the space of row-stochastic matrices using KL-divergence, ensuring minimal information loss relative to the original matrix.
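Row normalization itself is a one-line operation. The sketch below (hypothetical counts) converts a raw confusion matrix into a row-stochastic matrix of conditional probabilities:

```python
import numpy as np

# Hypothetical confusion matrix; rows index true classes, columns predictions.
M = np.array([[90,  5,  5],
              [ 4,  5,  1],
              [ 2,  2,  6]], dtype=float)

# Divide each row by its total: M_row[i, j] estimates P(Yhat = j | Y = i).
M_row = M / M.sum(axis=1, keepdims=True)
# Each row now sums to 1, and the diagonal holds the per-class recalls.
```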

4. Theoretical Implications and Geometric Interpretation

Class-normalized average accuracy also admits a geometric interpretation in the latent space of model representations. Consider the clusters $C_i = \{\varepsilon(x) : Y(x) = i\}$ formed in latent space for each class, where $\varepsilon(x)$ denotes the latent representation of instance $x$. The row-normalized confusion matrix $M^{\text{row}}$ can be approximated by the intersection of unit-volume histograms for the true (label) and predicted (output) class clusters:

$$\left(M^{GCM}_{c,1}\right)_{ij} = \mathrm{Vol}\left[ H(C_i; c, 1) \cap H(\hat{C}_j; c, 1) \right]$$

This geometric perspective links the abstract normalization process to the overlaps between probability masses in the embedded feature space, revealing both class confusion and representation locality (Erbani et al., 5 Sep 2025).

5. Relationship to ROC Analysis and AUC

Class-normalized average accuracy extends naturally to threshold-averaged variants through ROC analysis. In binary classification, Carrington et al. show that the area under the ROC curve (AUC) is simply the threshold-averaged balanced (class-normalized) average accuracy (Carrington et al., 2021):

$$\mathrm{AUC} = \int_0^1 \mathrm{BAA}(\tau) \, d\tau$$

$$\mathrm{BAA}(t) = \pi \cdot \mathrm{TPR}(t) + (1 - \pi) \cdot \mathrm{TNR}(t)$$

where $\pi$ is the class prevalence. Partial AUCs correspond to specific class-normalized average sensitivities or specificities within user-defined FPR or TPR intervals, supporting more granular model evaluation in high-risk or critical zones.

Viewing AUC as an average of class-normalized accuracies not only clarifies the meaning of this widely used metric but also enables local ROC or risk-segment analyses ("deep ROC analysis"), which can reveal region-specific strengths and weaknesses that global metrics may obscure (Carrington et al., 2021).
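The threshold-averaged view can be explored numerically. The sketch below uses synthetic Gaussian scores (our own construction, not from the cited paper): it checks that the rank-statistic and trapezoidal-ROC computations of AUC agree, and exposes $\mathrm{BAA}(t)$ across all thresholds for region-wise, deep-ROC-style inspection; the precise averaging measure in Carrington et al.'s integral identity is defined in their paper.

```python
import numpy as np

rng = np.random.default_rng(0)
pos = rng.normal(1.0, 1.0, 300)   # synthetic scores for positive instances
neg = rng.normal(0.0, 1.0, 700)   # synthetic scores for negative instances
pi = len(pos) / (len(pos) + len(neg))   # class prevalence

# Mann-Whitney form of AUC: probability a random positive outscores a random negative.
auc_mw = (pos[:, None] > neg[None, :]).mean()

# Empirical ROC curve, sweeping every observed score as a threshold, plus the (1, 1) corner.
thr = np.sort(np.concatenate([pos, neg]))[::-1]
tpr = np.array([(pos > t).mean() for t in thr] + [1.0])
fpr = np.array([(neg > t).mean() for t in thr] + [1.0])
auc_trap = np.sum(np.diff(fpr) * (tpr[:-1] + tpr[1:]) / 2)   # trapezoidal area

# Prevalence-weighted balanced accuracy at each threshold, available for local analysis.
baa = pi * tpr + (1 - pi) * (1 - fpr)
```

Restricting `baa` (or the trapezoidal sum) to a slice of thresholds or FPR values gives the region-specific quantities used in deep ROC analysis.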

6. Extensions: Dataset-Adaptive and Penalized Metrics

Recent advances propose integrating class-normalized average accuracy with explicit penalties for class imbalance and dataset characteristics (Ossenov, 10 Dec 2024). One such "dataset-adaptive normalized metric" (NormAcc) is:

$$\mathrm{NormAcc} = \min\!\left(1,\ \mathrm{Acc}_{\text{raw}} \cdot f(d, N) \cdot g(\mathrm{SNR}) \,\Big/\, h(\mathrm{ACIR})\right)$$

where $h(\mathrm{ACIR}) = 1 + \log(\mathrm{ACIR})$, and $\mathrm{ACIR} = \frac{1}{C}\sum_{i=1}^{C} \frac{N_{\mathrm{maj}}}{N_i}$ captures class imbalance. This normalization ensures that even high raw accuracy is appropriately penalized if the majority class dominates.

An explicit calculation illustrates the impact: for three classes with respective sizes (50, 30, 20) and correct predictions (45, 18, 16), overall accuracy is 0.79. However, because the class-imbalance penalty evaluates to $h(\mathrm{ACIR}) \approx 1.544$, the normalized accuracy drops to $\sim 0.51$, providing a far sterner assessment in the presence of imbalance (Ossenov, 10 Dec 2024).
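The worked example can be reproduced directly. Here we treat the $f(d, N)$ and $g(\mathrm{SNR})$ factors as 1, since the example exercises only the imbalance penalty, and we take the logarithm as natural log (which matches the quoted numbers):

```python
import math

sizes = [50, 30, 20]       # per-class instance counts; N_maj = 50
correct = [45, 18, 16]     # correctly predicted instances per class

acc_raw = sum(correct) / sum(sizes)                  # 79/100 = 0.79
n_maj = max(sizes)
acir = sum(n_maj / n for n in sizes) / len(sizes)    # (1 + 5/3 + 5/2)/3 ≈ 1.722
h = 1 + math.log(acir)                               # ≈ 1.544
norm_acc = min(1.0, acc_raw / h)                     # ≈ 0.51
```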

7. Practical Implications, Use Cases, and Limitations

Class-normalized average accuracy is a principled choice for imbalanced classification problems, offering resilience against majority-class domination and a direct, interpretable assessment of model performance per class. It is straightforward to compute from the confusion matrix, and—via its links to threshold-averaged metrics—naturally supports both global and local model evaluation (through AUC and deep ROC analysis).

However, CA, like any metric, is not universally optimal; it may be less suitable where the application demands class-weighted performance or when domain-specific misclassification costs vary substantially. Alternative metrics such as the harmonic mean normalized class metric (HMNC) provide differential sensitivity to changes across the range of imbalance ratios, allowing practitioners to tailor metric choice to application needs (Burduk, 2020).

Class-normalized average accuracy has become an essential tool in the rigorous evaluation of classifiers under class-imbalanced regimes, enabling fairer comparisons and highlighting where models still fall short from a minority-class perspective.
