Discrimination-Accuracy Optimal Classifiers
- Discrimination-accuracy optimal classifiers are models that balance predictive performance and fairness by applying stochastic thresholding to regression functions.
- They use methods like generalized Bayes classification, Neyman–Pearson arguments, and grid search to optimize confusion-matrix metrics under fairness constraints.
- Empirical and theoretical results in high-dimensional, functional, and streaming data settings demonstrate provable tradeoffs with minimax and Pareto optimal guarantees.
Discrimination‐accuracy optimal classifiers are those that maximize predictive performance while controlling or minimizing various forms of discrimination, usually codified either through confusion‐matrix metrics, fairness constraints, or explicit group disparity measures. This article presents the core mathematical principles, algorithmic constructions, and theoretical guarantees for designing classifiers that achieve optimal tradeoffs between discrimination and accuracy, with coverage spanning deterministic and stochastic rules, high‐dimensional and functional regimes, and finite‐sample and population settings.
1. Generalized Bayes Classification for Confusion-Matrix Metrics
Classic binary classification targets maximization of accuracy, achieved by thresholding the regression function $\eta(x) = \mathbb{P}(Y = 1 \mid X = x)$ at $1/2$, yielding the Bayes-optimal 0–1 loss minimizer. However, most practical scenarios—especially those involving class imbalance or alternative performance metrics—require maximizing an arbitrary monotonic confusion-matrix measure (CMM) (Singh et al., 2021). The optimal classifier in this regime is a regression-thresholding classifier (RTC) defined via stochastic thresholding:
$$\hat Y(x) = \begin{cases} 1, & \eta(x) > t,\\ \mathrm{Bernoulli}(q), & \eta(x) = t,\\ 0, & \eta(x) < t,\end{cases}$$
for some threshold $t \in [0,1]$ and randomization probability $q \in [0,1]$. A conditional-expectation and Neyman–Pearson–type variational argument establishes that maximization of any monotonic CMM admits a Bayes-optimal classifier of this RTC form, with $(t, q)$ chosen to exactly balance the marginal rates entering the metric. Stochasticity at threshold atoms can strictly outperform any deterministic rule: when the distribution of $\eta(X)$ places positive mass exactly at the optimal threshold, randomization attains operating points between the two adjacent deterministic rates that no deterministic threshold can reach.
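The role of randomization at threshold atoms can be illustrated numerically. The sketch below uses a hypothetical score distribution with atoms and a grid search over the threshold $t$ and randomization probability $q$, here with the F1 score standing in for the CMM; since $q \in \{0, 1\}$ recovers every deterministic thresholding, the randomized optimum is never worse:

```python
import numpy as np

def rtc_predict(eta, t, q, rng):
    """RTC: predict 1 above threshold t, flip a Bernoulli(q) coin at t, else 0."""
    above = eta > t
    at = np.isclose(eta, t)
    coin = rng.random(eta.shape) < q
    return (above | (at & coin)).astype(int)

def f1(y_true, y_pred):
    """F1 score, a monotonic confusion-matrix measure."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

rng = np.random.default_rng(0)
eta = rng.choice([0.2, 0.5, 0.8], size=5000)   # scores with atoms -> randomization matters
y = (rng.random(5000) < eta).astype(int)       # labels drawn from the true eta

# grid search over (t, q); q in {0, 1} recovers all deterministic thresholdings
best = max(
    ((t, q, f1(y, rtc_predict(eta, t, q, rng))) for t in [0.2, 0.5, 0.8]
     for q in np.linspace(0.0, 1.0, 11)),
    key=lambda r: r[2],
)
print(best)
```

In practice the grid over $(t, q)$ is replaced by the closed-form threshold condition of the variational argument; the brute-force search here only makes the geometry of the randomized optimum visible.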
2. Discrimination–Accuracy Tradeoff Formalism
The discrimination–accuracy frontier quantifies the maximal predictive performance attainable under fairness constraints. Formally, one seeks
$$\max_{\hat Y}\ \mathrm{Acc}(\hat Y)\quad\text{subject to}\quad \mathrm{Disc}(\hat Y)\le \delta,$$
where the discrimination measure $\mathrm{Disc}$ can be the raw group difference in positive rates, or that difference normalized by its maximal achievable value at the given positive rate (Zliobaite, 2015). The normalized pair $(\kappa, d_{\mathrm{norm}})$, with $\kappa$ Cohen's kappa and $d_{\mathrm{norm}} = d/d_{\max}(p)$ the normalized discrimination, properly isolates true predictive and fairness ability from artifacts of the positive rate $p$. An oracle that knows the true labels achieves Pareto-optimal tradeoff curves that are piecewise-linear in $d_{\mathrm{norm}}$, with the loss in $\kappa$ proportional to the reduction in $d_{\mathrm{norm}}$.
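To make the normalization concrete, the sketch below computes a normalized discrimination in $[0,1]$ from a raw statistical-parity difference. The maximal-gap formula is a reconstruction under the assumption that the gap at a fixed positive rate is maximized by concentrating all positive predictions in one group:

```python
import numpy as np

def raw_discrimination(y_pred, group):
    """Statistical-parity difference: gap in positive rates between groups."""
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

def max_discrimination(p, pi0):
    """Largest achievable parity gap at overall positive rate p when group 0
    has population proportion pi0 (assumed form: all positives go to group 0)."""
    pi1 = 1.0 - pi0
    r0 = min(p / pi0, 1.0)                 # group-0 positive rate
    r1 = max((p - pi0) / pi1, 0.0)         # spillover into group 1
    return r0 - r1

rng = np.random.default_rng(0)
group = rng.integers(0, 2, 10000)
y_pred = (rng.random(10000) < 0.4 + 0.2 * (group == 0)).astype(int)  # hypothetical predictions

d = raw_discrimination(y_pred, group)
d_max = max_discrimination(y_pred.mean(), (group == 0).mean())
print(d / d_max)                           # normalized discrimination in [0, 1]
```

Dividing by $d_{\max}(p)$ is what makes classifiers operating at different positive rates comparable on the same fairness axis.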
3. Optimization under Disparity Constraints and Fairness Measures
Bayes-optimal classification under explicit fairness criteria proceeds by formulating the risk as minimization of expected misclassification under linear or bilinear disparity constraints (Zeng et al., 5 Feb 2024, Zeng et al., 2022). With group-wise regression scores $\eta_a(x) = \mathbb{P}(Y = 1 \mid X = x, A = a)$ for protected attribute $A$ and a linear fairness measure, the constrained Bayes-optimal rule obtained via the Neyman–Pearson lemma is a group-wise thresholding rule
$$\hat Y = \mathbf{1}\{\eta_a(x) > t_a\},$$
where the group-specific thresholds $t_a$ explicitly depend on the fairness weights. For classical metrics like demographic parity or equality of opportunity, the optimal threshold shifts are group-specific and determined by the structure of the disparity measure, which for bilinear measures is affine in the classifier. With multiple constraints (e.g., equalized odds), the optimal rule is a group-wise threshold on a linear combination determined by the fairness weights.
Practical implementations are effected by (a) up/down-sampling the training set to mimic the optimal marginal distributions, (b) cost-sensitive classification with optimal rates, or (c) postprocessing plug-in rules—all requiring only one-dimensional search for the tradeoff parameter (Zeng et al., 5 Feb 2024, Zeng et al., 2022).
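As an illustration of the postprocessing route (c), the following sketch uses a hypothetical score model and bisects over a single tradeoff parameter $\lambda$ that shifts the two group thresholds in opposite directions until the demographic-parity gap is driven to near zero:

```python
import numpy as np

def parity_gap(scores, group, t0, t1):
    """Signed demographic-parity gap between group-0 and group-1 positive rates."""
    p0 = (scores[group == 0] > t0).mean()
    p1 = (scores[group == 1] > t1).mean()
    return p0 - p1

def fair_thresholds(scores, group, tol=1e-3):
    """Bisection over one tradeoff parameter lam, shifting the two group
    thresholds in opposite directions from the accuracy-optimal 1/2 until
    the demographic-parity gap is (nearly) zero."""
    lo, hi = -0.5, 0.5
    for _ in range(60):
        lam = (lo + hi) / 2
        gap = parity_gap(scores, group, 0.5 + lam, 0.5 - lam)
        if abs(gap) < tol:
            break
        if gap > 0:        # group 0 over-selected: raise t0 further, lower t1
            lo = lam
        else:
            hi = lam
    return 0.5 + lam, 0.5 - lam

rng = np.random.default_rng(1)
group = rng.integers(0, 2, 4000)
scores = np.clip(rng.normal(0.6 - 0.2 * group, 0.15), 0.0, 1.0)  # hypothetical scores
t0, t1 = fair_thresholds(scores, group)
print(t0, t1, parity_gap(scores, group, t0, t1))
```

The one-dimensional search works because the parity gap is monotone in $\lambda$; this is exactly why the cited constructions need only a scalar bisection rather than a joint search over both thresholds.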
4. Algorithmic and Statistical Guarantees in High-Dimensional and Functional Regimes
Discrimination-accuracy optimality is established in several settings through sharp minimax excess risk bounds and explicit constructions:
- Functional Data: The minimax excess risk for Gaussian-process data with smoothness parameter $\alpha$ admits an optimal rate governed by $\alpha$; both functional QDA and deep ReLU networks achieve this bound, even under discrete sampling, with a critical sampling frequency governing the discretization error (Wang et al., 2021).
- High-Dimensional Linear Discriminant Analysis: GO-LDA provides sequentially Fisher-optimal, mutually orthogonal discriminant directions, avoiding the standard rank limit of multiclass LDA and maintaining high discriminative power throughout the subspace. The empirical performance shows robust gains over classic LDA and PCA (Liu et al., 2023).
- Centroid Classification: Scale-adjusted centroid rules in high-dimensional regimes (dimension growing with sample size) remove confounding from scale differences, attain the minimax detection boundary for the mean-difference signal, and outperform nearest-neighbor and SVM classifiers under only mild dependence and finite-moment conditions (Hall et al., 2010).
- Latent Factor Models: Projected PCA-based classifiers followed by discriminant regression yield excess risk matching minimax lower bounds in high-dimensional latent factor settings (Bing et al., 2022).
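The sequential construction behind GO-LDA can be sketched in a few lines of numpy. This is a simplified reading of the idea (the published algorithm may differ in normalization and tie-handling): each step solves the generalized eigenproblem $S_b w = \lambda S_w w$ restricted to the orthogonal complement of the directions already found, so the directions are mutually orthonormal and their number can exceed the classic $C-1$ rank limit:

```python
import numpy as np

def scatter_matrices(X, y):
    """Within-class (Sw) and between-class (Sb) scatter matrices."""
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mu, mc - mu)
    return Sw, Sb

def go_lda(X, y, k):
    """Sequentially Fisher-optimal, mutually orthonormal directions: each step
    solves the generalized eigenproblem in the complement of span(W)."""
    Sw, Sb = scatter_matrices(X, y)
    d = Sw.shape[0]
    Sw = Sw + 1e-6 * np.eye(d)                    # regularize for invertibility
    W = np.zeros((d, 0))
    for _ in range(k):
        # orthonormal basis B of the complement of span(W)
        B = np.linalg.svd(np.eye(d) - W @ W.T)[0][:, : d - W.shape[1]]
        Sw_r, Sb_r = B.T @ Sw @ B, B.T @ Sb @ B
        Linv = np.linalg.inv(np.linalg.cholesky(Sw_r))
        vals, vecs = np.linalg.eigh(Linv @ Sb_r @ Linv.T)
        w = B @ (Linv.T @ vecs[:, -1])            # top generalized eigenvector
        W = np.hstack([W, (w / np.linalg.norm(w))[:, None]])
    return W

# three classes in five dimensions: three orthonormal directions, one more
# than the C - 1 = 2 rank limit of classic multiclass LDA
rng = np.random.default_rng(0)
y = np.repeat(np.arange(3), 40)
X = rng.normal(size=(120, 5)) + y[:, None] * np.array([1.0, 0.5, 0.0, 0.0, 0.0])
W = go_lda(X, y, k=3)
print(np.round(W.T @ W, 6))                       # identity: mutually orthonormal
```

The Cholesky whitening turns each restricted generalized eigenproblem into an ordinary symmetric one, so only standard dense linear algebra is needed.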
5. Fairness-aware Multi-objective Optimization in Data Streams
Evolutionary multi-objective optimization frameworks directly compute the Pareto front for classification error and discrimination, e.g., via weighted feature selection in self-adjusting memory k-NN (EMOSAM) (Amarasinghe et al., 18 Apr 2024). The search space is the simplex of feature weights; the optimization tracks discrimination and accuracy on sliding windows, triggering re-optimization on concept drift via HP-filter decomposition. Empirical results on standard fairness datasets show the method attains Pareto-optimality: in nearly all benchmarks, EMOSAM yields both the lowest discrimination and competitive accuracy, explicitly visualizing operable points along the tradeoff frontier. The approach facilitates practitioner selection of an operating point along the frontier according to operational requirements.
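EMOSAM's search itself uses SMPSO, which is beyond a short sketch, but the Pareto-archive step it relies on is simple: given candidate solutions evaluated on (error, discrimination), keep only the non-dominated ones. A minimal illustration with hypothetical candidate values:

```python
import numpy as np

def pareto_front(points):
    """Keep only non-dominated (error, discrimination) pairs: a point is
    dropped iff some other point is <= in both objectives and < in one."""
    pts = np.asarray(points, dtype=float)
    keep = [
        i for i, p in enumerate(pts)
        if not np.any(np.all(pts <= p, axis=1) & np.any(pts < p, axis=1))
    ]
    return pts[keep]

# hypothetical (error, statistical-parity-difference) evaluations
candidates = [(0.10, 0.30), (0.12, 0.10), (0.15, 0.05), (0.11, 0.25), (0.20, 0.20)]
front = pareto_front(candidates)
print(front)     # (0.20, 0.20) is dominated by (0.12, 0.10) and drops out
```

The surviving points are exactly the operable tradeoffs a practitioner would be shown along the frontier.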
| Paper/Method | Accuracy criterion | Discrimination constraint | Optimization technique/guarantee |
|---|---|---|---|
| Generalized Bayes (RTC) | Confusion-matrix M | N/A | Neyman–Pearson, stochastic thresholding (Singh et al., 2021) |
| FairBayes | Zero–one | Linear disparity | Group-wise threshold via NP lemma (Zeng et al., 2022, Zeng et al., 5 Feb 2024) |
| Scale-adjusted centroid | L2 mean difference | N/A | Minimax optimality in high dimensions (Hall et al., 2010) |
| GO-LDA | Fisher criterion | N/A | Sequential generalized eigenproblem (Liu et al., 2023) |
| PCLDA | Latent factor | N/A | Minimax-optimal excess risk via projection (Bing et al., 2022) |
| EMOSAM | Accuracy + parity | Statistical parity diff | SMPSO, Pareto archive (Amarasinghe et al., 18 Apr 2024) |
6. Practical Algorithms, Finite-Sample Rates, and Empirical Validation
Finite-sample regret and error rates for discrimination-accuracy optimal classifiers are precisely characterized:
- For generic CMMs, plug-in RTC classifiers built on k-NN or other nonparametric regression estimates, with the number of neighbors taken sufficiently large, yield vanishing regret, with explicit rates established under Hölder smoothness and with Uniform Class Imbalance sharply controlling the estimation error (Singh et al., 2021).
- For fairness-constrained classifiers, bisection or grid search over single threshold parameters achieves the desired tradeoff with low computational overhead, and empirical disparity/accuracy curves match population frontiers when underlying regression scores are closely estimated (Zeng et al., 5 Feb 2024, Zeng et al., 2022).
- In model-based Bayes settings, constrained maximum-likelihood estimation under mined pattern-based discrimination constraints eliminates exponentially many offending discrimination patterns at minimal accuracy cost (Choi et al., 2019).
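A minimal plug-in construction along these lines is sketched below: a brute-force k-NN regression estimate of $\eta$ followed by thresholding, on a hypothetical synthetic distribution. For accuracy the threshold is $1/2$; a metric-specific CMM would instead use the RTC threshold from Section 1:

```python
import numpy as np

def knn_regress(X_train, y_train, X_test, k):
    """Plug-in estimate of eta(x) = P(Y=1 | X=x): average of the k nearest labels."""
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    nn = np.argsort(d2, axis=1)[:, :k]
    return y_train[nn].mean(axis=1)

rng = np.random.default_rng(2)
X = rng.uniform(-1.0, 1.0, (2000, 2))
eta = 1.0 / (1.0 + np.exp(-4.0 * X[:, 0]))        # hypothetical true regression function
y = (rng.random(2000) < eta).astype(int)

X_test = rng.uniform(-1.0, 1.0, (500, 2))
eta_hat = knn_regress(X, y, X_test, k=50)
pred = (eta_hat > 0.5).astype(int)                # plug-in RTC at accuracy threshold 1/2

bayes = (1.0 / (1.0 + np.exp(-4.0 * X_test[:, 0])) > 0.5).astype(int)
print("agreement with Bayes rule:", (pred == bayes).mean())
```

Disagreements concentrate near the decision boundary where $\eta \approx 1/2$, which is exactly the region the finite-sample regret bounds above control.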
Empirical validation across domains—speech recognition, medical imaging, gene expression—shows that discrimination-accuracy optimal classifiers match or surpass traditional methods and robustly implement prescribed tradeoff curves in practice.
7. Conceptual and Theoretical Implications
The unifying theme is that optimal classifiers for discrimination and accuracy must be constructed via thresholding of the underlying regression function, with threshold shifts and potential stochasticity dictated by the metric to be optimized and fairness constraints imposed. Piecewise-linear and sometimes stochastic Pareto frontiers emerge, reflecting the feasibility and cost of fairness in classification. Closed-form characterization of optimal tradeoff curves, tight minimax bounds, and implementable algorithms are now available for a broad array of metrics and operational constraints.
This framework advances the methodological rigor of fairness-aware classification and discrimination-optimal learning, providing both the statistical foundation and practical tools required for robust, accountable deployment in high-stakes decision applications [(Singh et al., 2021); (Zeng et al., 2022); (Zeng et al., 5 Feb 2024); (Amarasinghe et al., 18 Apr 2024); (Zliobaite, 2015); (Hall et al., 2010); (Liu et al., 2023); (Bing et al., 2022); (Wang et al., 2021); (Choi et al., 2019); (Nokleby et al., 2014)].