Recovering from Biased Data: Can Fairness Constraints Improve Accuracy?
The paper "Recovering from Biased Data: Can Fairness Constraints Improve Accuracy?" explores the intersection of fairness constraints and classifier accuracy in machine learning, particularly in the presence of biased training data. The authors present a compelling argument that fairness constraints, often introduced to mitigate demographic biases, can also enhance the accuracy of classifiers under conditions of biases in the training datasets. The paper offers a detailed theoretical analysis on how fairness interventions, specifically the Equal Opportunity constraint, contribute to recovering the Bayes Optimal Classifier amid various bias models.
Core Findings
In pursuit of accuracy, machine learning practitioners typically rely on Empirical Risk Minimization (ERM), which minimizes error on the training data. When the training data is biased, this approach can produce a biased classifier: the minimizer of empirical risk on the biased sample may diverge from the Bayes optimal classifier for the true distribution. The authors introduce bias models that capture common ways in which training data misrepresents disadvantaged groups: Under-Representation Bias, in which positive examples from the disadvantaged group are under-sampled, and Labeling Bias, in which positive examples from the disadvantaged group are mislabeled as negative.
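To make these bias models concrete, here is a minimal sketch of each corruption process in Python. The parameter names beta and nu follow the paper's notation, but the encodings (group 1 as the disadvantaged group, labels in {0, 1}) and the function signatures are illustrative assumptions, not the authors' code.

```python
import numpy as np

def under_representation_bias(X, y, group, beta, rng):
    """Keep each positive example from the disadvantaged group (group == 1)
    with probability beta; keep all other examples unchanged."""
    positive_b = (group == 1) & (y == 1)
    keep = ~positive_b | (rng.random(len(y)) < beta)
    return X[keep], y[keep], group[keep]

def labeling_bias(X, y, group, nu, rng):
    """Flip each positive label in the disadvantaged group to negative
    with probability nu, modeling systematic mislabeling."""
    y = y.copy()
    flip = (group == 1) & (y == 1) & (rng.random(len(y)) < nu)
    y[flip] = 0
    return X, y, group

# Toy demonstration on synthetic data (purely illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))           # toy features
group = rng.integers(0, 2, size=1000)    # 0 = advantaged, 1 = disadvantaged
y = rng.integers(0, 2, size=1000)        # toy labels
X_b, y_b, g_b = under_representation_bias(X, y, group, beta=0.3, rng=rng)
```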
The paper's central contribution is the demonstration that fairness constraints can correct for these biases. The authors show that constraining ERM to satisfy Equal Opportunity, i.e., requiring equal true positive rates across groups, recovers the Bayes optimal classifier in a range of bias scenarios. They prove that, under conditions on the prevalence of each group's data and the level of label noise, Equal Opportunity enables recovery from biased data. Specifically, the authors establish bounds on the fraction of data from the disadvantaged group r, the label noise rate η, and the bias parameters β and ν; these inequalities characterize when fairness-constrained ERM returns the Bayes optimal classifier rather than a classifier distorted by the biased sample.
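As a schematic illustration (not the paper's algorithm), fairness-constrained ERM over a finite hypothesis class can be sketched as an exhaustive search: keep only hypotheses whose empirical true positive rates on the two groups are within a tolerance, then return the survivor with the lowest training error. The tolerance tol and the callable-hypothesis interface are assumptions for illustration; arrays are as in the sketch above.

```python
def tpr(h, X, y, mask):
    """Empirical true positive rate of hypothesis h on the subset given by mask."""
    pos = mask & (y == 1)
    return (h(X[pos]) == 1).mean() if pos.any() else 0.0

def eo_constrained_erm(hypotheses, X, y, group, tol=0.05):
    """Return the hypothesis with the smallest training error among those
    whose true positive rates on the two groups differ by at most tol."""
    best, best_err = None, float("inf")
    for h in hypotheses:
        gap = abs(tpr(h, X, y, group == 0) - tpr(h, X, y, group == 1))
        if gap <= tol:
            err = (h(X) != y).mean()
            if err < best_err:
                best, best_err = h, err
    return best
```

The paper's point is that, under the stated parameter conditions, this constrained search over the biased sample selects the Bayes optimal classifier, whereas unconstrained ERM may not.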
Comparisons and Implications
The paper compares Equal Opportunity with other fairness notions, such as Equalized Odds and Demographic Parity. While Equal Opportunity robustly recovers the accurate classifier across the bias models considered, Equalized Odds and Demographic Parity can fail in certain bias regimes. These failures invite further study of how fairness criteria interact with bias models and what this implies for learning theory.
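For reference, the three notions differ only in which conditional positive-prediction rates they equalize. The sketch below (again with illustrative signatures) computes the empirical between-group gap for each; Equalized Odds holds when both the true positive rate gap and the false positive rate gap are zero.

```python
def fairness_gaps(h, X, y, group):
    """Empirical between-group gaps for three fairness notions, assuming h
    returns 0/1 predictions. Equal Opportunity compares positive-prediction
    rates among truly positive examples; Equalized Odds additionally compares
    them among truly negative examples; Demographic Parity compares them over
    each whole group, ignoring labels."""
    pred = h(X)
    rate = lambda mask: pred[mask].mean() if mask.any() else 0.0
    g0, g1 = group == 0, group == 1
    return {
        "equal_opportunity": abs(rate(g0 & (y == 1)) - rate(g1 & (y == 1))),
        "equalized_odds_fpr": abs(rate(g0 & (y == 0)) - rate(g1 & (y == 0))),
        "demographic_parity": abs(rate(g0) - rate(g1)),
    }
```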
Practically, these findings suggest that fairness constraints are not solely a normative tool for ensuring equitable treatment of demographic groups. They can also serve to correct prediction errors that stem from biased data sources. The authors suggest that fairness interventions can act as a safeguard for accuracy itself, a point with broader implications for fairness research in AI.
Future Directions
While this framework provides a rigorous foundation for understanding and using fairness constraints in biased environments, future research could explore computational aspects such as the algorithmic efficiency of fairness-constrained ERM. Empirical validation on real-world datasets, beyond the theoretical formulations, would also strengthen the practical relevance of these results. The portability of fairness criteria across domains and datasets deserves further exploration as well, especially in settings where data from Group A and Group B differ significantly in complexity.
Overall, the paper broadens the scope of AI fairness research by positioning fairness not only as an ethical demand but also as a practical mechanism for recovering accuracy from biased data.