On Comparing Fair Classifiers under Data Bias (2302.05906v2)
Abstract: In this paper, we consider a theoretical model for injecting data bias, namely under-representation and label bias (Blum & Stangl, 2019), and empirically study its effect on the accuracy and fairness of fair classifiers. Through extensive experiments on both synthetic and real-world datasets (e.g., Adult, German Credit, Bank Marketing, COMPAS), we audit pre-, in-, and post-processing fair classifiers from standard fairness toolkits by injecting varying amounts of under-representation and label bias into their training data (but not the test data). Our main observations are: (1) the fairness and accuracy of many standard fair classifiers degrade severely as the bias injected into their training data increases; (2) a simple logistic regression model trained on the right data can often outperform, in both accuracy and fairness, most fair classifiers trained on biased training data; and (3) a few simple fairness techniques (e.g., reweighing, exponentiated gradients) seem to offer stable accuracy and fairness guarantees even when their training data is injected with under-representation and label bias. Our experiments also show how to integrate a measure of data bias risk into existing fairness dashboards for real-world deployments.
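The bias-injection model the abstract refers to can be sketched in code. The snippet below is a minimal, illustrative take on a Blum & Stangl-style procedure: under-representation bias keeps each positive example of the disadvantaged group only with some probability, and label bias flips a fraction of the remaining positive labels of that group to negative. The function name `inject_bias` and the parameter names `beta` (keep probability) and `nu` (flip probability) are our own illustrative choices, not taken from the paper or any toolkit.

```python
import numpy as np

rng = np.random.default_rng(0)

def inject_bias(X, y, group, beta=0.5, nu=0.3, rng=rng):
    """Illustrative bias injection for training data (not the paper's code).

    Under-representation bias: each positive example of the disadvantaged
    group (group == 1) survives sampling only with probability beta.
    Label bias: each surviving positive label of the disadvantaged group
    is flipped to negative with probability nu.
    """
    X, y, group = X.copy(), y.copy(), group.copy()
    # Under-representation: drop positives of the disadvantaged group.
    pos_disadv = (group == 1) & (y == 1)
    keep = ~pos_disadv | (rng.random(len(y)) < beta)
    X, y, group = X[keep], y[keep], group[keep]
    # Label bias: flip some of the remaining positives to negative.
    pos_disadv = (group == 1) & (y == 1)
    flip = pos_disadv & (rng.random(len(y)) < nu)
    y[flip] = 0
    return X, y, group
```

A fair classifier would then be trained on the returned biased data while being evaluated on the original (unbiased) test split, matching the experimental setup described above.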
- A reductions approach to fair classification. In International Conference on Machine Learning, pages 60–69. PMLR, 2018.
- A sandbox tool to bias (stress)-test fairness algorithms. arXiv preprint arXiv:2204.10233, 2022.
- Machine bias. ProPublica, May 23, 2016.
- Fairness in machine learning. NIPS Tutorial, 2017.
- Big data's disparate impact. California Law Review, 104:671, 2016.
- Metric-free individual fairness in online learning. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020.
- AI Fairness 360: An extensible toolkit for detecting, understanding, and mitigating unwanted algorithmic bias, October 2018.
- Reuben Binns. On the apparent conflict between individual and group fairness. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, FAT* ’20, page 514–524. Association for Computing Machinery, 2020.
- Fairlearn: A toolkit for assessing and improving fairness in AI. Microsoft Tech. Rep. MSR-TR-2020-32, 2020.
- Ensuring fairness under prior probability shifts. In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, pages 414–424, 2021.
- Fair preprocessing: Towards understanding compositional fairness of data transformers in machine learning pipeline. arXiv preprint arXiv:2106.06054, 2021.
- Recovering from biased data: Can fairness constraints improve accuracy? arXiv preprint arXiv:1912.01094, 2019.
- Gender shades: Intersectional accuracy disparities in commercial gender classification. In Conference on fairness, accountability and transparency, pages 77–91. PMLR, 2018.
- Three naive Bayes approaches for discrimination-free classification. Data Mining and Knowledge Discovery, 21(2):277–292, 2010.
- Optimized pre-processing for discrimination prevention. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pages 3995–4004, 2017.
- Classification with fairness constraints: A meta-algorithm with provable guarantees. In Proceedings of the conference on fairness, accountability, and transparency, pages 319–328, 2019.
- The frontiers of fairness in machine learning. arXiv preprint arXiv:1810.08810, 2018.
- Sample selection bias correction theory. In International conference on algorithmic learning theory, pages 38–53. Springer, 2008.
- Fair transfer learning with missing protected attributes. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, pages 91–98, 2019.
- Label bias, label shift: Fair machine learning with unreliable labels. In NeurIPS 2020 Workshop on Consequential Decision Making in Dynamic Environments, volume 12, 2020.
- Retiring adult: New datasets for fair machine learning. arXiv preprint arXiv:2108.04884, 2021.
- Fair and robust classification under sample selection bias. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pages 2999–3003, 2021.
- UCI machine learning repository, 2017.
- Fairness through awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science (ITCS) Conference, page 214–226. Association for Computing Machinery, 2012.
- Certifying and removing disparate impact. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 259–268, 2015.
- Fair machine learning in healthcare: A review. arXiv preprint arXiv:2206.14397, 2022.
- Will Fleisher. What’s Fair about Individual Fairness?, page 480–490. Association for Computing Machinery, 2021.
- Fairness evaluation in presence of biased noisy labels. In International Conference on Artificial Intelligence and Statistics, pages 2325–2336. PMLR, 2020.
- A comparative study of fairness-enhancing interventions in machine learning. In Proceedings of the conference on fairness, accountability, and transparency, pages 329–338, 2019.
- When fair classification meets noisy protected attributes. In Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society, pages 679–690, 2023.
- Fairness guarantees under demographic shift. In International Conference on Learning Representations, 2022.
- Implicit bias: Scientific foundations. California Law Review, 94(4):945–967, 2006.
- Equality of opportunity in supervised learning. Advances in neural information processing systems, 29:3315–3323, 2016.
- Wassily Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30, 1963.
- Through the data management lens: Experimental analysis and evaluation of fair classification. In Proceedings of the 2022 International Conference on Management of Data, pages 232–246, 2022.
- Identifying and correcting label bias in machine learning. In International Conference on Artificial Intelligence and Statistics, pages 702–712. PMLR, 2020.
- Data preprocessing techniques for classification without discrimination. Knowledge and Information Systems, 33(1):1–33, 2012.
- Decision theory for discrimination-aware classification. In 2012 IEEE 12th International Conference on Data Mining, pages 924–929. IEEE, 2012.
- Fairness-aware classifier with prejudice remover regularizer. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 35–50. Springer, 2012.
- Preventing fairness gerrymandering: Auditing and learning for subgroup fairness. In International Conference on Machine Learning, pages 2564–2572. PMLR, 2018.
- Selection problems in the presence of implicit bias. arXiv preprint arXiv:1801.03533, 2018.
- On the impossibility of fairness-aware learning from corrupted data. In Algorithmic Fairness through the Lens of Causality and Robustness workshop, pages 59–83. PMLR, 2022.
- Noise-tolerant fair classification. Advances in Neural Information Processing Systems, 32, 2019.
- Does enforcing fairness mitigate biases caused by subpopulation shift? Advances in Neural Information Processing Systems, 34, 2021.
- A survey on bias and fairness in machine learning. ACM Computing Surveys (CSUR), 54(6):1–35, 2021.
- A data-driven approach to predict the success of bank telemarketing. Decision Support Systems, 62:22–31, 2014.
- Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
- Mitigating dataset harms requires stewardship: Lessons from 1000 papers. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2), 2021.
- A review on fairness in machine learning. ACM Computing Surveys (CSUR), 55(3):1–44, 2022.
- Post-processing for individual fairness. Advances in Neural Information Processing Systems, 34, 2021.
- John Platt et al. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Advances in large margin classifiers, 10(3):61–74, 1999.
- On fairness and calibration. arXiv preprint arXiv:1709.02012, 2017.
- Are my deep learning systems fair? an empirical study of fixed-seed training. Advances in Neural Information Processing Systems, 34, 2021.
- Robust fairness under covariate shift. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 9419–9427, 2021.
- Beyond accuracy: Behavioral testing of NLP models with CheckList. arXiv preprint arXiv:2005.04118, 2020.
- Maintaining fairness across distribution shift: do we have viable solutions for real-world applications? arXiv preprint arXiv:2202.01034, 2022.
- Transfer of machine learning fairness across domains. arXiv preprint arXiv:1906.09688, 2019.
- No classification without representation: Assessing geodiversity issues in open data sets for the developing world. arXiv preprint arXiv:1711.08536, 2017.
- Fairness violations and mitigation under covariate shift. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pages 3–13, 2021.
- Fair classification with group-dependent label noise. In Proceedings of the 2021 ACM conference on fairness, accountability, and transparency, pages 526–536, 2021.
- Predictive inequity in object detection. arXiv preprint arXiv:1902.11097, 2019.
- Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment. In Proceedings of the 26th International Conference on World Wide Web (WWW), WWW'17, pages 1171–1180, 2017.
- Fairness constraints: Mechanisms for fair classification. In Artificial Intelligence and Statistics, pages 962–970. PMLR, 2017.
- Learning fair representations. In International conference on machine learning, pages 325–333. PMLR, 2013.
- Bayes-optimal classifiers under group fairness. arXiv preprint arXiv:2202.09724, 2022.
- Mitigating unwanted biases with adversarial learning. In Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, pages 335–340, 2018.
- Consistent range approximation for fair predictive modeling. Proceedings of the VLDB Endowment, 16(11):2925–2938, 2023.
- The rich get richer: Disparate impact of semi-supervised learning. arXiv preprint arXiv:2110.06282, 2021.
- Indre Zliobaite. A survey on measuring indirect discrimination in machine learning. arXiv preprint arXiv:1511.00148, 2015.
Authors: Mohit Sharma, Amit Deshpande, Rajiv Ratn Shah