On Comparing Fair Classifiers under Data Bias (2302.05906v2)

Published 12 Feb 2023 in cs.LG and cs.AI

Abstract: In this paper, we consider a theoretical model for injecting data bias, namely, under-representation and label bias (Blum & Stangl, 2019). We empirically study the effect of varying data biases on the accuracy and fairness of fair classifiers. Through extensive experiments on both synthetic and real-world datasets (e.g., Adult, German Credit, Bank Marketing, COMPAS), we empirically audit pre-, in-, and post-processing fair classifiers from standard fairness toolkits for their fairness and accuracy by injecting varying amounts of under-representation and label bias in their training data (but not the test data). Our main observations are: 1. The fairness and accuracy of many standard fair classifiers degrade severely as the bias injected in their training data increases, 2. A simple logistic regression model trained on the right data can often outperform, in both accuracy and fairness, most fair classifiers trained on biased training data, and 3. A few, simple fairness techniques (e.g., reweighing, exponentiated gradients) seem to offer stable accuracy and fairness guarantees even when their training data is injected with under-representation and label bias. Our experiments also show how to integrate a measure of data bias risk in the existing fairness dashboards for real-world deployments.
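The abstract describes injecting two kinds of bias from the Blum & Stangl (2019) model into training data only: under-representation (subsampling positive examples of a disadvantaged group) and label bias (flipping positive labels of that group to negative). A minimal sketch of such an injection step is below; the function name, the group encoding (`group == 1` for the disadvantaged group), and the parameter names `beta_under` and `nu_label` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def inject_bias(X, y, group, beta_under=0.5, nu_label=0.2):
    """Hypothetical sketch of Blum & Stangl-style bias injection.

    - Under-representation: keep each positive example of the
      disadvantaged group (group == 1) with probability beta_under.
    - Label bias: flip each remaining positive label of that group
      to negative with probability nu_label.
    Returns a biased copy of (X, y, group); the test set is untouched.
    """
    # Under-representation: subsample disadvantaged positives.
    drop = (group == 1) & (y == 1) & (rng.random(len(y)) > beta_under)
    keep = ~drop
    X, y, group = X[keep], y[keep].copy(), group[keep]
    # Label bias: flip some surviving disadvantaged positives.
    flip = (group == 1) & (y == 1) & (rng.random(len(y)) < nu_label)
    y[flip] = 0
    return X, y, group
```

Sweeping `beta_under` and `nu_label` over a grid, retraining a fair classifier on each biased copy, and evaluating on the clean test set reproduces the shape of the audit the abstract describes.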

References (69)
  1. A reductions approach to fair classification. In International Conference on Machine Learning, pages 60–69. PMLR, 2018.
  2. A sandbox tool to bias (stress)-test fairness algorithms. arXiv preprint arXiv:2204.10233, 2022.
  3. Machine bias. ProPublica, May, 23(2016):139–159, 2016.
  4. Fairness in machine learning. NIPS Tutorial, 1:2017, 2017.
  5. Big data’s disparate impact. Calif. L. Rev., 104:671, 2016.
  6. Metric-free individual fairness in online learning. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020.
  7. AI Fairness 360: An extensible toolkit for detecting, understanding, and mitigating unwanted algorithmic bias, October 2018.
  8. Reuben Binns. On the apparent conflict between individual and group fairness. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, FAT* ’20, page 514–524. Association for Computing Machinery, 2020.
  9. Fairlearn: A toolkit for assessing and improving fairness in AI. Microsoft, Tech. Rep. MSR-TR-2020-32, 2020.
  10. Ensuring fairness under prior probability shifts. In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, pages 414–424, 2021.
  11. Fair preprocessing: Towards understanding compositional fairness of data transformers in machine learning pipeline. arXiv preprint arXiv:2106.06054, 2021.
  12. Recovering from biased data: Can fairness constraints improve accuracy? arXiv preprint arXiv:1912.01094, 2019.
  13. Gender shades: Intersectional accuracy disparities in commercial gender classification. In Conference on fairness, accountability and transparency, pages 77–91. PMLR, 2018.
  14. Three naive bayes approaches for discrimination-free classification. Data mining and knowledge discovery, 21(2):277–292, 2010.
  15. Optimized pre-processing for discrimination prevention. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pages 3995–4004, 2017.
  16. Classification with fairness constraints: A meta-algorithm with provable guarantees. In Proceedings of the conference on fairness, accountability, and transparency, pages 319–328, 2019.
  17. The frontiers of fairness in machine learning. arXiv preprint arXiv:1810.08810, 2018.
  18. Sample selection bias correction theory. In International conference on algorithmic learning theory, pages 38–53. Springer, 2008.
  19. Fair transfer learning with missing protected attributes. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, pages 91–98, 2019.
  20. Label bias, label shift: Fair machine learning with unreliable labels. In NeurIPS 2020 Workshop on Consequential Decision Making in Dynamic Environments, volume 12, 2020.
  21. Retiring adult: New datasets for fair machine learning. arXiv preprint arXiv:2108.04884, 2021.
  22. Fair and robust classification under sample selection bias. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pages 2999–3003, 2021.
  23. UCI machine learning repository, 2017.
  24. Fairness through awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science (ITCS) Conference, page 214–226. Association for Computing Machinery, 2012.
  25. Certifying and removing disparate impact. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 259–268, 2015.
  26. Fair machine learning in healthcare: A review. arXiv preprint arXiv:2206.14397, 2022.
  27. Will Fleisher. What’s Fair about Individual Fairness?, page 480–490. Association for Computing Machinery, 2021.
  28. Fairness evaluation in presence of biased noisy labels. In International Conference on Artificial Intelligence and Statistics, pages 2325–2336. PMLR, 2020.
  29. A comparative study of fairness-enhancing interventions in machine learning. In Proceedings of the conference on fairness, accountability, and transparency, pages 329–338, 2019.
  30. When fair classification meets noisy protected attributes. In Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society, pages 679–690, 2023.
  31. Fairness guarantees under demographic shift. In International Conference on Learning Representations, 2022.
  32. Implicit bias: Scientific foundations. California law review, 94(4):945–967, 2006.
  33. Equality of opportunity in supervised learning. Advances in neural information processing systems, 29:3315–3323, 2016.
  34. Wassily Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American statistical association, 58(301):13–30, 1963.
  35. Through the data management lens: Experimental analysis and evaluation of fair classification. In Proceedings of the 2022 International Conference on Management of Data, pages 232–246, 2022.
  36. Identifying and correcting label bias in machine learning. In International Conference on Artificial Intelligence and Statistics, pages 702–712. PMLR, 2020.
  37. Data preprocessing techniques for classification without discrimination. Knowledge and Information Systems, 33(1):1–33, 2012.
  38. Decision theory for discrimination-aware classification. In 2012 IEEE 12th International Conference on Data Mining, pages 924–929. IEEE, 2012.
  39. Fairness-aware classifier with prejudice remover regularizer. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 35–50. Springer, 2012.
  40. Preventing fairness gerrymandering: Auditing and learning for subgroup fairness. In International Conference on Machine Learning, pages 2564–2572. PMLR, 2018.
  41. Selection problems in the presence of implicit bias. arXiv preprint arXiv:1801.03533, 2018.
  42. On the impossibility of fairness-aware learning from corrupted data. In Algorithmic Fairness through the Lens of Causality and Robustness workshop, pages 59–83. PMLR, 2022.
  43. Noise-tolerant fair classification. Advances in Neural Information Processing Systems, 32, 2019.
  44. Does enforcing fairness mitigate biases caused by subpopulation shift? Advances in Neural Information Processing Systems, 34, 2021.
  45. A survey on bias and fairness in machine learning. ACM Computing Surveys (CSUR), 54(6):1–35, 2021.
  46. A data-driven approach to predict the success of bank telemarketing. Decision Support Systems, 62:22–31, 2014.
  47. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
  48. Mitigating dataset harms requires stewardship: Lessons from 1000 papers. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2), 2021.
  49. A review on fairness in machine learning. ACM Computing Surveys (CSUR), 55(3):1–44, 2022.
  50. Post-processing for individual fairness. Advances in Neural Information Processing Systems, 34, 2021.
  51. John Platt et al. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Advances in large margin classifiers, 10(3):61–74, 1999.
  52. On fairness and calibration. arXiv preprint arXiv:1709.02012, 2017.
  53. Are my deep learning systems fair? an empirical study of fixed-seed training. Advances in Neural Information Processing Systems, 34, 2021.
  54. Robust fairness under covariate shift. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 9419–9427, 2021.
  55. Beyond accuracy: Behavioral testing of NLP models with CheckList. arXiv preprint arXiv:2005.04118, 2020.
  56. Maintaining fairness across distribution shift: do we have viable solutions for real-world applications? arXiv preprint arXiv:2202.01034, 2022.
  57. Transfer of machine learning fairness across domains. arXiv preprint arXiv:1906.09688, 2019.
  58. No classification without representation: Assessing geodiversity issues in open data sets for the developing world. arXiv preprint arXiv:1711.08536, 2017.
  59. Fairness violations and mitigation under covariate shift. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pages 3–13, 2021.
  60. Fair classification with group-dependent label noise. In Proceedings of the 2021 ACM conference on fairness, accountability, and transparency, pages 526–536, 2021.
  61. Predictive inequity in object detection. arXiv preprint arXiv:1902.11097, 2019.
  62. Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment. In Proceedings of the 26th International Conference on World Wide Web (WWW), WWW’17, page 1171–1180, 2017.
  63. Fairness constraints: Mechanisms for fair classification. In Artificial Intelligence and Statistics, pages 962–970. PMLR, 2017.
  64. Learning fair representations. In International conference on machine learning, pages 325–333. PMLR, 2013.
  65. Bayes-optimal classifiers under group fairness. arXiv preprint arXiv:2202.09724, 2022.
  66. Mitigating unwanted biases with adversarial learning. In Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, pages 335–340, 2018.
  67. Consistent range approximation for fair predictive modeling. Proceedings of the VLDB Endowment, 16(11):2925–2938, 2023.
  68. The rich get richer: Disparate impact of semi-supervised learning. arXiv preprint arXiv:2110.06282, 2021.
  69. Indre Zliobaite. A survey on measuring indirect discrimination in machine learning. arXiv preprint arXiv:1511.00148, 2015.
Authors (3)
  1. Mohit Sharma (46 papers)
  2. Amit Deshpande (35 papers)
  3. Rajiv Ratn Shah (108 papers)
Citations (2)
