Aleatoric and Epistemic Discrimination: Fundamental Limits of Fairness Interventions (2301.11781v3)

Published 27 Jan 2023 in cs.LG, cs.CY, cs.IT, math.IT, and stat.ML

Abstract: Machine learning (ML) models can underperform on certain population groups due to choices made during model development and bias inherent in the data. We categorize sources of discrimination in the ML pipeline into two classes: aleatoric discrimination, which is inherent in the data distribution, and epistemic discrimination, which is due to decisions made during model development. We quantify aleatoric discrimination by determining the performance limits of a model under fairness constraints, assuming perfect knowledge of the data distribution. We demonstrate how to characterize aleatoric discrimination by applying Blackwell's results on comparing statistical experiments. We then quantify epistemic discrimination as the gap between a model's accuracy when fairness constraints are applied and the limit posed by aleatoric discrimination. We apply this approach to benchmark existing fairness interventions and investigate fairness risks in data with missing values. Our results indicate that state-of-the-art fairness interventions are effective at removing epistemic discrimination on standard (overused) tabular datasets. However, when data has missing values, there is still significant room for improvement in handling aleatoric discrimination.
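
To make the abstract's construction concrete: with perfect knowledge of the data distribution, the best accuracy achievable under a group-fairness constraint is the aleatoric limit, and epistemic discrimination is how far a given fairness intervention falls short of it. The sketch below is a toy illustration, not the authors' implementation; it assumes a small hand-picked discrete distribution, uses statistical parity as the example constraint, and computes the limit as a linear program over randomized classifiers with SciPy.

```python
# Minimal sketch of the benchmarking idea in the abstract, under assumptions:
# a hypothetical discrete joint distribution P(X, S, Y) and statistical parity
# as the fairness constraint. With the distribution known, the fair-optimal
# accuracy is a linear program over randomized classifiers
# q(x, s) = P(Yhat = 1 | X = x, S = s); epistemic discrimination is the gap
# between that limit and the accuracy a given model attains.
import numpy as np
from scipy.optimize import linprog

# Hypothetical joint distribution P(X=x, S=s, Y=y): 3 feature values,
# 2 groups, 2 labels; entries sum to 1.
P = np.array([
    [[0.10, 0.08], [0.05, 0.12]],   # x = 0: rows are s, columns are y
    [[0.07, 0.10], [0.10, 0.06]],   # x = 1
    [[0.12, 0.03], [0.09, 0.08]],   # x = 2
])
eps = 0.05  # statistical-parity slack

p_xs = P.sum(axis=2)                   # P(X=x, S=s)
p_x_given_s = p_xs / p_xs.sum(axis=0)  # P(X=x | S=s)

# Accuracy is linear in q:  acc(q) = sum P(x,s,0) + sum_{x,s} c[x,s] q[x,s],
# with c[x,s] = P(x,s,Y=1) - P(x,s,Y=0).
c = (P[:, :, 1] - P[:, :, 0]).ravel()
const = P[:, :, 0].sum()

# Statistical parity, also linear in q:
#   | sum_x P(x|S=0) q(x,0) - sum_x P(x|S=1) q(x,1) | <= eps
d = np.stack([p_x_given_s[:, 0], -p_x_given_s[:, 1]], axis=1).ravel()
res = linprog(-c, A_ub=np.vstack([d, -d]), b_ub=np.full(2, eps),
              bounds=[(0.0, 1.0)] * c.size, method="highs")
fair_limit = const + c @ res.x         # aleatoric limit under parity

# Stand-in for a trained model: a group-blind plug-in classifier that
# predicts argmax_y P(X=x, Y=y), ignoring S.
q_blind = (P[:, :, 1].sum(axis=1) > P[:, :, 0].sum(axis=1)).astype(float)
model_acc = (q_blind[:, None] * P[:, :, 1]
             + (1 - q_blind[:, None]) * P[:, :, 0]).sum()
parity_gap = abs(p_x_given_s[:, 0] @ q_blind - p_x_given_s[:, 1] @ q_blind)

print(f"fair-optimal accuracy (aleatoric limit): {fair_limit:.3f}")
print(f"group-blind model accuracy:              {model_acc:.3f}")
print(f"model parity gap (must be <= eps):       {parity_gap:.3f}")
print(f"epistemic discrimination (gap):          {fair_limit - model_acc:.3f}")
```

On this toy distribution the group-blind model satisfies the parity constraint but leaves an accuracy gap below the fair-optimal limit; that gap is the (removable) epistemic discrimination, while the limit itself reflects the aleatoric component.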

References (78)
  1. A reductions approach to fair classification. In International Conference on Machine Learning, pages 60–69. PMLR.
  2. Model projection: Theory and applications to fair machine learning. In 2020 IEEE International Symposium on Information Theory (ISIT), pages 2711–2716. IEEE.
  3. Beyond Adult and COMPAS: Fair multi-class prediction via information projection. In Advances in Neural Information Processing Systems.
  4. Machine bias. ProPublica.
  5. UCI Machine Learning Repository.
  6. AI Fairness 360: An extensible toolkit for detecting, understanding, and mitigating unwanted algorithmic bias. arXiv preprint arXiv:1810.01943.
  7. AI Fairness 360: An extensible toolkit for detecting and mitigating algorithmic bias. IBM Journal of Research and Development, 63(4/5):4:1–4:15.
  8. Fairness in criminal justice risk assessments: The state of the art. Sociological Methods & Research, 50(1):3–44.
  9. Blackwell, D. (1951). Comparison of experiments. Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, pages 93–102.
  10. Blackwell, D. (1953). Equivalent comparisons of experiments. The Annals of Mathematical Statistics, pages 265–272.
  11. Language (technology) is power: A critical survey of "bias" in NLP. arXiv preprint arXiv:2005.14050.
  12. Recovering from biased data: Can fairness constraints improve accuracy? arXiv preprint arXiv:1912.01094.
  13. Optimized pre-processing for discrimination prevention. Advances in Neural Information Processing Systems, 30.
  14. Le Cam, L. (1964). Sufficiency and approximate sufficiency. The Annals of Mathematical Statistics, pages 1419–1455.
  15. Impact of imputation strategies on fairness in machine learning. Journal of Artificial Intelligence Research, 74:1011–1035.
  16. Classification with fairness constraints: A meta-algorithm with provable guarantees. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pages 319–328.
  17. Why is my classifier discriminatory? Advances in Neural Information Processing Systems, 31.
  18. Chouldechova, A. (2017). Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data, 5(2):153–163.
  19. Leveraging labeled and unlabeled data for consistent fair binary classification. Advances in Neural Information Processing Systems, 32.
  20. Comparisons of stochastic matrices with applications in information theory, statistics, economics and population. Springer Science & Business Media.
  21. Algorithmic decision making and the cost of fairness. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 797–806.
  22. Retiring adult: New datasets for fair machine learning. Advances in Neural Information Processing Systems, 34:6478–6490.
  23. Is there a trade-off between fairness and accuracy? a perspective using mismatched hypothesis testing. In International Conference on Machine Learning, pages 2803–2813. PMLR.
  24. Decoupled classifiers for group-fair and efficient machine learning. In Conference on Fairness, Accountability and Transparency, pages 119–133. PMLR.
  25. Certifying and removing disparate impact. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 259–268.
  26. Missing the missing values: The ugly duckling of fairness in machine learning. International Journal of Intelligent Systems, 36(7):3217–3258.
  27. Fairness evaluation in presence of biased noisy labels. In International Conference on Artificial Intelligence and Statistics, pages 2325–2336. PMLR.
  28. A comparative study of fairness-enhancing interventions in machine learning. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pages 329–338.
  29. Multicalibrated regression for downstream fairness. In Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society, pages 259–286.
  30. Omnipredictors. arXiv preprint arXiv:2109.05389.
  31. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems, volume 29.
  32. DC programming: Overview. Journal of Optimization Theory and Applications, 103(1):1–43.
  33. Bias mitigation for machine learning classifiers: A comprehensive survey. arXiv preprint arXiv:2207.07068.
  34. Omnipredictors for constrained optimization. In International Conference on Machine Learning, pages 13497–13527. PMLR.
  35. Aleatoric and epistemic uncertainty in machine learning: An introduction to concepts and methods. Machine Learning, 110(3):457–506.
  36. High School Longitudinal Study of 2009 (HSLS:09): Base-year data file documentation. NCES 2011-328. National Center for Education Statistics.
  37. Measurement and fairness. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pages 375–385.
  38. Fairness without imputation: A decision tree approach for fair prediction with missing values. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 9558–9566.
  39. Identifying and correcting label bias in machine learning. In International Conference on Artificial Intelligence and Statistics, pages 702–712. PMLR.
  40. Wasserstein fair classification. In Uncertainty in Artificial Intelligence, pages 862–872. PMLR.
  41. Assessing algorithmic fairness with unobserved protected class using data combination. Management Science, 68(3):1959–1981.
  42. Data preprocessing techniques for classification without discrimination. Knowledge and Information Systems, 33(1):1–33.
  43. Taxonomizing and measuring representational harms: A look at image tagging. arXiv preprint arXiv:2305.01776.
  44. Preventing fairness gerrymandering: Auditing and learning for subgroup fairness. In International Conference on Machine Learning, pages 2564–2572. PMLR.
  45. FACT: A diagnostic for group fairness trade-offs. In International Conference on Machine Learning, pages 5264–5274. PMLR.
  46. Multiaccuracy: Black-box post-processing for fairness in classification. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, pages 247–254.
  47. Inherent trade-offs in the fair determination of risk scores. arXiv preprint arXiv:1609.05807.
  48. FERMI: Fair empirical risk minimization via exponential Rényi mutual information. arXiv preprint arXiv:2102.12586.
  49. Minimax Pareto fairness: A multi-objective perspective. In International Conference on Machine Learning, pages 6755–6764. PMLR.
  50. Mayson, S. G. (2019). Bias in, bias out. The Yale Law Journal, 128(8):2218–2300.
  51. Mitigating bias in set selection with noisy protected attributes. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pages 237–248.
  52. The cost of fairness in binary classification. In Conference on Fairness, Accountability and Transparency, pages 107–118. PMLR.
  53. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830.
  54. On fairness and calibration. Advances in Neural Information Processing Systems, 30.
  55. Raginsky, M. (2011). Shannon meets Blackwell and Le Cam: Channels, codes, and statistical experiments. In 2011 IEEE International Symposium on Information Theory Proceedings, pages 1220–1224. IEEE.
  56. Coarse-graining and the Blackwell order. Entropy, 19(10):527.
  57. FairPrep: Promoting data to a first-class citizen in studies on fairness-enhancing interventions. arXiv preprint arXiv:1911.12587.
  58. Shannon, C. E. (1958). A note on a partial ordering for communication channels. Information and Control, 1(4):390–397.
  59. Disciplined convex-concave programming. In 2016 IEEE 55th Conference on Decision and Control (CDC), pages 1009–1014. IEEE.
  60. On the discrimination risk of mean aggregation feature imputation in graphs. Advances in Neural Information Processing Systems.
  61. A framework for understanding unintended consequences of machine learning. arXiv preprint arXiv:1901.10002.
  62. Fairness for unobserved characteristics: Insights from technological impacts on queer communities. In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, pages 254–265.
  63. Torgersen, E. (1991). Comparison of statistical experiments, volume 36. Cambridge University Press.
  64. Fairness without harm: Decoupled classifiers with preference guarantees. In International Conference on Machine Learning, pages 6373–6382. PMLR.
  65. Varshney, K. R. (2021). Trustworthy machine learning. Chappaqua, NY.
  66. Fairness definitions explained. In 2018 IEEE/ACM International Workshop on Software Fairness (FairWare), pages 1–7. IEEE.
  67. To split or not to split: The impact of disparate treatment in classification. IEEE Transactions on Information Theory, 67(10):6733–6757.
  68. Robust optimization for fairness with noisy protected groups. Advances in Neural Information Processing Systems, 33:5190–5203.
  69. Analyzing the impact of missing values and selection bias on fairness. International Journal of Data Science and Analytics, 12(2):101–119.
  70. Optimized score transformation for consistent fair classification. Journal of Machine Learning Research, 22(258):1–78.
  71. Unlocking fairness: A trade-off revisited. Advances in Neural Information Processing Systems, 32.
  72. Fairness with overlapping groups; a probabilistic perspective. In Advances in Neural Information Processing Systems, volume 33, pages 4067–4078.
  73. Fairness constraints: A flexible approach for fair classification. The Journal of Machine Learning Research, 20(1):2737–2778.
  74. Learning fair representations. In International Conference on Machine Learning, pages 325–333. PMLR.
  75. Bayes-optimal classifiers under group fairness. arXiv preprint arXiv:2202.09724.
  76. Fair Bayes-optimal classifiers under predictive parity. In Advances in Neural Information Processing Systems.
  77. Mitigating unwanted biases with adversarial learning. In Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, pages 335–340.
  78. Assessing fairness in the presence of missing data. Advances in Neural Information Processing Systems, 34:16007–16019.
Authors (4)
  1. Hao Wang (1119 papers)
  2. Luxi He (9 papers)
  3. Rui Gao (72 papers)
  4. Flavio P. Calmon (56 papers)
Citations (9)