When Fair Classification Meets Noisy Protected Attributes (2307.03306v2)
Abstract: Operationalizing algorithmic fairness comes with several practical challenges, not the least of which is the availability or reliability of protected attributes in datasets. In real-world contexts, practical and legal impediments may prevent the collection and use of demographic data, making it difficult to ensure algorithmic fairness. While early fairness algorithms did not consider these limitations, recent proposals aim to achieve algorithmic fairness in classification by accounting for noise in protected attributes or by forgoing protected attributes entirely. To the best of our knowledge, this is the first head-to-head study of fair classification algorithms that compares attribute-reliant, noise-tolerant, and attribute-blind algorithms along the dual axes of predictivity and fairness. We evaluated these algorithms via case studies on four real-world datasets, augmented with synthetic perturbations of the protected attributes. Our study reveals that attribute-blind and noise-tolerant fair classifiers can potentially achieve similar levels of performance as attribute-reliant algorithms, even when protected attributes are noisy. However, applying them in practice requires careful, nuanced choices. Our study provides insights into the practical implications of using fair classification algorithms in scenarios where protected attributes are noisy or only partially available.
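To make the evaluation setup concrete, the sketch below illustrates the kind of synthetic-perturbation experiment the abstract describes: train a classifier, flip the protected attribute at increasing noise rates, and observe how a measured fairness metric drifts. This is a minimal illustration under assumed conditions, not the paper's actual harness; the data is synthetic, and the helpers `flip_attribute` and `statistical_parity_difference` are hypothetical names introduced here for clarity.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in data: features X, labels y, and a binary protected attribute a.
n = 5000
X = rng.normal(size=(n, 5))
a = rng.integers(0, 2, size=n)                                  # true protected attribute
y = (X[:, 0] + 0.5 * a + rng.normal(size=n) > 0).astype(int)    # labels correlated with a

def flip_attribute(a, flip_rate, rng):
    """Synthetically perturb the protected attribute: flip each entry
    independently with probability `flip_rate`."""
    flips = rng.random(a.shape[0]) < flip_rate
    return np.where(flips, 1 - a, a)

def statistical_parity_difference(y_pred, a):
    """Demographic-parity gap: |P(yhat = 1 | a = 1) - P(yhat = 1 | a = 0)|."""
    return abs(y_pred[a == 1].mean() - y_pred[a == 0].mean())

X_tr, X_te, y_tr, y_te, a_tr, a_te = train_test_split(X, y, a, random_state=0)
clf = LogisticRegression().fit(X_tr, y_tr)
y_pred = clf.predict(X_te)

# Predictivity is unchanged by attribute noise; the *measured* fairness gap is not.
for rate in (0.0, 0.1, 0.3):
    a_noisy = flip_attribute(a_te, rate, rng)
    print(f"flip rate {rate:.1f}: accuracy={clf.score(X_te, y_te):.3f}, "
          f"measured SPD={statistical_parity_difference(y_pred, a_noisy):.3f}")
```

As the flip rate grows, group membership becomes less informative, so the measured parity gap shrinks toward zero even though the underlying disparity is unchanged; this is why noise-tolerant and attribute-blind methods need to be assessed against the true attributes, as the study does.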