FairGridSearch: A Framework to Compare Fairness-Enhancing Models (2401.02183v1)
Abstract: Machine learning models are increasingly used in critical decision-making applications, yet they are susceptible to replicating or even amplifying bias present in real-world data. Although the literature offers a variety of bias mitigation methods and base estimators, selecting the best model for a specific application remains challenging. This paper focuses on binary classification and proposes FairGridSearch, a novel framework for comparing fairness-enhancing models. FairGridSearch enables experimentation with different combinations of model parameters and recommends the best one. The study applies FairGridSearch to three popular datasets (Adult, COMPAS, and German Credit) and analyzes the impact of metric selection, base estimator choice, and classification threshold on model fairness. The results highlight the significance of selecting appropriate accuracy and fairness metrics for model evaluation. They also show that the choice of base estimator affects the effectiveness of bias mitigation methods and that the classification threshold affects fairness stability, although neither effect is consistent across all datasets. Based on these findings, future research on fairness in machine learning should consider a broader range of factors when building fair models, rather than focusing on bias mitigation methods alone.
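The abstract describes the framework only at a high level. As a rough illustration of the underlying grid-search idea, below is a minimal, self-contained Python sketch under stated assumptions: the helper `statistical_parity_difference`, the combined score (Matthews correlation coefficient minus the absolute fairness gap), and the toy grid of two scikit-learn estimators and five thresholds are all illustrative stand-ins, not the paper's actual implementation, whose parameter space also spans bias mitigation methods and multiple accuracy and fairness metrics.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import train_test_split


def statistical_parity_difference(y_pred, group):
    """P(y_hat=1 | unprivileged) - P(y_hat=1 | privileged); 0 means parity."""
    return y_pred[group == 0].mean() - y_pred[group == 1].mean()


# Toy data: the last feature stands in for a binary protected attribute.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
group = (X[:, -1] > 0).astype(int)
X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
    X, y, group, test_size=0.3, random_state=0
)

# Illustrative grid: base estimator x classification threshold. A full
# FairGridSearch run would also iterate over bias mitigation methods
# (pre-, in-, and post-processing) and over accuracy/fairness metrics.
estimators = {
    "logreg": LogisticRegression(max_iter=1000),
    "rf": RandomForestClassifier(random_state=0),
}
thresholds = [0.3, 0.4, 0.5, 0.6, 0.7]

best = None
for name, est in estimators.items():
    est.fit(X_tr, y_tr)
    proba = est.predict_proba(X_te)[:, 1]
    for thr in thresholds:
        y_pred = (proba >= thr).astype(int)
        acc = matthews_corrcoef(y_te, y_pred)  # accuracy metric (MCC)
        fair = abs(statistical_parity_difference(y_pred, g_te))
        score = acc - fair  # assumed combined criterion, for illustration only
        if best is None or score > best[0]:
            best = (score, name, thr)

print(f"best: estimator={best[1]}, threshold={best[2]}, score={best[0]:.3f}")
```

Scoring each estimator-threshold pair on both an accuracy metric and a fairness metric, then ranking by a combined criterion, mirrors the abstract's point that metric choice, base estimator, and threshold all shape which model the search recommends.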