One Model Many Scores: Using Multiverse Analysis to Prevent Fairness Hacking and Evaluate the Influence of Model Design Decisions (2308.16681v3)

Published 31 Aug 2023 in stat.ML and cs.LG

Abstract: A vast number of systems across the world use algorithmic decision making (ADM) to (partially) automate decisions that have previously been made by humans. The downstream effects of ADM systems critically depend on the decisions made during a system's design, implementation, and evaluation, as biases in data can be mitigated or reinforced along the modeling pipeline. Many of these decisions are made implicitly, without knowing exactly how they will influence the final system. To study this issue, we draw on insights from the field of psychology and introduce the method of multiverse analysis for algorithmic fairness. In our proposed method, we turn implicit decisions during design and evaluation into explicit ones and demonstrate their fairness implications. By combining decisions, we create a grid of all possible "universes" of decision combinations. For each of these universes, we compute metrics of fairness and performance. Using the resulting dataset, one can investigate the variability and robustness of fairness scores and see how and which decisions impact fairness. We demonstrate how multiverse analyses can be used to better understand fairness implications of design and evaluation decisions using an exemplary case study of predicting public health care coverage for vulnerable populations. Our results highlight how decisions regarding the evaluation of a system can lead to vastly different fairness metrics for the same model. This is problematic, as a nefarious actor could optimise or "hack" a fairness metric to portray a discriminatory model as fair merely by changing how it is evaluated. We illustrate how a multiverse analysis can help to address this issue.
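The core mechanic of the method — enumerating the Cartesian product of design and evaluation decisions and scoring every resulting "universe" — can be sketched in a few lines. This is a minimal illustration only: the decision names, options, and the toy scoring function below are hypothetical stand-ins, not the paper's actual decision grid or metrics.

```python
from itertools import product

# Hypothetical design/evaluation decisions; the paper's real grid
# (preprocessing, model, and evaluation choices) is far richer.
decisions = {
    "threshold": [0.4, 0.5, 0.6],
    "encode_age": ["binned", "continuous"],
    "eval_subset": ["full", "adults_only"],
}

def fairness_metric(universe):
    # Stand-in for a real metric such as a demographic-parity
    # difference; returns a toy score that varies with the decisions.
    score = 0.5
    score += 0.1 * decisions["threshold"].index(universe["threshold"])
    score -= 0.05 * (universe["encode_age"] == "binned")
    score += 0.02 * (universe["eval_subset"] == "adults_only")
    return score

# The Cartesian product of all decision options is the grid of "universes".
universes = [dict(zip(decisions, combo)) for combo in product(*decisions.values())]
scores = [fairness_metric(u) for u in universes]

print(len(universes))             # 3 * 2 * 2 = 12 universes
print(max(scores) - min(scores))  # the spread shows how much room "fairness hacking" has
```

Inspecting the spread between the best and worst universe for the same underlying model is exactly what reveals the fairness-hacking risk the abstract describes: a large spread means evaluation choices alone can make the model look fair or unfair.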

Authors (3)
  1. Jan Simson
  2. Florian Pfisterer
  3. Christoph Kern
Citations (4)