What's in a Name? Auditing Large Language Models for Race and Gender Bias
Abstract: We employ an audit design to investigate biases in state-of-the-art LLMs, including GPT-4. In our study, we prompt the models for advice involving a named individual across a variety of scenarios, such as during car purchase negotiations or election outcome predictions. We find that the advice systematically disadvantages names that are commonly associated with racial minorities and women. Names associated with Black women receive the least advantageous outcomes. The biases are consistent across 42 prompt templates and several models, indicating a systemic issue rather than isolated incidents. While providing numerical, decision-relevant anchors in the prompt can successfully counteract the biases, qualitative details have inconsistent effects and may even increase disparities. Our findings underscore the importance of conducting audits at the point of LLM deployment and implementation to mitigate their potential for harm against marginalized communities.
- Ryan S. Baker and Aaron Hawn “Algorithmic Bias in Education” In International Journal of Artificial Intelligence in Education 32, 2022, pp. 1052–1092 DOI: 10.1007/s40593-021-00285-9
- J.R. Bent “Is algorithmic affirmative action legal” In Georgetown Law Journal 108, 2019, pp. 803
- “Are Emily and Greg More Employable Than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination” In American Economic Review 94.4, 2004, pp. 991–1013 DOI: 10.1257/0002828042002561
- Aylin Caliskan, Joanna J. Bryson and Arvind Narayanan “Semantics derived automatically from language corpora contain human-like biases” In Science 356.6334, 2017, pp. 183–186 DOI: 10.1126/science.aal42
- Anupam Chander “The racist algorithm” In Michigan Law Review 115, 2016, pp. 1023
- “Companies Go All Out to Up Their Generative AI Game”, 2023 URL: https://www.bloomberg.com/news/articles/2023-07-31/how-companies-are-tackling-the-challenges-of-generative-ai
- Alexander Coppock “Avoiding Post-Treatment Bias in Audit Experiments” In Journal of Experimental Political Science 6.1, 2019, pp. 1–4 DOI: 10.1017/XPS.2018.9
- “The Measure and Mismeasure of Fairness” In Journal of Machine Learning Research 24, 2023, pp. 1–117 URL: http://jmlr.org/papers/volume24/22-1318/22-1318.pdf
- S.Michael Gaddis “How Black Are Lakisha and Jamal? Racial Perceptions from Names Used in Correspondence Audit Studies” In Sociological Science 4 Society for Sociological Science, 2017, pp. 469–489 DOI: 10.15195/v4.a19
- T.B. Gillis “The input fallacy” In Minnesota Law Review 106, 2022, pp. 1175
- Steven N. Goodman, Sharad Goel and Mark R. Cullen “Machine learning, health disparities, and causal reasoning” In Annals of Internal Medicine 169, 2018, pp. 883–884 DOI: 10.7326/M18-1463
- Daniel E. Ho and Albert Xiang “Affirmative algorithms: the legal grounds for fairness as awareness” In University of Chicago Law Review Online, 2020, pp. 134–154
- Aziz Z. Huq “Racial equity in algorithmic criminal justice” In Duke Law Journal 68, 2019, pp. 1043–1134
- Dominik K. Kanbach “The GenAI Is out of the Bottle: Generative Artificial Intelligence from a Business Model Innovation Perspective” Last visited Feb 9, 2024 In Rev Manag Sci, 2023 DOI: 10.1007/s11846-023-00696-z
- “Investigating Bias in Facial Analysis Systems: A Systematic Review” In IEEE Access 8, 2020, pp. 130751–130761 DOI: 10.1109/ACCESS.2020.3006051
- Paul T. Kim “Race-aware algorithms: fairness, nondiscrimination and affirmative action” In California Law Review 110, 2022, pp. 1539
- René F. Kizilcec and Hansol Lee “Algorithmic fairness in education” In The Ethics of Artificial Intelligence in Education Routledge, 2022, pp. 29
- Allison Koenecke, Andrew Nam and Emily Lake “Racial disparities in automated speech recognition” In Proceedings of the National Academy of Sciences 117.14, 2020, pp. 7684–7689 DOI: 10.1073/pnas.1915768117
- Hadas Kotek, Rikker Dockum and David Q. Sun “Gender bias and stereotypes in Large Language Models”, 2023 arXiv: https://arxiv.org/abs/2308.14921
- Sandra G. Mayson “Bias in, bias out” In Yale Law Journal 128, 2019, pp. 2218–2300
- “Ethical limitations of algorithmic fairness solutions in health care machine learning” In Lancet Digital Health 2, 2020, pp. e221–e223 DOI: 10.1016/S2589-7500(20)30062-1
- “Efficient Estimation of Word Representations in Vector Space”, 2013 arXiv: https://arxiv.org/abs/1301.3781
- “Dissecting racial bias in an algorithm used to manage the health of populations” In Science 366, 2019, pp. 447–453 DOI: 10.1126/science.aax2342
- “By Any Other Name?: On Being “Regarded As” Black, and Why Title VII Should Apply Even If Lakisha and Jamal Are White” In Wisconsin Law Review, 2005, pp. 1283 URL: https://scholarship.law.bu.edu/faculty_scholarship/314
- Devah Pager “The Use of Field Experiments for Studies of Employment Discrimination: Contributions, Critiques, and Directions for the Future” In The Annals of the American Academy of Political and Social Science 609.1 Sage Publications, 2007, pp. 104–133 DOI: 10.1177/0002716206294796
- Stephen R. Pfohl, Agata Foryciarz and Nigam H. Shah “An empirical characterization of fair machine learning for clinical risk prediction” In Journal of Biomedical Informatics 113, 2021, pp. 103621 DOI: 10.1016/j.jbi.2020.103621
- “The Woman Worked as a Babysitter: On Biases in Language Generation” In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) Hong Kong, China: Association for Computational Linguistics, 2019, pp. 3407–3412 DOI: 10.18653/v1/D19-1339
- Briana Vecchione, Karen Levy and Solon Barocas “Algorithmic Auditing and Social Justice: Lessons from the History of Audit Studies” In EAAMO ’21: Equity and Access in Algorithms, Mechanisms, and Optimization –, NY, USA: ACM, 2021 DOI: 10.1145/3465416.3483294
- “Are Emily and Greg Still More Employable than Lakisha and Jamal? Investigating Algorithmic Hiring Bias in the Era of ChatGPT”, 2023 arXiv: https://ar5iv.org/abs/2310.05135
- ““Kelly is a Warm Person, Joseph is a Role Model”: Gender Biases in LLM-Generated Reference Letters”, 2023 arXiv: https://arxiv.org/abs/2310.09219
- Crystal S. Yang and Will Dobbie “Equal protection under algorithms: a new statistical and legal framework” In Michigan Law Review 119, 2020, pp. 291
- John Yinger “Testing for Discrimination in Housing and Related Markets” In National Report Card on Discrimination in America: The Role of Testing Urban Institute, 1998, pp. 27
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.
Top Community Prompts
Collections
Sign up for free to add this paper to one or more collections.