Papers
Topics
Authors
Recent
Search
2000 character limit reached

Auditing the Use of Language Models to Guide Hiring Decisions

Published 3 Apr 2024 in stat.AP and cs.CL | (2404.03086v1)

Abstract: Regulatory efforts to protect against algorithmic bias have taken on increased urgency with rapid advances in LLMs, which are machine learning models that can achieve performance rivaling human experts on a wide array of tasks. A key theme of these initiatives is algorithmic "auditing," but current regulations -- as well as the scientific literature -- provide little guidance on how to conduct these assessments. Here we propose and investigate one approach for auditing algorithms: correspondence experiments, a widely applied tool for detecting bias in human judgements. In the employment context, correspondence experiments aim to measure the extent to which race and gender impact decisions by experimentally manipulating elements of submitted application materials that suggest an applicant's demographic traits, such as their listed name. We apply this method to audit candidate assessments produced by several state-of-the-art LLMs, using a novel corpus of applications to K-12 teaching positions in a large public school district. We find evidence of moderate race and gender disparities, a pattern largely robust to varying the types of application material input to the models, as well as the framing of the task to the LLMs. We conclude by discussing some important limitations of correspondence experiments for auditing algorithms.

Definition Search Book Streamline Icon: https://streamlinehq.com
References (42)
  1. Gpt-4 technical report. arXiv preprint arXiv:2303.08774, 2023.
  2. Anthropic. Model Card and Evaluations for Claude Models. https://www-cdn.anthropic.com/bd2a28d2535bfb0494cc8e2a3bf135d2e7523226/Model-Card-Claude-2.pdf, July 2023.
  3. Anthropic. The Claude 3 Model Family: Opus, Sonnet, Haiku. https://www-cdn.anthropic.com/de8ba9b01c9ab7cbabf5c33b80b7bbc618857627/Model_Card_Claude_3.pdf, march 2024.
  4. Race effects on eBay. The RAND Journal of Economics, 46(4):891–917, 2015.
  5. Fairness in criminal justice risk assessments: The state of the art. Sociological Methods & Research, 50(1):3–44, 2021.
  6. Are Emily and Greg more employable than Lakisha and Jamal? A field experiment on labor market discrimination. American economic review, 94(4):991–1013, 2004.
  7. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901, 2020.
  8. Silvia Chiappa. Path-specific counterfactual fairness. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 7801–7808, 2019.
  9. Blind justice: Algorithmically masking race in charging decisions. In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, pages 35–45, 2021.
  10. Alexandra Chouldechova. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data, 5(2):153–163, 2017.
  11. A snapshot of the frontiers of fairness in machine learning. Communications of the ACM, 63(5):82–89, 2020.
  12. Algorithmic decision making and the cost of fairness. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 797–806, 2017.
  13. The measure and mismeasure of fairness. The Journal of Machine Learning Research, 24(1):14730–14846, 2023.
  14. Counterfactual risk assessments, evaluation, and fairness. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pages 582–593, 2020.
  15. Fairness through awareness. In Proceedings of the 3rd innovations in theoretical computer science conference, pages 214–226, 2012.
  16. European Commission. The Digital Services Act package — Shaping Europe’s digital future. https://digital-strategy.ec.europa.eu/en/policies/digital-services-act-package, 2024.
  17. European Parliament. Regulation of the European Parliament and of the Council Laying Down Harmonised Rules on Artificial Intelligence. https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=celex%3A52021PC0206, 2024.
  18. S Michael Gaddis. Understanding the “how” and “why” aspects of racial-ethnic discrimination: A multimethod approach to audit studies. Sociology of Race and Ethnicity, 5(4):443–455, 2019.
  19. Searching for a roommate: A correspondence audit examining racial/ethnic and immigrant discrimination among millennials. Socius, 6:2378023120972287, 2020.
  20. Auditing work: Exploring the New York City algorithmic bias audit regime. arXiv preprint arXiv:2402.08101, 2024.
  21. What’s in a name? Auditing large language models for race and gender bias. arXiv preprint arXiv:2402.14875, 2024.
  22. Equality of opportunity in supervised learning. Advances in Neural Information Processing Systems, 29:3315–3323, 2016.
  23. Principal fairness for human and algorithmic decision-making. arXiv preprint arXiv:2005.10400, 2020.
  24. Mistral 7B. arXiv preprint arXiv:2310.06825, 2023.
  25. Mixtral of experts. arXiv preprint arXiv:2401.04088, 2024.
  26. Avoiding discrimination through causal reasoning. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pages 656–666, 2017.
  27. Inherent trade-offs in the fair determination of risk scores. In Proceedings of Innovations in Theoretical Computer Science (ITCS), 2017.
  28. Counterfactual fairness. In Advances in Neural Information Processing Systems, pages 4066–4076, 2017.
  29. Causal reasoning for algorithmic fairness. arXiv preprint arXiv:1805.05859, 2018.
  30. Sarah Lyons-Padilla, Hazel Rose Markus, Ashby Monk, Sid Radhakrishna, Radhika Shah, Norris A “Daryn” Dodson IV, and Jennifer L Eberhardt. Race influences professional investors’ financial judgments. Proceedings of the National Academy of Sciences, 116(35):17225–17230, 2019.
  31. Fair inference on outcomes. In Proceedings of the AAAI Conference on Artificial Intelligence, 2018.
  32. N.Y.C. Local Law No. 144. N.Y.C. Admin. Code, §20-870, 2021.
  33. Manish Raghavan. What should we do when our ideas of fairness conflict? Communications of the ACM, 67(1):88–97, 2023.
  34. Mitigating bias in algorithmic hiring: Evaluating claims and practices. In Proceedings of the 2020 conference on fairness, accountability, and transparency, pages 469–481, 2020.
  35. Artificial intelligence in human resources management: Challenges and a path forward. California Management Review, 61(4):15–42, 2019.
  36. Evaluating and mitigating discrimination in language model decisions. arXiv preprint arXiv:2312.03689, 2023.
  37. Kevin Tobia. Disparate statistics. The Yale Law Journal, pages 2382–2420, 2017.
  38. Are Emily and Greg still more employable than Lakisha and Jamal? investigating algorithmic hiring bias in the era of chatgpt. arXiv preprint arXiv:2310.05135, 2023.
  39. Equal opportunity and affirmative action via counterfactual predictions. arXiv preprint arXiv:1905.10870, 2019.
  40. White House. Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence. https://www.whitehouse.gov/briefing-room/presidential-actions/2023/10/30/executive-order-on-the-safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence, October 2023.
  41. Ronald E Wienk. Measuring racial discrimination in American housing markets: The housing market practices survey, volume 444. Department of Housing and Urban Development, Office of Policy Development, 1979.
  42. Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment. In Proceedings of the 26th International Conference on World Wide Web, pages 1171–1180, 2017.
Citations (5)

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.

Tweets

Sign up for free to view the 1 tweet with 0 likes about this paper.