Are Emily and Greg Still More Employable than Lakisha and Jamal? Investigating Algorithmic Hiring Bias in the Era of ChatGPT (2310.05135v1)
Abstract: Large language models (LLMs) such as GPT-3.5, Bard, and Claude are applicable to numerous tasks. One domain of interest is their use in algorithmic hiring, specifically in matching resumes with job categories. However, this use raises concerns about bias on protected attributes such as gender, race, and maternity status. The seminal work of Bertrand & Mullainathan (2003) set the gold standard for identifying hiring bias via field experiments that compare the response rates for identical resumes differing only in protected attributes, e.g., racially suggestive names such as Emily or Lakisha. We replicate this experiment on state-of-the-art LLMs (GPT-3.5, Bard, Claude, and Llama) to evaluate bias (or the lack thereof) on gender, race, maternity status, pregnancy status, and political affiliation. We evaluate the LLMs on two tasks: (1) matching resumes to job categories; and (2) summarizing resumes with employment-relevant information. Overall, the LLMs are robust across race and gender, but they differ in their performance on pregnancy status and political affiliation. We use contrastive input decoding on open-source LLMs to uncover potential sources of bias.
References:
- Persistent anti-Muslim bias in large language models. In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, AIES ’21, pp. 298–306, New York, NY, USA, 2021. Association for Computing Machinery. ISBN 9781450384735. doi: 10.1145/3461702.3462624. URL https://doi.org/10.1145/3461702.3462624.
- Mitigating language-dependent ethnic bias in BERT. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 533–549, Online and Punta Cana, Dominican Republic, November 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.emnlp-main.42. URL https://aclanthology.org/2021.emnlp-main.42.
- PaLM 2 technical report, 2023.
- Marianne Bertrand and Sendhil Mullainathan. Are Emily and Greg more employable than Lakisha and Jamal? A field experiment on labor market discrimination. Working Paper 9873, National Bureau of Economic Research, July 2003. URL http://www.nber.org/papers/w9873.
- Snehaan Bhawal. Resume dataset. https://www.kaggle.com/datasets/snehaanbhawal/resume-dataset, 2021. Accessed: June 23, 2023.
- Learning to match jobs with resumes from sparse interaction data using multi-view co-teaching network, 2020.
- Man is to computer programmer as woman is to homemaker? Debiasing word embeddings, 2016.
- On the use of summarization and transformer architectures for profiling résumés. Expert Systems with Applications, 184:115521, 2021. ISSN 0957-4174. doi: 10.1016/j.eswa.2021.115521. URL https://www.sciencedirect.com/science/article/pii/S0957417421009301.
- Language models are few-shot learners, 2020.
- Gender shades: Intersectional accuracy disparities in commercial gender classification. In Sorelle A. Friedler and Christo Wilson (eds.), Proceedings of the 1st Conference on Fairness, Accountability and Transparency, volume 81 of Proceedings of Machine Learning Research, pp. 77–91. PMLR, 23–24 Feb 2018. URL https://proceedings.mlr.press/v81/buolamwini18a.html.
- Chatbots as a job candidate evaluation tool. In Christophe Debruyne, Hervé Panetto, Wided Guédria, Peter Bollen, Ioana Ciuciu, George Karabatis, and Robert Meersman (eds.), On the Move to Meaningful Internet Systems: OTM 2019 Workshops, pp. 189–193, Cham, 2020. Springer International Publishing. ISBN 978-3-030-40907-4.
- U.S. Equal Employment Opportunity Commission. The Pregnancy Discrimination Act of 1978, 1978. Public Law 95-555.
- Jeffrey Dastin. Amazon scraps secret AI recruiting tool that showed bias against women. Ethics of Data and Analytics: Concepts and Cases, p. 296, 2022.
- JobBERT: Understanding job titles through skills, 2021.
- Ronald Aylmer Fisher. Statistical methods for research workers. Springer, 1992.
- Equality of opportunity in supervised learning. Advances in neural information processing systems, 29, 2016.
- Do longer maternity leaves hurt women’s careers? Harvard Business Review, September 14 2018. URL https://hbr.org/2018/09/do-longer-maternity-leaves-hurt-womens-careers. Accessed on June 23, 2023.
- James Hu. 99% of Fortune 500 companies use applicant tracking systems. https://www.jobscan.co/blog/99-percent-fortune-500-ats/, November 2019.
- Documenting high-risk AI: A European regulatory perspective. Computer, 56(5):18–27, 2023. doi: 10.1109/MC.2023.3235712.
- Carotene: A job title classification system for the online recruitment domain. In 2015 IEEE First International Conference on Big Data Computing Service and Applications, pp. 286–293, 2015. doi: 10.1109/BigDataService.2015.61.
- Kaja Jurcisinova. A quick guide to updating your resume after maternity leave [resume example]. Kickresume Blog, September 22 2022a. URL https://blog.kickresume.com/a-quick-guide-to-updating-your-resume-after-maternity-leave-resume-example/. Accessed on June 23, 2023.
- Kaja Jurcisinova. A quick guide to updating your resume after maternity leave (+ resume example), 2022b. URL https://blog.kickresume.com/a-quick-guide-to-updating-your-resume-after-maternity-leave-resume-example/.
- Deep learning for procedural content generation. Neural Computing and Applications, 33(1):19–37, 01 2021. ISSN 1433-3058. doi: 10.1007/s00521-020-05383-8. URL https://doi.org/10.1007/s00521-020-05383-8.
- Steve Lohr. A hiring law blazes a path for A.I. regulation. The New York Times, May 2023. URL https://www.nytimes.com/2023/05/25/technology/ai-hiring-law-new-york.html.
- Gray I. Mateo-Harris. Politics in the workplace: A state-by-state guide. SHRM website, October 31 2016. URL https://www.shrm.org/resourcesandtools/legal-and-compliance/state-and-local-updates/pages/politics-at-work.aspx. Accessed on June 23, 2023.
- An image of society: Gender and racial representation and impact in image search results for occupations. Proc. ACM Hum.-Comput. Interact., 5(CSCW1), apr 2021. doi: 10.1145/3449100. URL https://doi.org/10.1145/3449100.
- CrowS-pairs: A challenge dataset for measuring social biases in masked language models. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1953–1967, Online, November 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.emnlp-main.154. URL https://aclanthology.org/2020.emnlp-main.154.
- Mitigating bias in algorithmic hiring: Evaluating claims and practices. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, FAT* ’20, pp. 469–481, New York, NY, USA, 2020. Association for Computing Machinery. ISBN 9781450369367. doi: 10.1145/3351095.3372828. URL https://doi.org/10.1145/3351095.3372828.
- Domain adaptation for resume classification using convolutional neural networks, 2017.
- Choose your programming copilot: A comparison of the program synthesis performance of GitHub Copilot and genetic programming. In Proceedings of the Genetic and Evolutionary Computation Conference, GECCO ’22, pp. 1019–1027, New York, NY, USA, 2022. Association for Computing Machinery. ISBN 9781450392372. doi: 10.1145/3512290.3528700. URL https://doi.org/10.1145/3512290.3528700.
- LLaMA: Open and efficient foundation language models, 2023a.
- Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023b.
- U.S. Bureau of Labor Statistics. Employed persons by detailed occupation, sex, race, and Hispanic or Latino ethnicity. U.S. Bureau of Labor Statistics, 2022. URL https://www.bls.gov/cps/cpsaat11.htm. Accessed on June 22, 2023.
- Investigating gender bias in language models using causal mediation analysis. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin (eds.), Advances in Neural Information Processing Systems, volume 33, pp. 12388–12401. Curran Associates, Inc., 2020. URL https://proceedings.neurips.cc/paper_files/paper/2020/file/92650b2e92217715fe312e6fa7b90d82-Paper.pdf.
- Jane Waldfogel. The family gap for young women in the United States and Britain: Can maternity leave make a difference? Journal of Labor Economics, 16(3):505–545, 1998. doi: 10.1086/209897. URL https://doi.org/10.1086/209897.
- Surfacing biases in large language models using contrastive input decoding, 2023.
- A hybrid approach to conceptual classification and ranking of resumes and their corresponding job posts. May 2017. ISBN 978-3-319-59420-0. doi: 10.1007/978-3-319-59421-7_10.
Authors:
- Akshaj Kumar Veldanda
- Fabian Grob
- Shailja Thakur
- Hammond Pearce
- Benjamin Tan
- Ramesh Karri
- Siddharth Garg