Identify the sources of GPT-3.5’s hiring-related biases

Determine the underlying factors that produced the gender, racial, and other biases observed in GPT-3.5 when auditing resume assessment and resume generation tasks in a United States hiring context, with particular attention to biases arising from the model’s training data. Establish how characteristics of the training data and related model components contribute to the measured disparities in assessment scores and in generated resume content.

Background

The paper conducts two audit studies of OpenAI’s GPT-3.5 in hiring contexts: resume assessment and resume generation. It finds statistically significant though small differences in assessment scores by race and gender representation, and stronger latent biases in generated resumes, including less experience and lower seniority for women, and immigrant-associated markers for Asian and Hispanic names.

In discussing potential origins, the authors note that such biases may stem from training data, as GPT models are trained on large-scale web data (e.g., Common Crawl), which can reflect societal stereotypes and skewed distributions (e.g., more online resumes from recent graduates). However, they explicitly state they cannot conclude the causes of the observed biases and call for future work analyzing LLMs and their training data to locate and understand these sources.

References

While we cannot conclude what led to the biases we observed, a fundamental limitation of algorithm auditing, we encourage future work that builds and analyzes LLMs for such biases in training data and elsewhere.

The Silicon Ceiling: Auditing GPT's Race and Gender Biases in Hiring (2405.04412 - Armstrong et al., 7 May 2024) in Discussion, Reflecting on Potential Sources of Bias