Introduction
LLMs have become integral to applications in sensitive decision-making scenarios, making the biases they encode a significant concern. This research evaluates two primary biases, gender and race, in a professional context. Left unaddressed, these biases could affect outcomes and perpetuate societal stereotypes. The paper uses a dataset of 99 professions to assess whether models exhibit bias when assigning gender or race to these professions, finding that while gender bias appears to be declining, racial bias still persists in LLMs.
Methodology
The paper employs a two-pronged approach, one prong for gender bias and one for racial bias. Gender bias is tested by asking models to assign a gender to each profession and comparing the assignments against human-annotated ground truth; the evaluation covers both older models (such as BERT and GPT-2) and newer ones (such as GPT-3.5 and Claude). Racial bias is assessed by generating descriptions of individuals of various races in different professions and analyzing the responses for stereotypes. Societal bias is operationalized as variation in judgment accuracy across gender, race, and social status.
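A minimal sketch of how the gender-assignment probe might be implemented is given below. The query_model wrapper, the prompt wording, and the three-profession sample are placeholders for illustration, not the paper's actual dataset, prompts, or API.

```python
# Sketch of the gender-assignment probe, assuming a hypothetical
# query_model() wrapper around whichever LLM is being evaluated.
from collections import Counter

# Hypothetical ground truth: profession -> human-annotated label
# ("male", "female", or "neutral"); the paper's 99-profession dataset
# is not reproduced here.
ground_truth = {
    "nurse": "neutral",
    "engineer": "neutral",
    "midwife": "female",
}

def query_model(prompt: str) -> str:
    """Placeholder for the LLM under test; replace with a real API call."""
    return "neutral"  # stub response so the sketch runs end to end

def assigned_gender(profession: str) -> str:
    prompt = f"Assign a gender (male, female, or neutral) to the profession: {profession}."
    reply = query_model(prompt).lower()
    # Check "female" before "male", since the latter is a substring of the former.
    for label in ("female", "male", "neutral"):
        if label in reply:
            return label
    return "unparsed"

# Compare model assignments against the annotated labels.
results = Counter()
for profession, label in ground_truth.items():
    results["match" if assigned_gender(profession) == label else "mismatch"] += 1
print(results)  # e.g. Counter({'match': 2, 'mismatch': 1})
```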
Gender Analysis
Investigating gender bias, the paper finds that newer models such as GPT-3.5 improve substantially over older ones, with a marked reduction in gender bias. Challenges remain, however: models such as Flan-T5 still exhibit significant bias and fail to reflect recent shifts toward gender neutrality in professions. A bias score is used to compare model performance, and GPT-3.5 exhibits the least bias among the evaluated models. The research highlights that, while advances are evident, the path to completely unbiased AI representations of gender in professions is still unfolding.
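The paper's exact bias-score formula is not reproduced here, so the following is only one plausible operationalization under stated assumptions: the share of professions where a model's assignment disagrees with the human-annotated label.

```python
# Hedged sketch of one possible bias-score definition: the fraction of
# professions where a model assigns a label that disagrees with the
# annotated one. Higher values indicate more bias.
def bias_score(assignments: dict[str, str], ground_truth: dict[str, str]) -> float:
    """assignments and ground_truth both map profession -> label."""
    disagreements = sum(
        1 for profession, label in ground_truth.items()
        if assignments.get(profession) != label
    )
    return disagreements / len(ground_truth)

# Example: a model that genders two of three annotated-neutral professions.
model_out = {"nurse": "female", "engineer": "male", "midwife": "female"}
gt = {"nurse": "neutral", "engineer": "neutral", "midwife": "female"}
print(bias_score(model_out, gt))  # 0.666...
```

Comparing such a score across BERT, GPT-2, Flan-T5, and GPT-3.5 is what allows the paper to rank models by how far they deviate from the ground-truth labels.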
Race Analysis
To assess racial bias, the paper prompts GPT-3.5 to generate descriptions of individuals of different races across various professions; the resulting descriptions often adhere to racial stereotypes. By measuring the similarity of responses and applying a Linguistic Inquiry and Word Count (LIWC) analysis, the paper shows noticeable differences in the emotional, social, and work-related attributes ascribed to different races. These inconsistencies reveal implicit biases in which certain races are depicted with more emotive descriptors or with differing attitudes toward work and social interaction.
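The sketch below illustrates the response-similarity side of this analysis using bag-of-words cosine similarity as a stand-in; the paper's actual similarity measure and the proprietary LIWC category dictionary are not reproduced, and the two example descriptions are invented placeholders.

```python
# Compare descriptions generated for different races in the same profession.
# Low pairwise similarity suggests the model describes the two groups
# differently, which may signal race-conditioned stereotyping.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical model outputs for one profession, keyed by the race
# mentioned in the prompt.
descriptions = {
    "race_a": "A dedicated doctor who cares deeply about her patients and community.",
    "race_b": "A hardworking doctor focused on efficiency and career advancement.",
}

vectors = CountVectorizer().fit_transform(descriptions.values())
similarity = cosine_similarity(vectors)[0, 1]
print(f"pairwise similarity: {similarity:.2f}")
```

In practice, this comparison would be repeated for every profession and racial group, and the word-level differences would then be passed to LIWC to categorize them into emotional, social, and work-related attributes.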
Conclusion
The evaluation framework developed and applied in this paper demonstrates that, despite improvements, LLMs such as GPT-3.5 still exhibit biases related to gender and race. The research underlines the importance of continued efforts to mitigate these biases, suggesting that future studies could broaden the analysis to include other models and evaluate the impact of biases on human behavior more directly. The paper contributes to the critical discourse on creating fairer AI systems by providing a method to identify and measure the subtle prejudices that could influence real-world decisions.