
White Men Lead, Black Women Help? Benchmarking Language Agency Social Biases in LLMs (2404.10508v4)

Published 16 Apr 2024 in cs.CL, cs.AI, and cs.CY

Abstract: Social biases can manifest in language agency. While several studies approached agency-related bias in human-written language, very limited research has investigated such biases in LLM-generated content. In addition, previous works often rely on string-matching techniques to identify agentic and communal words within texts, which fall short of accurately classifying language agency. We introduce the novel Language Agency Bias Evaluation (LABE) benchmark, which comprehensively evaluates biases in LLMs by analyzing agency levels attributed to different demographic groups in model generations. LABE leverages 5,400 template-based prompts, an accurate agency classifier, and corresponding bias metrics to test for gender, racial, and intersectional language agency biases in LLMs on 3 text generation tasks: biographies, professor reviews, and reference letters. We also contribute the Language Agency Classification (LAC) dataset, consisting of 3,724 agentic and communal sentences. Using LABE, we unveil language agency social biases in 3 recent LLMs: ChatGPT, Llama3, and Mistral. We observe that: (1) LLM generations tend to demonstrate greater gender bias than human-written texts; (2) Models demonstrate remarkably higher levels of intersectional bias than the other bias aspects. Those who are at the intersection of gender and racial minority groups--such as Black females--are consistently described by texts with lower levels of agency, aligning with real-world social inequalities; (3) Among the 3 LLMs investigated, Llama3 demonstrates the greatest overall bias; (4) Not only does prompt-based mitigation fail to resolve language agency bias in LLMs, but it frequently leads to the exacerbation of biases in generated texts.

Investigating Social Biases in Language Agency from LLMs and Human-Written Texts

Introduction to Language Agency and Social Biases

Language plays a pivotal role in reflecting and perpetuating social biases. These biases often manifest as differences in perceived levels of agency and communal behavior in written and spoken descriptions of diverse demographic groups. Traditionally, dominant social groups are characterized more agentically, displaying traits associated with leadership and autonomy, while minority groups are often described in more communal terms, which emphasize passivity and support roles. This paper presents a comprehensive examination of these biases across several text types, both human-written and generated by LLMs, applying a novel dataset and classification approach to quantify language agency at a granular level.

Methodology and Data Overview

To address the nuances of measuring agency in text, the researchers constructed a Language Agency Classification (LAC) dataset, enabling the training of models that discern agency in written language more accurately than prior tools. The agency classifier derived from this dataset was then used to analyze six text datasets spanning three text types (biographies, professor reviews, and reference letters), covering both human-written and LLM-generated sources, and to highlight disparities based on gender, race, and intersectional identity.
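
The summary does not spell out the paper's exact bias metrics, but the underlying measurement pattern can be sketched: classify each sentence of a text as agentic or communal, aggregate the agentic fraction per demographic group, and compare groups. The Python sketch below illustrates this pattern under stated assumptions; the toy classifier, the group labels, and the max-minus-min gap score are illustrative stand-ins, not the metrics used in LABE.

```python
def agency_ratio(sentences, classify):
    """Fraction of sentences the classifier labels 'agentic'."""
    labels = [classify(s) for s in sentences]
    return sum(1 for label in labels if label == "agentic") / max(len(labels), 1)

def agency_gap(texts_by_group, classify):
    """Per-group agency ratios and a simple max-minus-min gap as a bias score."""
    ratios = {group: agency_ratio(sents, classify)
              for group, sents in texts_by_group.items()}
    gap = max(ratios.values()) - min(ratios.values())
    return ratios, gap

# Toy rule-based stand-in for the trained agency classifier (illustration only).
def toy_classify(sentence):
    return "agentic" if ("led" in sentence or "pioneered" in sentence) else "communal"

texts_by_group = {
    "White male": ["He led the department.", "He pioneered a new method."],
    "Black female": ["She assisted the team.", "She supported the project."],
}
ratios, gap = agency_gap(texts_by_group, toy_classify)
print(ratios, gap)
```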

LAC Dataset Construction and Classifier Development

The LAC dataset, built from sentence-level annotations, was designed to capture agentic and communal language more effectively than the string-matching techniques whose limitations earlier studies have documented. Sentences were generated with ChatGPT and then adjusted through human annotation, and the resulting labels showed high reliability (a Fleiss's Kappa of 0.90 after refinement). This dataset underpinned the development of an agency classifier, with BERT models outperforming other architectures such as RoBERTa and Llama2 on the agency classification task.
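
As a rough illustration of how such a classifier might be trained, the sketch below fine-tunes bert-base-uncased on binary agentic/communal labels with Hugging Face Transformers. The in-memory examples, column names, and hyperparameters are assumptions for demonstration; they do not reproduce the authors' training setup on the full LAC dataset.

```python
# Minimal sketch: fine-tune a BERT sentence classifier for agentic vs. communal
# language. Data, labels, and hyperparameters are hypothetical placeholders.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Hypothetical in-memory examples; the real LAC dataset has ~3,724 labeled sentences.
data = {
    "text": ["She led the negotiation.", "She helped organize the files."],
    "label": [1, 0],  # 1 = agentic, 0 = communal
}
dataset = Dataset.from_dict(data)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

def tokenize(batch):
    # Tokenize sentences to fixed-length inputs for BERT.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=64)

dataset = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="agency-classifier",
                           num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=dataset,
)
trainer.train()
```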

Text Data Collection and Analysis

The text samples analyzed were split between human-written sources and texts synthesized by LLMs. For human-written texts, existing datasets were used wherever possible, while LLMs generated the additional data needed, particularly where racial and intersectional information was crucial but absent from available resources. Each text type was processed carefully so that the agency analyses rested on robust, representative samples.
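
For LLM-generated texts, LABE relies on 5,400 template-based prompts. The exact templates are not reproduced in this summary, so the following sketch only illustrates the general mechanics of expanding a prompt template over demographic descriptors; the template wording, descriptor lists, and placeholder name are hypothetical.

```python
from itertools import product

# Hypothetical template and descriptor lists; the actual LABE prompts and
# demographic descriptors may differ.
TEMPLATE = "Write a reference letter for {name}, a {race} {gender} {occupation}."
RACES = ["White", "Black", "Asian"]
GENDERS = ["male", "female"]
OCCUPATIONS = ["software engineer", "professor"]
NAMES = ["Taylor"]  # placeholder; the benchmark pairs names with demographics

prompts = [
    TEMPLATE.format(name=n, race=r, gender=g, occupation=o)
    for n, r, g, o in product(NAMES, RACES, GENDERS, OCCUPATIONS)
]
print(len(prompts), prompts[0])
```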

Key Findings on Agency Bias

The paper unearthed several compelling findings:

  1. Alignment with Social Science Insights: The biases identified in human-written texts paralleled known social biases, confirming that agentic and communal descriptions in texts align with broader societal inequality.
  2. Exaggeration by LLMs: Texts generated by LLMs displayed more pronounced biases than human-authored texts, suggesting that without careful calibration, LLMs can amplify existing social biases.
  3. Pronounced Impact on Minorities: Texts about minority groups, especially those from intersectional backgrounds (e.g., Black women), consistently exhibited lower levels of agency. This pattern was consistent across data types and sources, indicating a pervasive issue in both human and machine language processing.

Implications and Future Work

This investigation into language agency offers significant insights into the nuanced ways biases manifest in text. The findings suggest a critical need for LLMs that account for and mitigate these biases. Future research should expand to more diverse data and continue improving classifier accuracy and applicability. Additionally, these insights matter for both LLM developers and practitioners deploying these models, especially in sensitive contexts where biased language can have real-world consequences.

Concluding Remarks

By methodically analyzing the language of agency across diverse text sources and forms, this paper illuminates the significant work that remains in understanding and addressing language-based social biases. The hope is that these efforts will lead to more informed and equitable applications of AI in natural language processing, reducing the perpetuation of bias and enhancing fairness in automated text generation and analysis.

Authors (2)
  1. Yixin Wan (19 papers)
  2. Kai-Wei Chang (292 papers)