
Protected group bias and stereotypes in Large Language Models (2403.14727v1)

Published 21 Mar 2024 in cs.CY, cs.CL, and cs.LG

Abstract: As modern LLMs shatter many state-of-the-art benchmarks in a variety of domains, this paper investigates their behavior in the domains of ethics and fairness, focusing on protected group bias. We conduct a two-part study: first, we solicit sentence continuations describing the occupations of individuals from different protected groups, including gender, sexuality, religion, and race. Second, we have the model generate stories about individuals who hold different types of occupations. We collect >10k sentence completions made by a publicly available LLM, which we subject to human annotation. We find bias across minoritized groups, but in particular in the domains of gender and sexuality, as well as Western bias, in model generations. The model not only reflects societal biases, but appears to amplify them. The model is additionally overly cautious in replies to queries relating to minoritized groups, providing responses that strongly emphasize diversity and equity to an extent that other group characteristics are overshadowed. This suggests that artificially constraining potentially harmful outputs may itself lead to harm, and should be applied in a careful and controlled manner.

Analyzing Bias in LLMs Across Protected Groups

Introduction

LLMs have become ubiquitous across various domains, aiding in tasks ranging from content generation to customer service. Despite their benefits, concerns about LLMs perpetuating or even amplifying societal biases have persisted. This paper explores bias within LLMs, particularly focusing on protected groups defined by characteristics such as gender, sexuality, religion, and race. By analyzing model outputs for stereotypical content and examining the amplification of bias, the paper contributes to a deeper understanding of ethical considerations in LLM application.

Methodology

The investigation employed a two-pronged approach. First, the model's tendency to associate particular occupations with specific protected groups was assessed via sentence-completion tasks: prompt templates requested occupations suitable for individuals of different genders, sexual orientations, races, and religions, yielding a dataset of over 10,000 generations. Second, free-form generations were analyzed, in which the model was asked to write stories about individuals holding occupations typically associated with gender stereotypes.
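For illustration, the minimal sketch below shows how such single-attribute sentence-completion prompts could be assembled programmatically. The template wording and group descriptors are assumptions made for the example, not the exact prompts or category inventory used in the paper.

```python
# Minimal sketch (illustrative templates and descriptors, not the paper's).
TEMPLATES = [
    "The {descriptor} worked as a",
    "Suggest a suitable occupation for a {descriptor}.",
]

GROUPS = {
    "gender": ["woman", "man", "trans woman", "non-binary person"],
    "sexuality": ["gay man", "lesbian woman", "straight man"],
    "religion": ["Muslim person", "Christian person", "Jewish person"],
    "race": ["Black person", "white person", "Asian person"],
}

def build_prompts(templates=TEMPLATES, groups=GROUPS):
    """Cross each template with every descriptor and record its category."""
    prompts = []
    for template in templates:
        for category, descriptors in groups.items():
            for descriptor in descriptors:
                prompts.append({
                    "category": category,
                    "descriptor": descriptor,
                    "prompt": template.format(descriptor=descriptor),
                })
    return prompts

if __name__ == "__main__":
    prompts = build_prompts()
    print(len(prompts), "prompts, e.g.:", prompts[0]["prompt"])
```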

Human evaluators annotated the bias and stereotypical content in these outputs, examining how the model's responses varied across protected-group categories. The annotation distinguished responses containing explicit or implicit bias, responses that avoided the task with non-committal answers, and responses that adopted an overly cautious stance emphasizing diversity.
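A minimal sketch of how per-group annotation results might be aggregated follows. The label set ("biased", "non-committal", "diversity-focused", "neutral") and the toy annotations are assumptions loosely based on the categories described above, not the paper's actual annotation scheme.

```python
# Minimal sketch: share of each annotation label per protected-group descriptor.
from collections import Counter, defaultdict

def label_rates(annotations):
    """annotations: iterable of dicts like
    {"descriptor": "trans woman", "label": "biased"}.
    Returns the proportion of each label per descriptor."""
    counts = defaultdict(Counter)
    for row in annotations:
        counts[row["descriptor"]][row["label"]] += 1
    rates = {}
    for descriptor, counter in counts.items():
        total = sum(counter.values())
        rates[descriptor] = {label: n / total for label, n in counter.items()}
    return rates

# Toy annotations for illustration only.
example = [
    {"descriptor": "trans woman", "label": "diversity-focused"},
    {"descriptor": "trans woman", "label": "biased"},
    {"descriptor": "white man", "label": "neutral"},
    {"descriptor": "white man", "label": "neutral"},
]
print(label_rates(example))
```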

Key Findings

The results revealed notable biases across various categories, with a particularly pronounced bias in gender and sexuality. Certain racio-ethnic groups also attracted stereotypical responses. For instance, occupations suggested for the "Black trans woman" category included roles overwhelmingly associated with advocacy or diversity, potentially reflecting an overcorrection towards promoting inclusivity.

Bias in Occupational Suggestions:

  • Protected groups, especially those defined by gender and sexuality, often received occupation suggestions that either conformed to societal stereotypes or were heavily filtered through a lens of diversity and inclusion. Notably, the "trans woman" and "gay" categories exhibited higher rates of biased suggestions.
  • Responses for "white" individuals in racial categories showed significantly less bias.
  • The interplay of multiple protected group characteristics, such as "Black gay Muslim trans woman", revealed compounded biases, suggesting that intersectionality increases both the complexity and the extent of stereotyping by the model (a sketch of composing such compound descriptors follows this list).
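The sketch below illustrates how compound, intersectional descriptors such as "Black gay Muslim trans woman" can be composed from single-attribute descriptors for this kind of comparison. The attribute lists are illustrative assumptions, not the paper's exact category inventory.

```python
# Minimal sketch: compose intersectional descriptors from single attributes.
from itertools import product

ATTRIBUTES = {
    "race": ["Black", "white", "Asian"],
    "sexuality": ["gay", "straight"],
    "religion": ["Muslim", "Christian"],
    "gender": ["trans woman", "cis man"],
}

def intersectional_descriptors(attributes):
    """Cross every attribute value to form compound descriptors."""
    ordered = [attributes[key] for key in ("race", "sexuality", "religion", "gender")]
    return [" ".join(combo) for combo in product(*ordered)]

descriptors = intersectional_descriptors(ATTRIBUTES)
print(len(descriptors), "compound descriptors, e.g.:", descriptors[0])
```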

Gender Bias in Generated Text:

  • A strong gender bias was observed in the story-generation task: the model disproportionately paired stereotypically gendered occupations with the corresponding gender pronouns, a pattern that could reinforce harmful stereotypes (a sketch of measuring such pronoun associations follows below).
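A minimal sketch of one way such pronoun-occupation associations could be measured is shown below. The pronoun lexicon, tokenization, and toy stories are assumptions for illustration and do not reproduce the paper's evaluation.

```python
# Minimal sketch: count gendered pronoun classes per occupation in generated stories.
import re
from collections import Counter

PRONOUNS = {
    "she": "female", "her": "female", "hers": "female",
    "he": "male", "him": "male", "his": "male",
    "they": "neutral", "them": "neutral", "their": "neutral",
}

def pronoun_profile(story):
    """Count gendered pronoun classes in one generated story."""
    tokens = re.findall(r"[a-z']+", story.lower())
    return Counter(PRONOUNS[t] for t in tokens if t in PRONOUNS)

stories = {  # toy stand-ins for model generations
    "nurse": "She checked on her patients before the end of her shift.",
    "engineer": "He reviewed his designs and asked his team for feedback.",
}

for occupation, story in stories.items():
    print(occupation, dict(pronoun_profile(story)))
```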

Implications and Future Research

This paper underscores the critical need for more nuanced approaches to mitigating bias in LLMs. While efforts to curb harmful stereotypes are evident, they sometimes result in counterproductive emphasis on diversity that may not accurately reflect individual identities or preferences. The findings call for balanced strategies that neither perpetuate stereotypes nor impose constrained diversity narratives.

Future work should expand the scope of analyzed categories, consider non-English contexts, and explore advances in model training that could more effectively address the subtle nuances of bias. Furthermore, examining LLM applications across various real-world scenarios can provide insights into mitigating potential harms while harnessing the capabilities of these powerful models.

Conclusion

The paper offers a granular look at how current LLMs manage delicate issues surrounding protected group characteristics, highlighting significant areas for improvement. As the deployment of LLMs continues to grow, ensuring these models navigate societal biases responsibly remains a pressing challenge. Developing LLMs that respect individual diversity without resorting to overgeneralization or stereotype reinforcement is crucial for ethical AI advancements.

Authors (5)
  1. Hadas Kotek (9 papers)
  2. David Q. Sun (6 papers)
  3. Zidi Xiu (7 papers)
  4. Margit Bowler (2 papers)
  5. Christopher Klein (11 papers)