Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
41 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
41 tokens/sec
o3 Pro
7 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Dialect prejudice predicts AI decisions about people's character, employability, and criminality (2403.00742v1)

Published 1 Mar 2024 in cs.CL, cs.AI, and cs.CY

Abstract: Hundreds of millions of people now interact with LLMs, with uses ranging from serving as a writing aid to informing hiring decisions. Yet these LLMs are known to perpetuate systematic racial prejudices, making their judgments biased in problematic ways about groups like African Americans. While prior research has focused on overt racism in LLMs, social scientists have argued that racism with a more subtle character has developed over time. It is unknown whether this covert racism manifests in LLMs. Here, we demonstrate that LLMs embody covert racism in the form of dialect prejudice: we extend research showing that Americans hold raciolinguistic stereotypes about speakers of African American English and find that LLMs have the same prejudice, exhibiting covert stereotypes that are more negative than any human stereotypes about African Americans ever experimentally recorded, although closest to the ones from before the civil rights movement. By contrast, the LLMs' overt stereotypes about African Americans are much more positive. We demonstrate that dialect prejudice has the potential for harmful consequences by asking LLMs to make hypothetical decisions about people, based only on how they speak. LLMs are more likely to suggest that speakers of African American English be assigned less prestigious jobs, be convicted of crimes, and be sentenced to death. Finally, we show that existing methods for alleviating racial bias in LLMs such as human feedback training do not mitigate the dialect prejudice, but can exacerbate the discrepancy between covert and overt stereotypes, by teaching LLMs to superficially conceal the racism that they maintain on a deeper level. Our findings have far-reaching implications for the fair and safe employment of language technology.

Exploring the Covert Racial Bias in AI LLMs through Dialect Prejudice

Introduction

Recent advancements in NLP have seen the blossoming of various applications of LLMs (LMs), ranging from writing aids to tools informing employment decisions. With such widespread utility comes the crucial question of bias in AI systems, especially racial bias, which has been documented in cases relating to African American English (AAE). While extensive research exists on overt racial prejudice in LLMs, the subtle nuances of covert racism, especially in the form of dialect prejudice, have not been fully explored. This paper presents an empirical investigation into dialect prejudice in LLMs, revealing a bias in AI decisions based on dialects indicative of a speaker's racial background.

The focus is on the extent to which LLMs embed covert racism by exhibiting bias against the AAE dialect, a component of covert racism.

Approach

This paper employs Matched Guise Probing, which adapts the matched guise technique from sociolinguistics to the written domain, enabling an examination of the biases held by LMs against texts written in AAE compared to Standard American English (SAE). The approach embeds AAE or SAE text in prompts, asking the LMs to make assumptions about the speaker's character, employability, and criminality without overt references to race. This strategy probes the covert stereotypes within LMs by focusing on dialect features rather than explicit racial identifiers.

Illustrating this through different experiments, the paper highlights that LLMs, including GPT-2, GPT-3.5, GPT-4, RoBERTa, and T5, consistently assign more negative attributes and outcomes to AAE speakers. This unveils a striking discrepancy between the overtly positive attributes associated with African Americans and the covert negative stereotypes triggered by the AAE dialect in these models.

Study 1: Covert Stereotypes in LLMs

Matching the setup of the Princeton Trilogy studies on racial stereotypes, the research uncovers that LLMs align more closely with archaic human stereotypes from before the civil rights movement. This suggests that LMs covertly harbor the most negative stereotypes about African Americans ever experimentally recorded, contrary to the more positive overt assertions about African Americans typically generated by these models.

Study 2: Impact of Covert Stereotypes on AI Decisions

Exploring the real-world implications of dialect prejudice, the paper demonstrates that LMs are more likely to associate speakers of AAE with less prestigious jobs, criminal convictions, and even death penalties. These outcomes reflect a significant bias and the potential for substantial harm when language technology is applied in critical domains like employment and law enforcement.

Study 3: Resolvability of Dialect Prejudice

Analyzing potential mitigation strategies such as scaling model size and training with human feedback, the research finds that neither approach effectively reduces the observed dialect prejudice. Surprisingly, larger models and those trained with human feedback exhibit greater covert racial prejudice, suggesting that current methods for bias mitigation may not address the subtleties of covert racism in LLMs.

Discussion

The findings of this paper underscore a deep-seated issue of covert racism manifesting through dialect prejudice within current LLMs. This reflects not only the biases present in the underlying training data but also the complex nature of societal racial attitudes that these models inadvertently learn and perpetuate. As AI continues to integrate into various societal sectors, addressing these covert prejudices becomes crucial for developing equitable and unbiased AI systems.

Conclusion

This paper has shed light on the covert racial biases present in LLMs, particularly through the lens of dialect prejudice. By revealing the extent to which current LMs associate negative stereotypes and outcomes with AAE, it calls for a deeper examination of bias in AI and the development of more sophisticated approaches to mitigate racial prejudice in language technology.

Definition Search Book Streamline Icon: https://streamlinehq.com
References (109)
  1. Persistent anti-muslim bias in large language models. In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, pages 298–306, 2021.
  2. Predicting judicial decisions of the European Court of Human Rights: A natural language processing perspective. PeerJ Computer Science, 2:e93, 2016.
  3. Carolyn P. Atkins. Do employment recruiters discriminate on the basis of nonstandard dialect? Journal of Employment Counseling, 30(3):108–118, 1993.
  4. Training a helpful and harmless assistant with reinforcement learning from human feedback. arXiv 2204.05862, 2022.
  5. Peter Ball. Stereotypes of Anglo-Saxon and non-Anglo-Saxon accents: Some exploratory Australian studies with the matched guise technique. Language Sciences, 5(2):163–183, 1983.
  6. Evaluating the underlying gender bias in contextualized word embeddings. In Proceedings of the First Workshop on Gender Bias in Natural Language Processing, pages 33–39, 2019.
  7. On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pages 610–623, 2021.
  8. Stereotyping by omission: Eliminate the negative, accentuate the positive. Journal of Personality and Social Psychology, 102(6):1214–1238, 2012.
  9. Andrew Billings. Beyond the Ebonics debate: Attitudes about Black and Standard American English. Journal of Black Studies, 36(1):68–81, 2005.
  10. J. Stewart Black and Patrick van Esch. AI-enabled recruiting: What is it and how should a manager use it? Business Horizons, 63(2):215–226, 2020.
  11. Racial disparity in natural language processing: A case study of social media African-American English. arXiv 1707.00061, 2017.
  12. Demographic dialectal variation in social media: A case study of African-American English. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1119–1130, 2016.
  13. Language (technology) is power: A critical survey of “bias” in NLP. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5454–5476, 2020.
  14. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In Advances in Neural Information Processing Systems 29, 2016.
  15. Eduardo Bonilla-Silva. The new racism: Racial structure in the United States, 1960s-1990s. In Race, Ethnicity, and Nationality in the United States: Toward the Twenty-First Century, pages 55–101. Westview Press, Boulder, CO, 1999.
  16. Eduardo Bonilla-Silva. Racism without Racists. Rowman & Littlefield, Plymouth, UK, 2014.
  17. Language models are few-shot learners. arXiv 2005.14165, 2020.
  18. Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334):183–186, 2017.
  19. Marked personas: Using natural language prompts to measure stereotypes in language models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, pages 1504–1532, 2023.
  20. PaLM: Scaling language modeling with pathways. arXiv 2204.02311, 2022.
  21. Deep reinforcement learning from human preferences. In Advances in Neural Information Processing Systems 30, 2017.
  22. Recent unobtrusive studies of Black and White discrimination and prejudice: A literature review. Psychological Bulletin, 87(3):546–563, 1980.
  23. Thinking well of African Americans: Measuring complimentary stereotypes and negative prejudice. Basic and Applied Social Psychology, 28(3):233–250, 2006.
  24. Mark Davies. The Corpus of Contemporary American English as the first reliable monitor corpus of English. Literary and Linguistic Computing, 25(4):447–464, 2010.
  25. Measuring fairness with biased rulers: A comparative study on bias metrics for pre-trained language models. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1693–1706, 2022.
  26. Are racial stereotypes really fading? The Princeton Trilogy revisited. Personality and Social Psychology Bulletin, 21(11):1139–1150, 1995.
  27. Documenting large webtext corpora: A case study on the Colossal Clean Crawled Corpus. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 1286–1305, 2021.
  28. Aversive racism. Advances in Experimental Social Psychology, 36:1–52, 2004.
  29. Gabriel Doyle. Mapping dialectal variation by querying social media. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, pages 98–106, 2014.
  30. Jacob Eisenstein. What to do about bad language on the internet. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 359–369, 2013.
  31. Jacob Eisenstein. Systematic patterning in phonologically-motivated orthographic variation. Journal of Sociolinguistics, 19(2):161–188, 2015.
  32. A latent variable model for geographic lexical variation. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 1277–1287, 2010.
  33. From pretraining data to language models to downstream tasks: Tracking the trails of political biases leading to unfair NLP models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, pages 11737–11762, 2023.
  34. Discovering and categorising language biases in Reddit. arXiv 2008.02754, 2020.
  35. Brown Corpus manual, 1979.
  36. The matched-guise technique for measuring attitudes and their implications for language education: A critical assessment, 1991.
  37. The Pile: An 800GB dataset of diverse text for language modeling. arXiv 2101.00027, 2021.
  38. Word embeddings quantify 100 years of gender and ethnic stereotypes. Proceedings of the National Academy of Sciences, 115(16):E3635–E3644, 2018.
  39. An intersectional analysis of gender and ethnic stereotypes. Psychology of Women Quarterly, 37(1):113–127, 2013.
  40. Gustave M. Gilbert. Stereotype persistance and change among college students. Journal of Abnormal and Social Psychology, 46(2):245–254, 1951.
  41. Tanya Golash-Boza. A critical and comprehensive sociological theory of race and racism. Sociology of Race and Ethnicity, 2(2):129–141, 2016.
  42. Lisa J. Green. African American English: A Linguistic Introduction. Cambridge University Press, Cambridge, UK, 2002.
  43. Investigating African-American Vernacular English in transformer-based text generation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, pages 5877–5883, 2020.
  44. Jeffrey Grogger. Speech patterns and racial wage inequality. The Journal of Human Resources, 46(1):1–25, 2011.
  45. Probing image-language transformers for verb understanding. arXiv 2106.09141, 2021.
  46. Training compute-optimal large language models. arXiv 2203.15556, 2022.
  47. Understanding U.S. regional linguistic variation with Twitter data analysis. Computers, Environment and Urban Systems, 59:244–255, 2016.
  48. Richard A. Hudson. Sociolinguistics. Cambridge University Press, Cambridge, UK, 1996.
  49. Occupational prestige: The status component of socioeconomic status, 2022.
  50. Ethics of AI-enabled recruiting and selection: A review and research agenda. Journal of Business Ethics, 178(4):977–1007, 2022.
  51. Education and intergroup attitudes: Moral enlightenment, superficial democratic commitment, or ideological refinement? American Sociological Review, 49(6):751–769, 1984.
  52. Health system-scale language models are all-purpose prediction engines. Nature, 619(7969):357–362, 2023.
  53. Taylor Jones. Toward a description of African American Vernacular English dialect regions using “Black Twitter”. American Speech, 90(4):403–440, 2015.
  54. Challenges of studying and processing dialects in social media. In Proceedings of the Workshop on Noisy User-generated Text, pages 9–18, 2015.
  55. Learning a POS tagger for AAVE-like language. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1115–1120, 2016.
  56. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice Hall, Upper Saddle River, NJ, 2000.
  57. On the fading of social stereotypes: Studies in three generations of college students. Journal of Personality and Social Psychology, 13(1):1–16, 1969.
  58. ChatGPT for good? On opportunities and challenges of large language models for education. Learning and Individual Differences, 103:102274, 2023.
  59. Racial stereotypes of one hundred college students. Journal of Abnormal and Social Psychology, 28(3):280–290, 1933.
  60. Dialect on trial: Raciolinguistic ideologies in perceptions of AAVE and MAE codeswitching. University of Pennsylvania Working Papers in Linguistics, 28(2):51–59, 2022.
  61. Language-agnostic bias detection in language models with bias probing. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 12735–12747, 2023.
  62. “Sounding Black”: Speech stereotypicality activates racial stereotypes and expectations about appearance. Frontiers in Psychology, 12:785283, 2021.
  63. Measuring bias in contextualized word representations. In Proceedings of the First Workshop on Gender Bias in Natural Language Processing, pages 166–172, 2019.
  64. Evaluational reactions to spoken languages. Journal of Abnormal and Social Psychology, 60:44–51, 1960.
  65. Holistic evaluation of language models. arXiv 2211.09110, 2022.
  66. RoBERTa: A robustly optimized BERT pretraining approach. arXiv 1907.11692, 2019.
  67. Li Lucy and David Bamman. Gender and representation bias in GPT-3 generated stories. In Proceedings of the Third Workshop on Narrative Understanding, pages 48–55, 2021.
  68. Reid Luhman. Appalachian English stereotypes: Language attitudes in Kentucky. Language in Society, 19(3):331–348, 1990.
  69. Ethnic and national stereotypes: The Princeton Trilogy revisited and revised. Personality and Social Psychology Bulletin, 27(8):996–1010, 2001.
  70. Use of Black English and racial discrimination in urban housing markets: New methods and findings. Urban Affairs Review, 36(4):452–469, 2001.
  71. Understanding stereotypes in language models: Towards robust measurement and zero-shot debiasing. arXiv 2212.10678, 2022.
  72. Using machine learning to predict decisions of the European Court of Human Rights. Artificial Intelligence and Law, 28(2):237–266, 2020.
  73. StereoSet: Measuring stereotypical bias in pretrained language models. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, pages 5356–5371, 2021.
  74. CrowS-Pairs: A challenge dataset for measuring social biases in masked language models. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, pages 1953–1967, 2020.
  75. Large language models as tax attorneys: A case study in legal capabilities emergence. arXiv 2306.07075, 2023.
  76. Do word embeddings capture spelling variation? In Proceedings of the 28th International Conference on Computational Linguistics, pages 870–881, 2020.
  77. GPT-4 technical report. arXiv 2303.08774, 2023.
  78. Training language models to follow instructions with human feedback. arXiv 2203.02155, 2022.
  79. Speaking Ebonics in a professional context: The role of ethos/source credibility and perceived sociability of the speaker. Journal of Technical Writing and Communication, 30(4):367–383, 2000.
  80. Geoffrey Pullum. African American Vernacular English is not standard English with mistakes. In The Workings of Language: From Prescriptions to Perspectives, pages 39–58. Praeger Publishers, Westport, CT, 1999.
  81. Perceptual and phonetic experiments on American English dialect identification. Journal of Language and Social Psychology, 18(1):10–30, 1999.
  82. Language models are unsupervised multitask learners. 2019.
  83. Scaling language models: Methods, analysis & insights from training Gopher. arXiv 2112.11446, 2021.
  84. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140):1–67, 2020.
  85. CoQA: A conversational question answering challenge. Transactions of the Association for Computational Linguistics, 7:249–266, 2019.
  86. John R. Rickford. African American Vernacular English: Features, Evolution, Educational Implications. Blackwell, Malden, MA, 1999.
  87. Language and linguistics on trial: Hearing Rachel Jeantel (and other vernacular speakers) in the courtroom and beyond. Language, 92(4):948–988, 2016.
  88. Reactions to African-American Vernacular English: Do more phonological features matter? The Western Journal of Black Studies, 28(3):407–414, 2004.
  89. Mock Ebonics: Linguistic racism in parodies of Ebonics on the internet. Journal of Sociolinguistics, 3(3):360–380, 1999.
  90. Unsettling race and language: Toward a raciolinguistic perspective. Language in Society, 46(5):621–647, 2017.
  91. Masked language model scoring. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 2699–2712, 2020.
  92. Huntsville, hospitals, and hockey teams: Names can reveal your location. In Proceedings of the 3rd Workshop on Noisy User-generated Text, pages 116–121, 2017.
  93. Whose opinions do language models reflect? arXiv 2303.17548, 2023.
  94. The risk of racial bias in hate speech detection. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1668–1678, 2019.
  95. Racial Attitudes in America: Trends and Interpretations. Harvard University Press, Cambridge, MA, 1997.
  96. The woman worked as a babysitter: On biases in language generation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, pages 3407–3412, 2019.
  97. Measuring occupational prestige on the 2012 General Social Survey, 2014.
  98. Upstream mitigation is not all you need: Testing the bias transfer hypothesis in pre-trained language models. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, pages 3524–3542, 2022.
  99. Ian Stewart. Now we stronger than ever: African-American English syntax in Twitter. In Proceedings of the Student Research Workshop at the 14th Conference of the European Chapter of the Association for Computational Linguistics, pages 31–37, 2014.
  100. Harry Surden. Artificial intelligence and law: An overview. Georgia State University Law Review, 35(4):1305–1337, 2019.
  101. Studs Terkel. Race: How Blacks and Whites Think and Feel about the American Obsession. New Press, New York City, NY, 1992.
  102. Delimiting perceptual cues used for the ethnic labeling of African American and European American voices. Journal of Sociolinguistics, 8(1):54–87, 2004.
  103. Scientific, legal, and ethical concerns about AI-based personnel selection tools: A call to action. Personnel Assessment and Decisions, 7(2), 2021.
  104. Applying artificial intelligence: Implications for recruitment. Strategic HR Review, 17(5):255–258, 2018.
  105. Ethical and social risks of harm from language models. arXiv 2112.04359, 2021.
  106. Ethan Zhang and Yi Zhang. Average precision. In Encyclopedia of Database Systems, pages 192–193. Springer, Boston, MA, 2009.
  107. Gender bias in coreference resolution: Evaluation and debiasing methods. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 15–20, 2018.
  108. Calibrate Before Use: Improving Few-shot Performance of Language Models. In Proceedings of the 38th International Conference on Machine Learning, pages 12697–12706. PMLR, July 2021.
  109. VALUE: Understanding dialect disparity in NLU. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, pages 3701–3720, 2022.
User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (4)
  1. Valentin Hofmann (21 papers)
  2. Pratyusha Ria Kalluri (2 papers)
  3. Dan Jurafsky (118 papers)
  4. Sharese King (1 paper)
Citations (24)
Github Logo Streamline Icon: https://streamlinehq.com
Youtube Logo Streamline Icon: https://streamlinehq.com