Evaluating Biased Attitude Associations of Language Models in an Intersectional Context (2307.03360v1)
Abstract: LLMs are trained on large-scale corpora that embed implicit biases documented in psychology. In social cognition, the valence (pleasantness or unpleasantness) associated with a social group determines biased attitudes toward that group and related concepts. Building on this established literature, we quantify how social groups are valenced in English LLMs using a sentence template that provides an intersectional context. We study biases related to age, education, gender, height, intelligence, literacy, race, religion, sex, sexual orientation, social class, and weight. We present a concept projection approach that captures the valence subspace through contextualized word embeddings of LLMs. Adapting this projection-based approach to embedding association tests that quantify bias, we find that LLMs exhibit the most biased attitudes against gender identity, social class, and sexual orientation signals in language. We also find that the largest and best-performing model we study is the most biased, as it effectively captures the bias embedded in sociocultural data. We validate the bias evaluation method by demonstrating strong performance on an intrinsic valence evaluation task. The approach enables us to measure complex intersectional biases, which are known to manifest in the outputs and applications of LLMs and to perpetuate historical biases. Moreover, our approach contributes to design justice in that it studies the associations of groups underrepresented in language, such as transgender and homosexual individuals.
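To make the method concrete, here is a minimal sketch (not the authors' released code) of the projection-based valence association idea the abstract describes: estimate a valence direction from pleasant/unpleasant anchor words, then measure how strongly templated group terms project onto that direction in a contextualizing LM. The model choice, the template string, and the short anchor-word lists below are illustrative assumptions; the paper draws on established pleasant/unpleasant valence norms and an intersectional sentence template rather than these stand-ins.

```python
# Hypothetical sketch of a projection-based valence association measure.
import torch
from transformers import AutoTokenizer, AutoModel

MODEL = "bert-base-uncased"  # assumption: any contextualizing LM can stand in here
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL)
model.eval()

def embed(text: str) -> torch.Tensor:
    """Mean-pooled last-layer contextualized embedding of a text."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0)

# Tiny illustrative anchors; the actual study uses validated valence lexica.
PLEASANT = ["joy", "love", "peace", "wonderful", "pleasure"]
UNPLEASANT = ["agony", "terrible", "horrible", "nasty", "evil"]

def valence_direction() -> torch.Tensor:
    """Concept projection: unit vector from the unpleasant to the pleasant centroid."""
    pos = torch.stack([embed(w) for w in PLEASANT]).mean(dim=0)
    neg = torch.stack([embed(w) for w in UNPLEASANT]).mean(dim=0)
    direction = pos - neg
    return direction / direction.norm()

def valence_score(group_phrase: str, direction: torch.Tensor,
                  template: str = "This is a {} person.") -> float:
    """Project a templated group phrase onto the valence direction.

    The template here is a placeholder; the paper's template supplies an
    intersectional context (multiple group signals in one sentence).
    """
    vec = embed(template.format(group_phrase))
    return torch.dot(vec / vec.norm(), direction).item()

if __name__ == "__main__":
    d = valence_direction()
    for group in ["young", "old", "rich", "poor"]:
        print(group, round(valence_score(group, d), 4))
```

Higher scores indicate a more pleasant association for the group signal; comparing scores across group terms within the same template is what yields the bias measurements, in the spirit of embedding association tests.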
Authors: Shiva Omrani Sabbaghi, Robert Wolfe, Aylin Caliskan