Evaluating Biased Attitude Associations of Language Models in an Intersectional Context (2307.03360v1)
Abstract: LLMs are trained on large-scale corpora that embed implicit biases documented in psychology. In social cognition, the valence (pleasantness or unpleasantness) associated with a social group determines biased attitudes toward that group and related concepts. Building on this established literature, we quantify how social groups are valenced in English LLMs using a sentence template that provides an intersectional context. We study biases related to age, education, gender, height, intelligence, literacy, race, religion, sex, sexual orientation, social class, and weight. We present a concept projection approach that captures the valence subspace through contextualized word embeddings of LLMs. Adapting this projection-based approach to embedding association tests that quantify bias, we find that LLMs exhibit the most biased attitudes against gender identity, social class, and sexual orientation signals in language. We also find that the largest and best-performing model we study is the most biased, as it effectively captures the bias embedded in sociocultural data. We validate the bias evaluation method by demonstrating strong performance on an intrinsic valence evaluation task. The approach enables us to measure complex intersectional biases, which are known to manifest in the outputs and applications of LLMs and to perpetuate historical biases. Moreover, our approach contributes to design justice in that it studies the associations of groups underrepresented in language, such as transgender and homosexual individuals.
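To make the method concrete, here is a minimal sketch (not the authors' released code) of the projection-based valence association idea the abstract describes: estimate a valence direction from pleasant/unpleasant anchor words, then measure how strongly templated group terms project onto that direction in a contextualizing LM. The model choice, the template string, and the short anchor-word lists below are illustrative assumptions; the paper draws on established pleasant/unpleasant valence norms and an intersectional sentence template rather than these stand-ins.

```python
# Hypothetical sketch of a projection-based valence association measure.
import torch
from transformers import AutoTokenizer, AutoModel

MODEL = "bert-base-uncased"  # assumption: any contextualizing LM can stand in here
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL)
model.eval()

def embed(text: str) -> torch.Tensor:
    """Mean-pooled last-layer contextualized embedding of a text."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0)

# Tiny illustrative anchors; the actual study uses validated valence lexica.
PLEASANT = ["joy", "love", "peace", "wonderful", "pleasure"]
UNPLEASANT = ["agony", "terrible", "horrible", "nasty", "evil"]

def valence_direction() -> torch.Tensor:
    """Concept projection: unit vector from the unpleasant to the pleasant centroid."""
    pos = torch.stack([embed(w) for w in PLEASANT]).mean(dim=0)
    neg = torch.stack([embed(w) for w in UNPLEASANT]).mean(dim=0)
    direction = pos - neg
    return direction / direction.norm()

def valence_score(group_phrase: str, direction: torch.Tensor,
                  template: str = "This is a {} person.") -> float:
    """Project a templated group phrase onto the valence direction.

    The template here is a placeholder; the paper's template supplies an
    intersectional context (multiple group signals in one sentence).
    """
    vec = embed(template.format(group_phrase))
    return torch.dot(vec / vec.norm(), direction).item()

if __name__ == "__main__":
    d = valence_direction()
    for group in ["young", "old", "rich", "poor"]:
        print(group, round(valence_score(group, d), 4))
```

Higher scores indicate a more pleasant association for the group signal; comparing scores across group terms within the same template is what yields the bias measurements, in the spirit of embedding association tests.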
Authors: Shiva Omrani Sabbaghi, Robert Wolfe, Aylin Caliskan