On Measuring Social Biases in Sentence Encoders
The paper "On Measuring Social Biases in Sentence Encoders" explores the presence of social biases, such as those related to gender and race, within sentence encoders. This work extends the analysis of bias from the word level, as previously done with the Word Embedding Association Test (WEAT), to the sentence level using the proposed Sentence Encoder Association Test (SEAT).
Background and Motivation
Word embeddings such as word2vec and GloVe are well documented to encode human-like social biases, which risks reinforcing social injustices when they are used in NLP systems. With the advent of sentence-level representations such as ELMo and BERT, it becomes crucial to evaluate whether these models carry the same biases. The paper addresses this by adapting WEAT to sentence encoders, broadening bias measurement from individual word representations to full sentences.
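For orientation, the WEAT statistic (following Caliskan et al.'s 2017 formulation) compares two target concept sets X and Y against two attribute sets A and B using cosine similarity between embeddings; SEAT keeps the same statistic but applies it to sentence embeddings. A sketch of the standard formulation:

```latex
% Per-item association: how much closer w sits to attribute set A than to B.
s(w, A, B) = \operatorname{mean}_{a \in A} \cos(\vec{w}, \vec{a})
           - \operatorname{mean}_{b \in B} \cos(\vec{w}, \vec{b})

% Effect size (Cohen's d style), computed over the target sets X and Y.
d = \frac{\operatorname{mean}_{x \in X} s(x, A, B) - \operatorname{mean}_{y \in Y} s(y, A, B)}
         {\operatorname{std}_{w \in X \cup Y} s(w, A, B)}
```

Statistical significance is typically assessed with a permutation test over assignments of the targets to X and Y.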
Methodology
The researchers generalized WEAT to SEAT by slotting target and attribute words into semantically bleached sentence templates (for example, "This is [word]."), enabling bias measurement at the phrase and sentence level. They also introduced new tests for bias types that are difficult to capture with single-word embeddings, including:
- The Angry Black Woman stereotype, reflecting the intersectionality of race and gender.
- The double bind faced by women in professional settings, in which competence and likability are judged against contradictory gendered expectations.
SEAT was applied to a range of sentence encoders, from a simple CBoW baseline built on GloVe vectors to contextual models such as ELMo and BERT, with experiments covering both the original WEAT tests (recast as sentences) and the newly introduced bias tests.
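As a concrete illustration, the sketch below fills word lists into bleached templates and computes the effect size over sentence embeddings. It assumes a generic `encode(sentence)` function standing in for any of the evaluated encoders (e.g., averaging GloVe vectors, or pooling BERT outputs); the template strings and helper names are illustrative rather than the paper's released code.

```python
# Minimal SEAT-style sketch (illustration only). `encode` is assumed to map a
# sentence string to a fixed-size NumPy vector; templates and word lists are
# hypothetical examples.
import itertools
import numpy as np

TEMPLATES = ["This is {}.", "{} is here.", "Here is {}."]

def contextualize(words):
    """Slot each word into every bleached template to build test sentences."""
    return [t.format(w) for w, t in itertools.product(words, TEMPLATES)]

def cos(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w_vec, A, B):
    """s(w, A, B): mean cosine to attribute set A minus mean cosine to B."""
    return np.mean([cos(w_vec, a) for a in A]) - np.mean([cos(w_vec, b) for b in B])

def seat_effect_size(X_words, Y_words, A_words, B_words, encode):
    """Cohen's-d-style effect size computed over sentence embeddings."""
    X = [encode(s) for s in contextualize(X_words)]
    Y = [encode(s) for s in contextualize(Y_words)]
    A = [encode(s) for s in contextualize(A_words)]
    B = [encode(s) for s in contextualize(B_words)]
    assoc_X = [association(x, A, B) for x in X]
    assoc_Y = [association(y, A, B) for y in Y]
    pooled = np.std(assoc_X + assoc_Y, ddof=1)
    return (np.mean(assoc_X) - np.mean(assoc_Y)) / pooled
```

A permutation test over the target sentences would then supply the p-value reported alongside the effect size.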
Results
The results showed mixed evidence of bias across models. Recent encoders such as BERT exhibited less evidence of bias than older models, but the authors caution that negative results do not establish the absence of bias; they only show that bias was not detected for the specific words and contexts tested. Observed discrepancies also suggest that cosine similarity may not adequately capture similarity in the representation spaces of modern encoders, which weakens the interpretability of the test for these models.
Implications and Future Directions
The paper underscores the ongoing need to refine methods for detecting bias in sentence-level representations. The acknowledged limitations of SEAT suggest that future work should develop alternative evaluation techniques, ideally ones aligned more closely with bias as it manifests in downstream applications. Accounting for intersectional biases and their complex nature also remains a vital concern.
By shedding light on these issues, the paper contributes to the broader discourse on fairness and ethics in NLP, advocating continued scrutiny and improvement of sentence encoders and the systems built on them to mitigate social biases effectively.