On Measuring Social Biases in Sentence Encoders
The paper "On Measuring Social Biases in Sentence Encoders" explores the presence of social biases, such as those related to gender and race, within sentence encoders. This work extends the analysis of bias from the word level, as previously done with the Word Embedding Association Test (WEAT), to the sentence level using the proposed Sentence Encoder Association Test (SEAT).
Background and Motivation
Word embeddings such as word2vec and GloVe are well documented to encode human-like social biases, which risks reinforcing social injustices when they are used in NLP systems. With the advent of sentence-level representations such as ELMo and BERT, it becomes crucial to evaluate whether these models carry the same biases. The paper addresses this by adapting WEAT to sentence encoders, broadening bias measurement from individual word representations to full sentences.
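For orientation, the WEAT statistic (following Caliskan et al.'s 2017 formulation) compares two target concept sets X and Y against two attribute sets A and B using cosine similarity between embeddings; SEAT keeps the same statistic but applies it to sentence embeddings. A sketch of the standard formulation:

```latex
% Per-item association: how much closer w sits to attribute set A than to B.
s(w, A, B) = \operatorname{mean}_{a \in A} \cos(\vec{w}, \vec{a})
           - \operatorname{mean}_{b \in B} \cos(\vec{w}, \vec{b})

% Effect size (Cohen's d style), computed over the target sets X and Y.
d = \frac{\operatorname{mean}_{x \in X} s(x, A, B) - \operatorname{mean}_{y \in Y} s(y, A, B)}
         {\operatorname{std}_{w \in X \cup Y} s(w, A, B)}
```

Statistical significance is typically assessed with a permutation test over assignments of the targets to X and Y.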
Methodology
The researchers generalized WEAT to SEAT by slotting target and attribute words into semantically bleached sentence templates (for example, "This is [word]."), enabling bias measurement at the phrase and sentence level. They also introduced new tests for bias types that are difficult to capture with single-word embeddings, including:
- The Angry Black Woman stereotype, reflecting the intersectionality of race and gender.
- The double bind faced by women in professional settings, in which competence and likability are judged against contradictory gendered expectations.
SEAT was applied to a range of sentence encoders, from a simple CBoW baseline built on GloVe vectors to contextual models such as ELMo and BERT, with experiments covering both the original WEAT tests (recast as sentences) and the newly introduced bias tests.
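As a concrete illustration, the sketch below fills word lists into bleached templates and computes the effect size over sentence embeddings. It assumes a generic `encode(sentence)` function standing in for any of the evaluated encoders (e.g., averaging GloVe vectors, or pooling BERT outputs); the template strings and helper names are illustrative rather than the paper's released code.

```python
# Minimal SEAT-style sketch (illustration only). `encode` is assumed to map a
# sentence string to a fixed-size NumPy vector; templates and word lists are
# hypothetical examples.
import itertools
import numpy as np

TEMPLATES = ["This is {}.", "{} is here.", "Here is {}."]

def contextualize(words):
    """Slot each word into every bleached template to build test sentences."""
    return [t.format(w) for w, t in itertools.product(words, TEMPLATES)]

def cos(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w_vec, A, B):
    """s(w, A, B): mean cosine to attribute set A minus mean cosine to B."""
    return np.mean([cos(w_vec, a) for a in A]) - np.mean([cos(w_vec, b) for b in B])

def seat_effect_size(X_words, Y_words, A_words, B_words, encode):
    """Cohen's-d-style effect size computed over sentence embeddings."""
    X = [encode(s) for s in contextualize(X_words)]
    Y = [encode(s) for s in contextualize(Y_words)]
    A = [encode(s) for s in contextualize(A_words)]
    B = [encode(s) for s in contextualize(B_words)]
    assoc_X = [association(x, A, B) for x in X]
    assoc_Y = [association(y, A, B) for y in Y]
    pooled = np.std(assoc_X + assoc_Y, ddof=1)
    return (np.mean(assoc_X) - np.mean(assoc_Y)) / pooled
```

A permutation test over the target sentences would then supply the p-value reported alongside the effect size.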
Results
The results showed mixed evidence of bias across models. Recent encoders such as BERT exhibited less evidence of bias than older models, but the authors caution that negative results do not establish the absence of bias; they only show that bias was not detected for the specific words and contexts tested. Observed discrepancies also suggest that cosine similarity may not adequately capture similarity in the representation spaces of modern encoders, which weakens the interpretability of the test for these models.
Implications and Future Directions
The paper underscores the ongoing need to refine methods for detecting bias in sentence-level representations. The acknowledged limitations of SEAT suggest that future work should develop alternative evaluation techniques, ideally ones aligned more closely with bias as it manifests in downstream applications. Accounting for intersectional biases and their complex nature also remains a vital concern.
By shedding light on these issues, the paper contributes to the broader discourse on fairness and ethics in NLP, advocating continued scrutiny and improvement of sentence encoders and the systems built on them to mitigate social biases effectively.