The Birth of Bias: A case study on the evolution of gender bias in an English language model (2207.10245v1)

Published 21 Jul 2022 in cs.CL and cs.AI

Abstract: Detecting and mitigating harmful biases in modern LLMs are widely recognized as crucial, open problems. In this paper, we take a step back and investigate how LLMs come to be biased in the first place. We use a relatively small LLM with an LSTM architecture, trained on an English Wikipedia corpus. With full access to the data and to the model parameters as they change at every step of training, we can map in detail how the representation of gender develops, what patterns in the dataset drive this, and how the model's internal state relates to the bias in a downstream task (semantic textual similarity). We find that the representation of gender is dynamic and identify different phases during training. Furthermore, we show that gender information is represented increasingly locally in the input embeddings of the model and that, as a consequence, debiasing these can be effective in reducing the downstream bias. Monitoring the training dynamics allows us to detect an asymmetry in how the female and male gender are represented in the input embeddings. This is important, as it may cause naive mitigation strategies to introduce new undesirable biases. We discuss the relevance of the findings for mitigation strategies more generally and the prospects of generalizing our methods to larger LLMs, the Transformer architecture, other languages and other undesirable biases.
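The abstract notes that because gender information becomes increasingly localized in the input embeddings, debiasing those embeddings can reduce downstream bias. A common way to do this, not necessarily the exact procedure used in the paper, is to estimate a gender direction from definitional word pairs and project it out of each embedding. The sketch below uses toy 3-dimensional vectors and invented values purely for illustration:

```python
import numpy as np

# Toy embedding table: invented 3-dimensional vectors for illustration only.
embeddings = {
    "he":    np.array([ 0.9, 0.1, 0.2]),
    "she":   np.array([-0.9, 0.1, 0.2]),
    "nurse": np.array([-0.4, 0.5, 0.1]),
    "judge": np.array([ 0.3, 0.6, 0.0]),
}

# Estimate a gender direction from a definitional pair and normalize it.
g = embeddings["he"] - embeddings["she"]
g = g / np.linalg.norm(g)

def debias(v, direction):
    """Remove the component of v along the given direction (hard debiasing)."""
    return v - np.dot(v, direction) * direction

for word in ("nurse", "judge"):
    before = np.dot(embeddings[word], g)
    after = np.dot(debias(embeddings[word], g), g)
    print(f"{word}: projection on gender axis {before:+.2f} -> {after:+.2f}")
```

After projection, every debiased vector has zero component along the estimated gender axis; the paper's observed asymmetry between female and male representations suggests that such a single-direction removal can be too naive and may itself introduce new biases.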

Authors (4)
  1. Oskar van der Wal (9 papers)
  2. Jaap Jumelet (25 papers)
  3. Katrin Schulz (11 papers)
  4. Willem Zuidema (32 papers)
Citations (13)
