Language Models Get a Gender Makeover: Mitigating Gender Bias with Few-Shot Data Interventions (2306.04597v1)

Published 7 Jun 2023 in cs.CL and cs.LG

Abstract: Societal biases present in pre-trained LLMs are a critical issue as these models have been shown to propagate biases in countless downstream applications, rendering them unfair towards specific groups of people. Since large-scale retraining of these models from scratch is expensive in both time and compute, a variety of approaches have been previously proposed that de-bias a pre-trained model. While the majority of current state-of-the-art debiasing methods focus on changes to the training regime, in this paper, we propose data intervention strategies as a powerful yet simple technique to reduce gender bias in pre-trained models. Specifically, we empirically show that by fine-tuning a pre-trained model on only 10 de-biased (intervened) training examples, the tendency to favor any gender is significantly reduced. Since our proposed method only needs a few training examples, our few-shot debiasing approach is highly feasible and practical. Through extensive experimentation, we show that our debiasing technique performs better than competitive state-of-the-art baselines with minimal loss in language modeling ability.

Authors (5)

Himanshu Thakur (3 papers)
Atishay Jain (8 papers)
Praneetha Vaddamanu (7 papers)
Paul Pu Liang (103 papers)
Louis-Philippe Morency (123 papers)

Citations (24)
