
Impact of Gender Debiased Word Embeddings in Language Modeling (2105.00908v3)

Published 3 May 2021 in cs.CL

Abstract: Gender, race and social biases have recently been detected as evident examples of unfairness in applications of Natural Language Processing. A key path towards fairness is to understand, analyse and interpret our data and algorithms. Recent studies have shown that the human-generated data used in training is an apparent factor in acquiring biases. In addition, current algorithms have also been proven to amplify biases from data. To further address these concerns, in this paper, we study how a state-of-the-art recurrent neural language model behaves when trained on data that under-represents females, using pre-trained standard and debiased word embeddings. Results show that language models inherit higher bias when trained on unbalanced data using pre-trained embeddings, in comparison with using embeddings trained within the task. Moreover, results show that, on the same data, language models inherit lower bias when using debiased pre-trained embeddings, compared to using standard pre-trained embeddings.

Authors (2)
  1. Christine Basta (4 papers)
  2. Marta R. Costa-jussà (73 papers)
Citations (4)