Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
41 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
41 tokens/sec
o3 Pro
7 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Quantifying and Reducing Stereotypes in Word Embeddings (1606.06121v1)

Published 20 Jun 2016 in cs.CL, cs.LG, and stat.ML

Abstract: Machine learning algorithms are optimized to model statistical properties of the training data. If the input data reflects stereotypes and biases of the broader society, then the output of the learning algorithm also captures these stereotypes. In this paper, we initiate the study of gender stereotypes in {\em word embedding}, a popular framework to represent text data. As their use becomes increasingly common, applications can inadvertently amplify unwanted stereotypes. We show across multiple datasets that the embeddings contain significant gender stereotypes, especially with regard to professions. We created a novel gender analogy task and combined it with crowdsourcing to systematically quantify the gender bias in a given embedding. We developed an efficient algorithm that reduces gender stereotype using just a handful of training examples while preserving the useful geometric properties of the embedding. We evaluated our algorithm on several metrics. While we focus on male/female stereotypes, our framework may be applicable to other types of embedding biases.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (5)
  1. Tolga Bolukbasi (20 papers)
  2. Kai-Wei Chang (292 papers)
  3. James Zou (232 papers)
  4. Venkatesh Saligrama (110 papers)
  5. Adam Kalai (10 papers)
Citations (56)