Comparing Biases and the Impact of Multilingual Training across Multiple Languages (2305.11242v1)

Published 18 May 2023 in cs.CL

Abstract: Studies in bias and fairness in natural language processing have primarily examined social biases within a single language and/or across a few attributes (e.g. gender, race). However, biases can manifest differently across various languages for individual attributes. As a result, it is critical to examine biases within each language and attribute. It is equally important to study how these biases compare across languages and how they are affected when a model is trained on multilingual data versus monolingual data. We present a bias analysis across Italian, Chinese, English, Hebrew, and Spanish on the downstream sentiment analysis task to observe whether specific demographics are viewed more positively. We study bias similarities and differences across these languages and investigate the impact of multilingual vs. monolingual training data. We adapt existing sentiment bias templates in English to Italian, Chinese, Hebrew, and Spanish for four attributes: race, religion, nationality, and gender. Our results reveal similarities in bias expression such as favoritism of groups that are dominant in each language's culture (e.g. majority religions and nationalities). Additionally, we find an increased variation in predictions across protected groups, indicating bias amplification, after multilingual finetuning in comparison to multilingual pretraining.
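
As a rough illustration of the template-based setup the abstract describes, the sketch below fills sentence templates with group terms, scores each sentence, and reports the variance of the per-group mean scores. It is not the authors' implementation: the function names, the `{group}` slot, the placeholder group terms, the toy scorer, and the choice of variance as the measure of "variation in predictions" are all assumptions made for illustration.

```python
from statistics import mean, pvariance
from typing import Callable, Dict, List


def fill_templates(templates: List[str], groups: List[str]) -> Dict[str, List[str]]:
    """Substitute each group term into the {group} slot of every template."""
    return {g: [t.format(group=g) for t in templates] for g in groups}


def group_score_variance(
    templates: List[str],
    groups: List[str],
    score: Callable[[str], float],
) -> float:
    """Variance of per-group mean sentiment scores (assumed bias measure)."""
    filled = fill_templates(templates, groups)
    group_means = [mean(score(s) for s in sentences) for sentences in filled.values()]
    return pvariance(group_means)


def toy_score(sentence: str) -> float:
    """Hypothetical stand-in for a sentiment model; returns a score in [0, 1]."""
    return 0.9 if "group A" in sentence else 0.5


# Placeholder templates and group terms, not taken from the paper.
templates = [
    "I talked to a {group} person yesterday.",
    "The {group} family moved in next door.",
]
groups = ["group A", "group B"]

variance = group_score_variance(templates, groups, toy_score)
print(f"variance of group means: {variance:.4f}")
```

Under this assumed measure, a scorer that treats every group identically yields a variance of zero, and a larger value indicates that predictions differ more across groups. Comparing the value for two models (for example, before and after finetuning) would mirror the kind of comparison the abstract reports.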

Authors (9)
  1. Sharon Levy (22 papers)
  2. Neha Anna John (11 papers)
  3. Ling Liu (132 papers)
  4. Yogarshi Vyas (16 papers)
  5. Jie Ma (205 papers)
  6. Yoshinari Fujinuma (9 papers)
  7. Miguel Ballesteros (70 papers)
  8. Vittorio Castelli (24 papers)
  9. Dan Roth (222 papers)
Citations (22)