Investigating Gender Bias in Turkish Language Models (2404.11726v1)
Abstract: LLMs are trained mostly on Web data, which often contains social stereotypes and biases that the models can inherit. This has potentially negative consequences, as models can amplify these biases in downstream tasks or applications. However, prior research has primarily focused on the English language, especially in the context of gender bias. In particular, grammatically gender-neutral languages such as Turkish are underexplored, even though they present LLMs with linguistic properties different from English that may affect how biases manifest. In this paper, we fill this research gap and investigate the significance of gender bias in Turkish LLMs. We build upon existing bias evaluation frameworks and extend them to the Turkish language by translating existing English tests and creating new ones designed to measure gender bias in the context of Türkiye. In addition, we evaluate Turkish LLMs for embedded ethnic bias toward Kurdish people. Based on the experimental results, we attribute the observed biases to model characteristics such as model size, multilingualism, and the training corpora. We make the Turkish gender bias dataset publicly available.
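For readers unfamiliar with the kind of bias evaluation framework the abstract refers to, the following is a minimal sketch of a WEAT-style word-association test, one widely used approach that such frameworks build on. The `embed` function and the Turkish word lists are hypothetical placeholders for illustration, not the paper's actual test sets.

```python
# Sketch of a WEAT-style association test: measures whether target words in X
# sit closer (in embedding space) to attribute set A than to B, relative to Y.
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B):
    # s(w, A, B): mean similarity of word vector w to attribute set A
    # minus its mean similarity to attribute set B
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    # Cohen's-d-style effect size over the two target sets X and Y;
    # values far from 0 indicate a stronger stereotypical association
    s_X = [association(x, A, B) for x in X]
    s_Y = [association(y, A, B) for y in Y]
    return (np.mean(s_X) - np.mean(s_Y)) / np.std(s_X + s_Y, ddof=1)

# Hypothetical usage with some embedding function `embed` from a Turkish model:
# X = [embed(w) for w in ["mühendis", "pilot"]]    # male-stereotyped targets
# Y = [embed(w) for w in ["hemşire", "sekreter"]]  # female-stereotyped targets
# A = [embed(w) for w in ["adam", "erkek"]]        # male attribute words
# B = [embed(w) for w in ["kadın", "kız"]]         # female attribute words
# print(weat_effect_size(X, Y, A, B))
```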
Authors: Orhun Caglidil, Malte Ostendorff, Georg Rehm