Investigating Gender Bias in Turkish Language Models (2404.11726v1)
Abstract: LLMs are trained mostly on Web data, which often contains social stereotypes and biases that the models can inherit. This has potentially negative consequences, as models can amplify these biases in downstream tasks or applications. However, prior research has primarily focused on the English language, especially in the context of gender bias. In particular, grammatically gender-neutral languages such as Turkish are underexplored, even though they present LLMs with linguistic properties different from English that may affect how biases manifest. In this paper, we fill this research gap and investigate the significance of gender bias in Turkish LLMs. We build upon existing bias evaluation frameworks and extend them to the Turkish language by translating existing English tests and creating new ones designed to measure gender bias in the context of Türkiye. In addition, we evaluate Turkish LLMs for embedded ethnic bias toward Kurdish people. Based on the experimental results, we attribute the observed biases to model characteristics such as model size, multilingualism, and the training corpora. We make the Turkish gender bias dataset publicly available.
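For readers unfamiliar with the kind of bias evaluation framework the abstract refers to, the following is a minimal sketch of a WEAT-style word-association test, one widely used approach that such frameworks build on. The `embed` function and the Turkish word lists are hypothetical placeholders for illustration, not the paper's actual test sets.

```python
# Sketch of a WEAT-style association test: measures whether target words in X
# sit closer (in embedding space) to attribute set A than to B, relative to Y.
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B):
    # s(w, A, B): mean similarity of word vector w to attribute set A
    # minus its mean similarity to attribute set B
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    # Cohen's-d-style effect size over the two target sets X and Y;
    # values far from 0 indicate a stronger stereotypical association
    s_X = [association(x, A, B) for x in X]
    s_Y = [association(y, A, B) for y in Y]
    return (np.mean(s_X) - np.mean(s_Y)) / np.std(s_X + s_Y, ddof=1)

# Hypothetical usage with some embedding function `embed` from a Turkish model:
# X = [embed(w) for w in ["mühendis", "pilot"]]    # male-stereotyped targets
# Y = [embed(w) for w in ["hemşire", "sekreter"]]  # female-stereotyped targets
# A = [embed(w) for w in ["adam", "erkek"]]        # male attribute words
# B = [embed(w) for w in ["kadın", "kız"]]         # female attribute words
# print(weat_effect_size(X, Y, A, B))
```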
Authors: Orhun Caglidil, Malte Ostendorff, Georg Rehm