Investigating Markers and Drivers of Gender Bias in Machine Translations (2403.11896v2)

Published 18 Mar 2024 in cs.CL, cs.CY, and cs.SE

Abstract: Implicit gender bias in LLMs is a well-documented problem, and implications of gender introduced into automatic translations can perpetuate real-world biases. However, some LLMs use heuristics or post-processing to mask such bias, making investigation difficult. Here, we examine bias in LLMs via back-translation, using the DeepL translation API to investigate the bias evinced when repeatedly translating a set of 56 Software Engineering tasks used in a previous study. Each statement starts with 'she', and is translated first into a 'genderless' intermediate language then back into English; we then examine pronoun choice in the back-translated texts. We expand prior research in the following ways: (1) by comparing results across five intermediate languages, namely Finnish, Indonesian, Estonian, Turkish and Hungarian; (2) by proposing a novel metric for assessing the variation in gender implied in the repeated translations, avoiding the over-interpretation of individual pronouns apparent in earlier work; (3) by investigating sentence features that drive bias; and (4) by comparing results from three time-lapsed datasets to establish the reproducibility of the approach. We found that some languages display similar patterns of pronoun use, falling into three loose groups, but that patterns vary between groups; this underlines the need to work with multiple languages. We also identify the main verb appearing in a sentence as a likely significant driver of implied gender in the translations. Moreover, we see a good level of replicability in the results, and establish that our variation metric proves robust despite an obvious change in the behaviour of the DeepL translation API during the course of the study. These results show that the back-translation method can provide further insights into bias in LLMs.
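The abstract does not spell out the variation metric, but since it concerns the spread of a categorical outcome (he/she/they) across repeated translations of the same sentence, a plausible sketch is the coefficient of unalikeability for categorical data: the probability that two observations drawn at random differ. The pronoun sample below is purely illustrative, not from the paper.

```python
from collections import Counter

def unalikeability(labels):
    """Coefficient of unalikeability for a categorical sample:
    the probability that two observations drawn at random (with
    replacement) differ in category. Returns 0.0 when every
    observation is identical, and approaches 1 as observations
    spread evenly over more categories."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

# Hypothetical pronouns observed for one sentence across five
# back-translation rounds (illustrative data only):
pronouns = ["he", "he", "she", "he", "they"]
print(round(unalikeability(pronouns), 3))  # 1 - (9+1+1)/25 = 0.56
```

A metric of this shape avoids over-interpreting any single pronoun: a sentence that flips between 'he' and 'she' across rounds scores high variation, while one that is consistently back-translated with the same pronoun scores zero, regardless of which pronoun that is.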

Authors (2)
  1. Peter J Barclay (5 papers)
  2. Ashkan Sami (8 papers)
Citations (1)