Investigating Markers and Drivers of Gender Bias in Machine Translations (2403.11896v2)
Abstract: Implicit gender bias in LLMs is a well-documented problem, and implications of gender introduced into automatic translations can perpetuate real-world biases. However, some LLMs use heuristics or post-processing to mask such bias, making investigation difficult. Here, we examine bias in LLMs via back-translation, using the DeepL translation API to investigate the bias evinced when repeatedly translating a set of 56 Software Engineering tasks used in a previous study. Each statement starts with 'she', and is translated first into a 'genderless' intermediate language and then back into English; we then examine pronoun choice in the back-translated texts. We expand prior research in the following ways: (1) by comparing results across five intermediate languages, namely Finnish, Indonesian, Estonian, Turkish and Hungarian; (2) by proposing a novel metric for assessing the variation in gender implied in the repeated translations, avoiding the over-interpretation of individual pronouns apparent in earlier work; (3) by investigating sentence features that drive bias; and (4) by comparing results from three time-lapsed datasets to establish the reproducibility of the approach. We found that some languages display similar patterns of pronoun use, falling into three loose groups, but that patterns vary between groups; this underlines the need to work with multiple languages. We also identify the main verb appearing in a sentence as a likely significant driver of implied gender in the translations. Moreover, we see a good level of replicability in the results, and establish that our variation metric proves robust despite an obvious change in the behaviour of the DeepL translation API during the course of the study. These results show that the back-translation method can provide further insights into bias in LLMs.
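The back-translation procedure described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `translate` callable is a hypothetical stand-in for a real translation client, `fake_translate` is an invented offline stub, and the example sentences are made up rather than taken from the study's 56 tasks. The variation measure shown is Kader and Perry's coefficient of unalikeability, 1 minus the sum of squared category proportions; whether the paper's novel metric takes exactly this form is an assumption.

```python
import re
from collections import Counter
from typing import Callable, Iterable

# Hypothetical signature: translate(text, source_lang, target_lang) -> str.
Translate = Callable[[str, str, str], str]

TRACKED_PRONOUNS = {"she", "he", "they", "it"}


def back_translate(sentence: str, via: str, translate: Translate) -> str:
    """Translate English -> intermediate language -> English."""
    intermediate = translate(sentence, "EN", via)
    return translate(intermediate, via, "EN")


def leading_pronoun(sentence: str) -> str:
    """Return the sentence-initial pronoun, or 'other' if it is not tracked."""
    match = re.match(r"\s*([A-Za-z]+)", sentence)
    word = match.group(1).lower() if match else ""
    return word if word in TRACKED_PRONOUNS else "other"


def unalikeability(labels: Iterable[str]) -> float:
    """Coefficient of unalikeability: 1 - sum of squared category proportions.

    0.0 means every label is identical; larger values mean more variation.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one label is required")
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


def fake_translate(text: str, source_lang: str, target_lang: str) -> str:
    """Invented offline stub that mimics a genderless intermediate language.

    It replaces the English pronoun with Finnish 'Hän' on the way out, and
    on the way back picks 'He' for sentences containing 'tests', else 'She'.
    """
    if target_lang != "EN":
        return re.sub(r"^(She|He)\b", "Hän", text)
    pronoun = "He" if "tests" in text else "She"
    return re.sub(r"^Hän\b", pronoun, text)


# Made-up example sentences, not the tasks used in the study.
sentences = [
    "She tests the code.",
    "She elicits requirements.",
    "She writes documentation.",
    "She tests the release.",
]
pronouns = [
    leading_pronoun(back_translate(s, "FI", fake_translate)) for s in sentences
]
print(pronouns, unalikeability(pronouns))
```

Passing the translator in as a parameter keeps the sketch runnable offline; a real experiment would supply a function that wraps an actual translation service and would repeat the round trip for each intermediate language.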
Authors: Peter J Barclay, Ashkan Sami