
Whose wife is it anyway? Assessing bias against same-gender relationships in machine translation (2401.04972v2)

Published 10 Jan 2024 in cs.CL

Abstract: Machine translation often suffers from biased data and algorithms that can lead to unacceptable errors in system output. While bias in gender norms has been investigated, less is known about whether MT systems encode bias about social relationships, e.g., "the lawyer kissed her wife." We investigate the degree of bias against same-gender relationships in MT systems, using generated template sentences drawn from several noun-gender languages (e.g., Spanish) and composed of popular occupation nouns. We find that three popular MT services consistently fail to accurately translate sentences concerning relationships between entities of the same gender. The error rate varies considerably based on the context, and same-gender sentences referencing high female-representation occupations are translated with lower accuracy. We provide this work as a case study in the evaluation of intrinsic bias in NLP systems with respect to social relationships.
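
The template-based setup described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the word lists, the single sentence template, the Spanish-to-English direction, and the substring-based scoring rule are all assumptions chosen for illustration. The idea is that a gendered occupation noun in the source fixes the subject's gender, while the possessive "su" is gender-neutral, so the translation has to choose "his" or "her".

```python
from itertools import product

# Hypothetical word lists; the paper's actual occupations and templates may differ.
OCCUPATIONS = {
    "lawyer": {"m": "el abogado", "f": "la abogada"},
    "nurse": {"m": "el enfermero", "f": "la enfermera"},
}
PARTNERS = {"m": ("esposo", "husband"), "f": ("esposa", "wife")}
EXPECTED_PRONOUN = {"m": "his", "f": "her"}


def build_cases():
    """Generate source sentences plus the English phrase a correct translation should contain."""
    cases = []
    for (occupation, forms), subj_gender, partner_gender in product(
        OCCUPATIONS.items(), "mf", "mf"
    ):
        es_partner, en_partner = PARTNERS[partner_gender]
        # "su" does not mark gender in Spanish, so the MT system must infer it
        # from the gendered occupation noun.
        source = f"{forms[subj_gender].capitalize()} besó a su {es_partner}."
        cases.append(
            {
                "occupation": occupation,
                "source": source,
                "expected": f"{EXPECTED_PRONOUN[subj_gender]} {en_partner}",
                "same_gender": subj_gender == partner_gender,
            }
        )
    return cases


def is_correct(translation, case):
    """Assumed scoring rule: the expected pronoun-plus-noun phrase appears in the output."""
    return case["expected"] in translation.lower()


cases = build_cases()
```

Each case's `source` would be sent to a translation service, and accuracy could then be compared between cases where `same_gender` is true and cases where it is false.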

Authors (2)
  1. Ian Stewart (34 papers)
  2. Rada Mihalcea (131 papers)
Citations (3)