Addressing Gender Bias and Ambiguity in Machine Translation: A Focus on Genderless Languages
This paper addresses a significant issue in machine translation (MT): translating gender-neutral content from genderless languages into natural gender languages such as English. Languages such as Persian, Indonesian, Finnish, Turkish, Estonian, and Azerbaijani encode no grammatical gender, so gender must often be inferred during translation. The paper introduces the "Translate-with-Care" (TWC) dataset, developed to evaluate how MT models handle gender and reasoning ambiguities across 3,950 scenarios in these six low- to mid-resource languages.
Challenges in Machine Translation
The authors identify three core challenges in MT:
- Gender Bias: Translation models typically default to masculine pronouns when the source leaves gender unspecified. This can reinforce stereotypes, particularly for professions and roles traditionally perceived as male-dominated.
- Pronoun Neutrality: When the source offers no gender cue at all, the translation should ideally remain neutral, yet natural gender languages such as English usually force a pronoun choice.
- Reasoning Ambiguity: The correct translation often requires context-based reasoning. For instance, the Persian pronoun «او» is genderless, but in a sentence like "او مادر من است" ("She is my mother"), the word "mother" determines the English pronoun. Current models handle such inference poorly, producing significant translation errors or omissions.
Evaluation of Translation Models
The paper evaluates several translation models, including GPT-4, mBART-50, NLLB-200, and Google Translate, identifying universal challenges in translating genderless content. The findings show that all models exhibit a strong bias towards masculine pronouns, a trend particularly pronounced in GPT-4 and Google Translate, which default to masculine pronouns four to six times more frequently than feminine pronouns in contexts involving leadership or professional roles.
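As a concrete illustration of this kind of bias probe, the sketch below translates gender-ambiguous sentences with mBART-50 and counts which English pronouns the model defaults to. The Turkish example sentences, the pronoun regular expressions, and the model choice are illustrative assumptions rather than the paper's exact evaluation protocol.

```python
# Minimal sketch of a pronoun-default bias probe (assumptions: Turkish
# inputs with the genderless pronoun "o", mBART-50 as the system under test).
import re
from transformers import MBart50TokenizerFast, MBartForConditionalGeneration

checkpoint = "facebook/mbart-large-50-many-to-many-mmt"
tokenizer = MBart50TokenizerFast.from_pretrained(checkpoint, src_lang="tr_TR")
model = MBartForConditionalGeneration.from_pretrained(checkpoint)

# Hypothetical gender-ambiguous source sentences.
sources = [
    "O bir doktor ve hastanede çalışıyor.",  # "He/She is a doctor and works at the hospital."
    "O şirketin yöneticisi oldu.",           # "He/She became the company's manager."
]

masc, fem = 0, 0
for text in sources:
    inputs = tokenizer(text, return_tensors="pt")
    generated = model.generate(
        **inputs, forced_bos_token_id=tokenizer.lang_code_to_id["en_XX"])
    translation = tokenizer.batch_decode(generated, skip_special_tokens=True)[0].lower()
    masc += len(re.findall(r"\b(he|him|his)\b", translation))
    fem += len(re.findall(r"\b(she|her|hers)\b", translation))

print(f"masculine pronoun tokens: {masc}, feminine pronoun tokens: {fem}")
```

A systematic study would of course run over the full TWC scenarios and all evaluated systems; the snippet only shows the shape of the measurement.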
Fine-tuning and Results
Notably, the authors demonstrate that fine-tuning mBART-50 on the TWC dataset substantially reduces these biases and errors. The fine-tuned model not only curbs gender bias but also generalizes well across languages, outperforming larger proprietary models such as GPT-4: it achieves high accuracy across the dataset and shows promising results even for languages it was not explicitly fine-tuned on.
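For readers who want to reproduce a similar setup, the following is a minimal fine-tuning sketch using the Hugging Face Seq2SeqTrainer. The single Persian-English example pair, the field names, and the hyperparameters are illustrative assumptions, not the authors' published configuration.

```python
# Minimal sketch of fine-tuning mBART-50 on gender-disambiguating parallel
# pairs (illustrative data and hyperparameters, not the paper's exact setup).
from datasets import Dataset
from transformers import (DataCollatorForSeq2Seq, MBart50TokenizerFast,
                          MBartForConditionalGeneration, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

checkpoint = "facebook/mbart-large-50-many-to-many-mmt"
tokenizer = MBart50TokenizerFast.from_pretrained(
    checkpoint, src_lang="fa_IR", tgt_lang="en_XX")  # e.g. Persian -> English
model = MBartForConditionalGeneration.from_pretrained(checkpoint)

# Hypothetical parallel data: a genderless source sentence whose context
# ("mother") determines the correct English pronoun.
pairs = Dataset.from_dict({
    "source": ["او مادر خوبی است و به کارش افتخار می‌کند."],
    "target": ["She is a good mother and is proud of her work."],
})

def preprocess(batch):
    # Tokenize source sentences and reference translations together.
    return tokenizer(batch["source"], text_target=batch["target"],
                     truncation=True, max_length=128)

tokenized = pairs.map(preprocess, batched=True, remove_columns=pairs.column_names)

args = Seq2SeqTrainingArguments(
    output_dir="mbart50-twc",          # hypothetical output directory
    per_device_train_batch_size=8,
    num_train_epochs=3,
    learning_rate=3e-5,
)
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    tokenizer=tokenizer,
)
trainer.train()
```

The key design choice here is that the reference translations supply correctly resolved pronouns, so the model learns to condition on disambiguating context rather than defaulting to masculine forms.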
Implications and Future Directions
This work highlights the value of targeted interventions, such as fine-tuning, for addressing gender bias and improving semantic coherence in MT. The authors release their dataset, code, and models publicly and call for further research into gender neutralization and reasoning in MT, particularly for under-resourced genderless languages. Future MT systems can draw on these insights to become more equitable and accurate, fostering greater inclusivity in AI-enabled communication technologies.
The paper closes by advocating for more responsible and sensitive translation practices that account for both linguistic and societal context, so that advances in MT align with the broader goals of reducing bias and enhancing fairness in automated language processing systems.