Addressing Gender Bias and Ambiguity in Machine Translation: A Focus on Genderless Languages
This paper addresses a significant issue in machine translation (MT): translating gender-neutral content from genderless languages into natural gender languages such as English. Languages such as Persian, Indonesian, Finnish, Turkish, Estonian, and Azerbaijani encode no grammatical gender, so gender must often be inferred during translation. The paper introduces the "Translate-with-Care" (TWC) dataset, developed to evaluate how MT models handle gender and reasoning ambiguities across 3,950 scenarios in these six low- to mid-resource languages.
Challenges in Machine Translation
The authors identify three core challenges in MT:
- Gender Bias: Translation models typically default to masculine pronouns when the source leaves gender unspecified. This can reinforce stereotypes, particularly for professions and roles traditionally perceived as male-dominated.
- Pronoun Neutrality: When the source offers no gender cue at all, the translation should ideally remain neutral, yet natural gender languages such as English usually force a pronoun choice.
- Reasoning Ambiguity: The correct translation often requires context-based reasoning. For instance, the Persian pronoun «او» is genderless, but in a sentence like "او مادر من است" ("She is my mother"), the word "mother" determines the English pronoun. Current models handle such inference poorly, producing significant translation errors or omissions.
Evaluation of Translation Models
The paper evaluates several translation models, including GPT-4, mBART-50, NLLB-200, and Google Translate, identifying universal challenges in translating genderless content. The findings show that all models exhibit a strong bias towards masculine pronouns, a trend particularly pronounced in GPT-4 and Google Translate, which default to masculine pronouns four to six times more frequently than feminine pronouns in contexts involving leadership or professional roles.
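As a concrete illustration of this kind of bias probe, the sketch below translates gender-ambiguous sentences with mBART-50 and counts which English pronouns the model defaults to. The Turkish example sentences, the pronoun regular expressions, and the model choice are illustrative assumptions rather than the paper's exact evaluation protocol.

```python
# Minimal sketch of a pronoun-default bias probe (assumptions: Turkish
# inputs with the genderless pronoun "o", mBART-50 as the system under test).
import re
from transformers import MBart50TokenizerFast, MBartForConditionalGeneration

checkpoint = "facebook/mbart-large-50-many-to-many-mmt"
tokenizer = MBart50TokenizerFast.from_pretrained(checkpoint, src_lang="tr_TR")
model = MBartForConditionalGeneration.from_pretrained(checkpoint)

# Hypothetical gender-ambiguous source sentences.
sources = [
    "O bir doktor ve hastanede çalışıyor.",  # "He/She is a doctor and works at the hospital."
    "O şirketin yöneticisi oldu.",           # "He/She became the company's manager."
]

masc, fem = 0, 0
for text in sources:
    inputs = tokenizer(text, return_tensors="pt")
    generated = model.generate(
        **inputs, forced_bos_token_id=tokenizer.lang_code_to_id["en_XX"])
    translation = tokenizer.batch_decode(generated, skip_special_tokens=True)[0].lower()
    masc += len(re.findall(r"\b(he|him|his)\b", translation))
    fem += len(re.findall(r"\b(she|her|hers)\b", translation))

print(f"masculine pronoun tokens: {masc}, feminine pronoun tokens: {fem}")
```

A systematic study would of course run over the full TWC scenarios and all evaluated systems; the snippet only shows the shape of the measurement.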
Fine-tuning and Results
Notably, the authors demonstrate that fine-tuning mBART-50 on the TWC dataset substantially reduces these biases and errors. The fine-tuned model not only curbs gender bias but also generalizes well across languages, outperforming larger proprietary models such as GPT-4: it achieves high accuracy across the dataset and shows promising results even for languages it was not explicitly fine-tuned on.
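For readers who want to reproduce a similar setup, the following is a minimal fine-tuning sketch using the Hugging Face Seq2SeqTrainer. The single Persian-English example pair, the field names, and the hyperparameters are illustrative assumptions, not the authors' published configuration.

```python
# Minimal sketch of fine-tuning mBART-50 on gender-disambiguating parallel
# pairs (illustrative data and hyperparameters, not the paper's exact setup).
from datasets import Dataset
from transformers import (DataCollatorForSeq2Seq, MBart50TokenizerFast,
                          MBartForConditionalGeneration, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

checkpoint = "facebook/mbart-large-50-many-to-many-mmt"
tokenizer = MBart50TokenizerFast.from_pretrained(
    checkpoint, src_lang="fa_IR", tgt_lang="en_XX")  # e.g. Persian -> English
model = MBartForConditionalGeneration.from_pretrained(checkpoint)

# Hypothetical parallel data: a genderless source sentence whose context
# ("mother") determines the correct English pronoun.
pairs = Dataset.from_dict({
    "source": ["او مادر خوبی است و به کارش افتخار می‌کند."],
    "target": ["She is a good mother and is proud of her work."],
})

def preprocess(batch):
    # Tokenize source sentences and reference translations together.
    return tokenizer(batch["source"], text_target=batch["target"],
                     truncation=True, max_length=128)

tokenized = pairs.map(preprocess, batched=True, remove_columns=pairs.column_names)

args = Seq2SeqTrainingArguments(
    output_dir="mbart50-twc",          # hypothetical output directory
    per_device_train_batch_size=8,
    num_train_epochs=3,
    learning_rate=3e-5,
)
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    tokenizer=tokenizer,
)
trainer.train()
```

The key design choice here is that the reference translations supply correctly resolved pronouns, so the model learns to condition on disambiguating context rather than defaulting to masculine forms.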
Implications and Future Directions
This work highlights the value of targeted interventions, such as fine-tuning, for addressing gender bias and improving semantic coherence in MT. The authors release their dataset, code, and models publicly and call for further research into gender neutralization and reasoning in MT, particularly for under-resourced genderless languages. Future MT systems can draw on these insights to become more equitable and accurate, fostering greater inclusivity in AI-enabled communication technologies.
The paper closes by advocating for more responsible and sensitive translation practices that account for both linguistic and societal context, so that advances in MT align with the broader goals of reducing bias and enhancing fairness in automated language processing systems.