MiTTenS: A Dataset for Evaluating Gender Mistranslation (2401.06935v3)
Abstract: Translation systems, including foundation models capable of translation, can produce errors that result in gender mistranslation, and such errors can be especially harmful. To measure the extent of such potential harms when translating into and out of English, we introduce a dataset, MiTTenS, covering 26 languages from a variety of language families and scripts, including several traditionally under-represented in digital resources. The dataset is constructed with handcrafted passages that target known failure patterns, longer synthetically generated passages, and natural passages sourced from multiple domains. We demonstrate the usefulness of the dataset by evaluating both neural machine translation systems and foundation models, and show that all systems exhibit gender mistranslation and potential harm, even in high resource languages.
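The abstract describes scoring translation systems for gender mistranslation when translating into English. As a rough illustration of how such an evaluation could be scored, the sketch below checks whether a system's English output ever uses the pronoun set expected for the target entity. The field names (`source`, `expected`) and the `translate` callable are hypothetical placeholders, not the released MiTTenS schema or the authors' evaluation code.

```python
from typing import Callable

# Pronoun sets an English translation should use for the target entity.
# A conservative proxy: if none of the expected pronouns appear, flag it.
EXPECTED_PRONOUNS = {
    "he": {"he", "him", "his"},
    "she": {"she", "her", "hers"},
}

def is_mistranslated(source: str, expected: str,
                     translate: Callable[[str], str]) -> bool:
    """Return True if the English translation uses no pronoun
    from the expected set for the target entity."""
    tokens = {t.strip(".,;!?").lower() for t in translate(source).split()}
    return tokens.isdisjoint(EXPECTED_PRONOUNS[expected])

def mistranslation_rate(examples: list[dict],
                        translate: Callable[[str], str]) -> float:
    """Aggregate a mistranslation rate over a slice of the dataset;
    each example is assumed to carry 'source' and 'expected' fields."""
    flagged = [ex for ex in examples
               if is_mistranslated(ex["source"], ex["expected"], translate)]
    return len(flagged) / max(len(examples), 1)
```

In practice such string matching is only a first pass; passages with multiple entities or ambiguous pronoun reference would need per-entity annotation, which is why the dataset pairs each passage with a known failure pattern.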
Authors: Kevin Robinson, Sneha Kudugunta, Romina Stella, Sunipa Dev, Jasmijn Bastings