Gender-specific Machine Translation with Large Language Models (2309.03175v2)
Abstract: While machine translation (MT) systems have seen significant improvements, it is still common for translations to reflect societal biases, such as gender bias. Decoder-only LLMs have demonstrated potential in MT, albeit with performance slightly lagging behind traditional encoder-decoder Neural Machine Translation (NMT) systems. However, LLMs offer a unique advantage: the ability to control the properties of the output through prompts. In this study, we leverage this flexibility to explore LLaMa's capability to produce gender-specific translations. Our results indicate that LLaMa can generate gender-specific translations with translation accuracy and gender bias comparable to NLLB, a state-of-the-art multilingual NMT system. Furthermore, our experiments reveal that LLaMa's gender-specific translations rely on coreference resolution to determine gender, showing higher gender variance in gender-ambiguous datasets but maintaining consistency in less ambiguous contexts. This research investigates the potential and challenges of using LLMs for gender-specific translations as an instance of the controllability of outputs offered by LLMs.
Authors: Pierre Andrews, Pontus Stenetorp, Mikel Artetxe, Marta R. Costa-jussà, Eduardo Sánchez