The power of Prompts: Evaluating and Mitigating Gender Bias in MT with LLMs (2407.18786v1)
Abstract: This paper studies gender bias in machine translation through the lens of LLMs. Four widely-used test sets are employed to benchmark various base LLMs, comparing their translation quality and gender bias against state-of-the-art Neural Machine Translation (NMT) models for the English to Catalan (En $\rightarrow$ Ca) and English to Spanish (En $\rightarrow$ Es) translation directions. Our findings reveal pervasive gender bias across all models, with base LLMs exhibiting a higher degree of bias than NMT models. To combat this bias, we explore prompt engineering techniques applied to an instruction-tuned LLM. We identify a prompt structure that reduces gender bias by up to 12% on the WinoMT evaluation dataset compared to more straightforward prompts. These results substantially narrow the gender bias accuracy gap between LLMs and traditional NMT systems.
- Aleix Sant (2 papers)
- Carlos Escolano (20 papers)
- Audrey Mash (3 papers)
- Francesca De Luca Fornaciari (3 papers)
- Maite Melero (9 papers)