
$\varepsilon$ KÚ <MASK>: Integrating Yorùbá cultural greetings into machine translation (2303.17972v2)

Published 31 Mar 2023 in cs.CL

Abstract: This paper investigates the performance of massively multilingual neural machine translation (NMT) systems in translating Yorùbá greetings ($\varepsilon$ kú [MASK]), which are an integral part of Yorùbá language and culture, into English. To evaluate these models, we present IkiniYorùbá, a Yorùbá-English translation dataset containing some Yorùbá greetings and sample use cases. We analysed the performance of different multilingual NMT systems, including Google Translate and NLLB, and show that these models struggle to accurately translate Yorùbá greetings into English. In addition, we trained a Yorùbá-English model by finetuning an existing NMT model on the training split of IkiniYorùbá, and this achieved better performance than the pre-trained multilingual NMT models, despite the latter having been trained on much larger volumes of data.

Authors (5)
  1. Idris Akinade (3 papers)
  2. Jesujoba Alabi (11 papers)
  3. David Adelani (7 papers)
  4. Clement Odoje (1 paper)
  5. Dietrich Klakow (114 papers)
Citations (7)