Translating the Unseen? Yoruba-English MT in Low-Resource, Morphologically-Unmarked Settings (2103.04225v3)

Published 7 Mar 2021 in cs.CL, cs.AI, and cs.LG

Abstract: Translating between languages where certain features are marked morphologically in one but absent or marked contextually in the other is an important test case for machine translation. Ambiguities arise when translating from Yorùbá, which uses bare nouns and marks (in)definiteness contextually, into English, which marks (in)definiteness morphologically. In this work, we perform a fine-grained analysis of how a statistical machine translation (SMT) system compares with two neural machine translation (NMT) systems (BiLSTM and Transformer) when translating Yorùbá bare nouns (BNs) into English. We investigate to what extent the systems identify BNs and correctly translate them, and compare their output with human translation patterns. We also analyze the types of errors each model makes and provide a linguistic description of these errors. We glean insights for evaluating model performance in low-resource settings. In translating bare nouns, our results show that the Transformer model outperforms the SMT and BiLSTM models for 4 categories, the BiLSTM outperforms the SMT model for 3 categories, and the SMT outperforms the NMT models for 1 category.
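The category-level comparison described in the abstract can be made concrete with a small sketch. The Python snippet below is a hypothetical illustration, not the authors' code or their actual category scheme: it classifies how an English noun phrase renders a Yorùbá bare noun (definite, indefinite, or bare) by its leading article, then computes per-category agreement between a system's output and a human reference translation.

```python
# Hypothetical sketch (not the paper's implementation): score how an MT
# system renders Yoruba bare nouns in English by comparing the article
# category of each system noun phrase against a human reference.

from collections import Counter

# Map leading articles to (in)definiteness categories; anything else is "bare".
ARTICLE_CATEGORIES = {
    "the": "definite",
    "a": "indefinite",
    "an": "indefinite",
}

def noun_phrase_category(english_np: str) -> str:
    """Classify an English noun phrase by how (in)definiteness is marked."""
    first_word = english_np.strip().lower().split()[0]
    return ARTICLE_CATEGORIES.get(first_word, "bare")

def category_agreement(system_nps, reference_nps):
    """Fraction of noun phrases where system and reference agree, per reference category."""
    correct, total = Counter(), Counter()
    for sys_np, ref_np in zip(system_nps, reference_nps):
        ref_cat = noun_phrase_category(ref_np)
        total[ref_cat] += 1
        if noun_phrase_category(sys_np) == ref_cat:
            correct[ref_cat] += 1
    return {cat: correct[cat] / total[cat] for cat in total}

# Toy example: three Yoruba bare nouns rendered in English.
system = ["the book", "a dog", "water"]
reference = ["a book", "a dog", "water"]
print(category_agreement(system, reference))
# -> {'indefinite': 0.5, 'bare': 1.0}
```

Per-category scoring like this, rather than a single aggregate accuracy, is what makes it possible to state results of the form "the Transformer outperforms the SMT and BiLSTM models for 4 categories."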

Authors (3)
  1. Ife Adebara (12 papers)
  2. Muhammad Abdul-Mageed (102 papers)
  3. Miikka Silfverberg (16 papers)
Citations (5)
