HausaMT v1.0: Towards English-Hausa Neural Machine Translation

Published 9 Jun 2020 in cs.CL and cs.LG | (2006.05014v2)

Abstract: Neural Machine Translation (NMT) for low-resource languages suffers from poor performance because of the scarcity of large parallel corpora and the diversity of the languages involved. To help ameliorate this problem, we built a baseline model for English-Hausa machine translation, which is considered a low-resource language task. Hausa is the second-largest Afro-Asiatic language in the world after Arabic, and the third most widely used language for trade across a large swath of West African countries, after English and French. In this paper, we curated several datasets containing Hausa-English parallel corpora for our translation task. We trained baseline models and evaluated their performance using Recurrent and Transformer encoder-decoder architectures with two tokenization approaches: standard word-level tokenization and Byte Pair Encoding (BPE) subword tokenization.
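
The paper does not reproduce its preprocessing code here, but as a rough illustration of the BPE subword tokenization step mentioned in the abstract, the sketch below shows how a BPE model could be trained and applied with the sentencepiece library. The input file name, vocabulary size, and sample sentence are illustrative assumptions, not details taken from the paper.

    # Minimal sketch of BPE subword tokenization for a Hausa-English parallel corpus.
    # Assumes a plain-text file "train.ha-en.txt" with one sentence per line (hypothetical).
    import sentencepiece as spm

    # Learn a BPE vocabulary from the training text; vocab_size is an illustrative choice.
    spm.SentencePieceTrainer.train(
        input="train.ha-en.txt",
        model_prefix="bpe_ha_en",
        vocab_size=8000,
        model_type="bpe",
    )

    # Load the trained model and segment a sentence into subword pieces.
    sp = spm.SentencePieceProcessor(model_file="bpe_ha_en.model")
    print(sp.encode("Ina koyon Turanci a makaranta.", out_type=str))

Segmenting rare words into smaller subword units reduces out-of-vocabulary tokens, which is the usual motivation for comparing BPE against plain word-level tokenization in low-resource NMT.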

Citations (6)


Authors (1)
