YAD: Leveraging T5 for Improved Automatic Diacritization of Yorùbá Text (2412.20218v1)
Published 28 Dec 2024 in cs.CL
Abstract: In this work, we present the Yorùbá automatic diacritization (YAD) benchmark dataset for evaluating Yorùbá diacritization systems. In addition, we pre-train a text-to-text transformer (T5) model for Yorùbá and show that this model outperforms several multilingually trained T5 models. Lastly, we show that more data and larger models yield better diacritization for Yorùbá.
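The abstract frames diacritization as a text-to-text task: the model reads undiacritized Yorùbá and generates the fully diacritized form. Below is a minimal sketch of that framing using the Hugging Face transformers API. The checkpoint name is a placeholder, since the paper's Yorùbá T5 weights are not linked here, and a raw pre-trained checkpoint would first need fine-tuning on pairs of undiacritized and diacritized sentences before it produced useful output.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Placeholder checkpoint: substitute the paper's Yorùbá T5 model
# (or any T5-style model fine-tuned on diacritization pairs).
model_name = "google/mt5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Input: stripped (undiacritized) Yorùbá; target: the diacritized form.
undiacritized = "bawo ni o se wa"
inputs = tokenizer(undiacritized, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

The appeal of this setup is that restoring diacritics needs no task-specific architecture: the same encoder-decoder and generation loop used for translation or summarization applies unchanged, which is consistent with the paper's comparison against multilingually trained T5 models.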