Guylingo: The Republic of Guyana Creole Corpora (2405.03832v3)
Abstract: While major languages often enjoy substantial attention and resources, the linguistic diversity across the globe encompasses a multitude of smaller, indigenous, and regional languages that lack the same level of computational support. The Caribbean is one region where this gap is evident. Although commonly labeled "English speaking", the ex-British Caribbean is home to a myriad of Creole languages thriving alongside English. In this paper, we present Guylingo: a comprehensive corpus designed for advancing NLP research on Creolese (Guyanese English-lexicon Creole), the most widely spoken language in the culturally rich nation of Guyana. We first outline our framework for gathering and digitizing this diverse corpus, which includes colloquial expressions, idioms, and regional variations in a low-resource language. We then demonstrate the challenges of training and evaluating NLP models for machine translation in Creole. Lastly, we discuss the unique opportunities presented by recent NLP advancements for accelerating the formal adoption of Creole languages as official languages in the Caribbean.
Authors: Christopher Clarke, Roland Daynauth, Charlene Wilkinson, Hubert Devonish, Jason Mars