The Ghanaian NLP Landscape: A First Look (2405.06818v1)
Abstract: Although African languages make up one-third of the world's languages, they are critically underrepresented in AI, which threatens linguistic diversity and cultural heritage. Ghanaian languages in particular face an alarming decline, with extinctions already documented and several languages at risk. This study presents a comprehensive survey of NLP research on Ghanaian languages, identifying the methodologies, datasets, and techniques employed. We also provide a detailed roadmap of challenges, best practices, and future directions, with the aim of making the field more accessible to researchers. This work serves as a foundational resource for Ghanaian NLP research and underscores the critical need to integrate global linguistic diversity into AI development.