
How good are Large Language Models on African Languages? (2311.07978v2)

Published 14 Nov 2023 in cs.CL, cs.AI, and cs.LG

Abstract: Recent advancements in natural language processing have led to the proliferation of LLMs. These models have been shown to yield good performance, using in-context learning, even on tasks and languages they are not trained on. However, their performance on African languages is largely understudied relative to high-resource languages. We present an analysis of four popular LLMs (mT0, Aya, LLaMa 2, and GPT-4) on six tasks (topic classification, sentiment classification, machine translation, summarization, question answering, and named entity recognition) across 60 African languages, spanning different language families and geographical regions. Our results suggest that all LLMs produce lower performance for African languages, and there is a large gap in performance compared to high-resource languages (such as English) for most tasks. We find that GPT-4 has an average to good performance on classification tasks, yet its performance on generative tasks such as machine translation and summarization is significantly lacking. Surprisingly, we find that mT0 had the best overall performance for cross-lingual QA, better than the state-of-the-art supervised model (i.e. fine-tuned mT5) and GPT-4 on African languages. Similarly, we find the recent Aya model to have comparable results to mT0 in almost all tasks except for topic classification, where it outperforms mT0. Overall, LLaMa 2 showed the worst performance, which we believe is due to its English- and code-centric (around 98%) pre-training corpus. Our findings confirm that performance on African languages continues to remain a hurdle for the current LLMs, underscoring the need for additional efforts to close this gap.

Evaluative Overview of the Performance of LLMs in African Languages

Recent advancements in LLMs have significantly improved the capabilities of these models to perform in-context learning across various tasks and languages. However, the performance of LLMs on African languages has not been as extensively studied as that on high-resource languages. The paper under discussion provides an analytical overview of the capabilities of four popular LLMs—mT0, Aya, LLaMa 2, and GPT-4—across a diverse set of tasks, specifically for African languages, which encompass various language families and geographical regions.

Performance Analysis across Tasks and Languages

The paper evaluates the models on six distinct tasks: topic classification, sentiment classification, machine translation, summarization, question answering, and named entity recognition, across a total of 60 African languages. The results indicate a notable disparity in performance between African languages and high-resource languages, a persistent gap suggesting that additional research and development effort is needed for LLMs on low-resource languages.

According to the findings, African languages show lower performance overall, particularly in generative tasks such as machine translation and summarization. Specifically, GPT-4 displays average to good performance on classification tasks, while its capabilities lag significantly on generative tasks. Interestingly, the mT0 model outperformed both GPT-4 and fine-tuned mT5 models in cross-lingual question answering for African languages. Similarly, the recently introduced Aya model presented comparable results to mT0 in most tasks, with notable superiority in topic classification. Conversely, LLaMa 2 consistently demonstrated the weakest performance, likely attributed to its predominantly English and code-centric pre-training corpus.
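The in-context learning setup used for the classification tasks can be illustrated with a minimal sketch: a few labeled examples are assembled into a prompt, the model's completion is taken as the predicted label, and exact-match accuracy is computed over the test set. The prompt template, the label names, and the toy data below are hypothetical illustrations, not the paper's actual evaluation code.

```python
# Sketch of few-shot (in-context) classification evaluation.
# Labels and examples are made up for illustration.

def build_prompt(examples, query, labels):
    """Assemble a few-shot classification prompt from labeled examples."""
    header = "Classify the text into one of: " + ", ".join(labels) + "\n\n"
    shots = "".join(f"Text: {t}\nLabel: {y}\n\n" for t, y in examples)
    return header + shots + f"Text: {query}\nLabel:"

def accuracy(predictions, references):
    """Fraction of exact label matches."""
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)

# Toy usage with illustrative Swahili-style data.
labels = ["siasa", "michezo"]  # politics, sports
shots = [("Rais alihutubia bunge leo", "siasa"),
         ("Timu ilishinda mechi ya jana", "michezo")]
prompt = build_prompt(shots, "Waziri alitangaza bajeti mpya", labels)

# In a real run, `predictions` would come from the model's completions.
predictions = ["siasa", "michezo", "siasa"]
references = ["siasa", "michezo", "michezo"]
print(round(accuracy(predictions, references), 2))
```

Generative tasks such as machine translation and summarization follow the same prompting pattern but are scored with task-specific metrics (e.g. chrF or ROUGE) rather than exact-match accuracy.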

Implications and Future Directions

The paper highlights actionable insights and implications for further studies and developments in artificial intelligence within the context of African linguistics. The practical implications call for concerted efforts to improve LLMs for African languages by representing those languages more prominently in pre-training datasets. This approach could help close the performance gap relative to high-resource languages.

Theoretical implications include exploring new methodologies for instruction fine-tuning and multitask learning that better cater to African languages. Future research could explore customizing LLMs to be more inclusive of diverse dialects and linguistic nuances native to the African continent.

In conclusion, the research underscores the need for a more inclusive approach in the development and evaluation of LLMs, with a focus on overcoming challenges posed by African languages. This necessitates ongoing evaluations and iterative improvements to bridge existing performance gaps, ensuring these models benefit users across all linguistic communities. The paper advocates for advancing AI models for African languages, providing a foundation for future initiatives and research endeavors in this domain.

Authors (4)
  1. Jessica Ojo (6 papers)
  2. Kelechi Ogueji (14 papers)
  3. Pontus Stenetorp (68 papers)
  4. David Ifeoluwa Adelani (59 papers)
Citations (14)