
MasakhaNEWS: News Topic Classification for African languages (2304.09972v2)

Published 19 Apr 2023 in cs.CL

Abstract: African languages are severely under-represented in NLP research due to a lack of datasets covering several NLP tasks. While there are individual language-specific datasets that are being expanded to different tasks, only a handful of NLP tasks (e.g. named entity recognition and machine translation) have standardized benchmark datasets covering several geographically and typologically diverse African languages. In this paper, we develop MasakhaNEWS, a new benchmark dataset for news topic classification covering 16 languages widely spoken in Africa. We provide an evaluation of baseline models by training classical machine learning models and fine-tuning several LLMs. Furthermore, we explore several alternatives to full fine-tuning of LLMs that are better suited for zero-shot and few-shot learning, such as cross-lingual parameter-efficient fine-tuning (like MAD-X), pattern exploiting training (PET), prompting LLMs (like ChatGPT), and prompt-free sentence transformer fine-tuning (SetFit and Cohere Embedding API). Our evaluation in the zero-shot setting shows the potential of prompting ChatGPT for news topic classification in low-resource African languages, achieving an average performance of 70 F1 points without leveraging additional supervision like MAD-X. In the few-shot setting, we show that with as few as 10 examples per label, we achieve more than 90% (i.e. 86.0 F1 points) of the performance of fully supervised training (92.6 F1 points) leveraging the PET approach.
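
To make the few-shot setup and the "more than 90%" figure concrete, here is a minimal sketch in Python. It is not the authors' code and does not implement PET. It only illustrates drawing a fixed number of training examples per label and computing few-shot performance relative to full supervision from the two F1 scores quoted in the abstract. The function names, the toy headlines, and the three topic labels are assumptions made for illustration.

```python
import random
from collections import defaultdict


def sample_k_per_label(examples, k=10, seed=0):
    """Return at most k (text, label) pairs per label, drawn at random.

    Hypothetical helper: the paper's actual sampling procedure is not
    described in the abstract.
    """
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    subset = []
    for label in sorted(by_label):
        items = by_label[label]
        subset.extend(rng.sample(items, min(k, len(items))))
    return subset


def relative_performance(few_shot_f1, full_f1):
    """Few-shot F1 as a percentage of the fully supervised F1."""
    return 100.0 * few_shot_f1 / full_f1


# Toy data standing in for a labelled news corpus (labels are made up here).
toy = [
    (f"headline {i}", label)
    for label in ("sports", "politics", "health")
    for i in range(25)
]

few_shot = sample_k_per_label(toy, k=10)

# F1 scores as reported in the abstract: 86.0 (few-shot) vs 92.6 (full).
ratio = relative_performance(86.0, 92.6)
print(f"{len(few_shot)} training examples, {ratio:.1f}% of full supervision")
```

With three labels and `k=10`, the subset holds 30 examples, and the ratio works out to roughly 92.9%, consistent with the abstract's "more than 90%".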

Authors (65)
  1. David Ifeoluwa Adelani (59 papers)
  2. Marek Masiak (2 papers)
  3. Israel Abebe Azime (16 papers)
  4. Jesujoba Alabi (11 papers)
  5. Atnafu Lambebo Tonja (27 papers)
  6. Christine Mwase (3 papers)
  7. Odunayo Ogundepo (11 papers)
  8. Bonaventure F. P. Dossou (30 papers)
  9. Akintunde Oladipo (7 papers)
  10. Doreen Nixdorf (2 papers)
  11. Chris Chinenye Emezue (15 papers)
  12. Blessing Sibanda (8 papers)
  13. Davis David (7 papers)
  14. Lolwethu Ndolela (4 papers)
  15. Jonathan Mukiibi (10 papers)
  16. Tunde Ajayi (2 papers)
  17. Tatiana Moteu (2 papers)
  18. Brian Odhiambo (1 paper)
  19. Abraham Owodunni (5 papers)
  20. Nnaemeka Obiefuna (2 papers)
Citations (18)