
MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition (2210.12391v2)

Published 22 Oct 2022 in cs.CL

Abstract: African languages are spoken by over a billion people, but are underrepresented in NLP research and development. The challenges impeding progress include the limited availability of annotated datasets, as well as a lack of understanding of the settings where current methods are effective. In this paper, we make progress towards solutions for these challenges, focusing on the task of named entity recognition (NER). We create the largest human-annotated NER dataset for 20 African languages, and we study the behavior of state-of-the-art cross-lingual transfer methods in an Africa-centric setting, demonstrating that the choice of source language significantly affects performance. We show that choosing the best transfer language improves zero-shot F1 scores by an average of 14 points across 20 languages compared to using English. Our results highlight the need for benchmark datasets and models that cover typologically diverse African languages.

Authors (45)
  1. David Ifeoluwa Adelani (59 papers)
  2. Graham Neubig (342 papers)
  3. Sebastian Ruder (93 papers)
  4. Shruti Rijhwani (25 papers)
  5. Michael Beukman (19 papers)
  6. Chester Palen-Michel (9 papers)
  7. Constantine Lignos (19 papers)
  8. Jesujoba O. Alabi (20 papers)
  9. Shamsuddeen H. Muhammad (2 papers)
  10. Peter Nabende (5 papers)
  11. Cheikh M. Bamba Dione (2 papers)
  12. Andiswa Bukula (8 papers)
  13. Rooweither Mabuya (7 papers)
  14. Bonaventure F. P. Dossou (30 papers)
  15. Blessing Sibanda (8 papers)
  16. Happy Buzaaba (9 papers)
  17. Jonathan Mukiibi (10 papers)
  18. Godson Kalipe (4 papers)
  19. Derguene Mbaye (8 papers)
  20. Amelia Taylor (14 papers)