
AfriWOZ: Corpus for Exploiting Cross-Lingual Transferability for Generation of Dialogues in Low-Resource, African Languages (2204.08083v2)

Published 17 Apr 2022 in cs.CL

Abstract: Dialogue generation is an important NLP task fraught with many challenges. The challenges become more daunting for low-resource African languages. To enable the creation of dialogue agents for African languages, we contribute the first high-quality dialogue datasets for 6 African languages: Swahili, Wolof, Hausa, Nigerian Pidgin English, Kinyarwanda and Yorùbá. These datasets consist of 1,500 turns each, which we translate from a portion of the English multi-domain MultiWOZ dataset. Subsequently, we investigate and analyze the effectiveness of modelling through transfer learning by utilizing state-of-the-art (SoTA) deep monolingual models: DialoGPT and BlenderBot. We compare the models with a simple seq2seq baseline using perplexity. In addition, we conduct human evaluation of single-turn conversations using majority votes and measure inter-annotator agreement (IAA). We find that the hypothesis that deep monolingual models learn some abstractions that generalize across languages holds. We observe human-like conversations, to different degrees, in 5 out of the 6 languages. The language with the most transferable properties is Nigerian Pidgin English, with a human-likeness score of 78.1%, of which 34.4% are unanimous. We freely provide the datasets and host the model checkpoints/demos on the HuggingFace hub for public access.
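
The two evaluation measures named in the abstract, perplexity and majority-vote human-likeness, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the "human-like" label string, and the toy inputs are assumptions made for illustration, and the unanimous share is computed here as a fraction of all rated items, which is only one possible reading of the abstract's figures.

```python
import math
from collections import Counter


def perplexity(token_log_probs):
    """Perplexity as exp of the mean negative log-likelihood (natural log)."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def majority_vote(labels):
    """Return (winning label, is_unanimous) for one rated conversation.

    Assumes an odd number of annotators choosing between two labels,
    so a strict majority always exists.
    """
    winner, count = Counter(labels).most_common(1)[0]
    return winner, count == len(labels)


def human_likeness(annotations, positive="human-like"):
    """Fraction of items voted `positive` by majority, and by unanimity.

    Both fractions are taken over all items (an assumption, see lead-in).
    """
    votes = [majority_vote(labels) for labels in annotations]
    majority = sum(1 for winner, _ in votes if winner == positive)
    unanimous = sum(1 for winner, unan in votes if winner == positive and unan)
    return majority / len(votes), unanimous / len(votes)


# Toy data (hypothetical): four conversations, three annotators each.
toy = [
    ["human-like", "human-like", "human-like"],
    ["human-like", "human-like", "not"],
    ["not", "not", "human-like"],
    ["not", "not", "not"],
]
majority_share, unanimous_share = human_likeness(toy)
```

On the toy data, two of the four conversations win a majority of "human-like" votes and one of them is unanimous. A model that assigns probability 0.5 to every token has a perplexity of 2.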

Authors (20)
  1. Tosin Adewumi (27 papers)
  2. Mofetoluwa Adeyemi (8 papers)
  3. Aremu Anuoluwapo (3 papers)
  4. Bukola Peters (1 paper)
  5. Happy Buzaaba (9 papers)
  6. Oyerinde Samuel (1 paper)
  7. Amina Mardiyyah Rufai (2 papers)
  8. Benjamin Ajibade (5 papers)
  9. Tajudeen Gwadabe (1 paper)
  10. Mory Moussou Koulibaly Traore (1 paper)
  11. Tunde Ajayi (2 papers)
  12. Shamsuddeen Muhammad (4 papers)
  13. Ahmed Baruwa (5 papers)
  14. Paul Owoicho (4 papers)
  15. Phylis Ngigi (1 paper)
  16. Orevaoghene Ahia (23 papers)
  17. Ruqayya Nasir (1 paper)
  18. Foteini Liwicki (16 papers)
  19. Marcus Liwicki (86 papers)
  20. Tolulope Ogunremi (5 papers)