AfriWOZ: Corpus for Exploiting Cross-Lingual Transferability for Generation of Dialogues in Low-Resource, African Languages (2204.08083v2)
Abstract: Dialogue generation is an important NLP task fraught with many challenges. The challenges become more daunting for low-resource African languages. To enable the creation of dialogue agents for African languages, we contribute the first high-quality dialogue datasets for 6 African languages: Swahili, Wolof, Hausa, Nigerian Pidgin English, Kinyarwanda and Yorùbá. These datasets consist of 1,500 turns each, which we translate from a portion of the English multi-domain MultiWOZ dataset. Subsequently, we investigate and analyze the effectiveness of modelling through transfer learning by utilizing state-of-the-art (SoTA) deep monolingual models: DialoGPT and BlenderBot. We compare the models with a simple seq2seq baseline using perplexity. In addition, we conduct human evaluation of single-turn conversations using majority votes and measure inter-annotator agreement (IAA). We find that the hypothesis that deep monolingual models learn some abstractions that generalize across languages holds. We observe human-like conversations, to different degrees, in 5 out of the 6 languages. The language with the most transferable properties is Nigerian Pidgin English, with a human-likeness score of 78.1%, of which 34.4% are unanimous. We freely provide the datasets and host the model checkpoints/demos on the HuggingFace hub for public access.
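The two evaluation measures named in the abstract, perplexity and majority-vote human evaluation, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the three-annotator setup, the label strings and the vote data are all hypothetical, and both shares are computed over all rated items, which is an assumption about how such scores might be tallied.

```python
import math
from collections import Counter


def perplexity(token_log_probs):
    """Perplexity as exp of the mean negative log-likelihood (natural log)."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def majority_vote(labels):
    """Return (winning label, whether all annotators agreed).

    Assumes an odd number of annotators and two labels, so ties cannot occur.
    """
    label, count = Counter(labels).most_common(1)[0]
    return label, count == len(labels)


# Hypothetical annotations: one inner list per single-turn conversation,
# one label per annotator (illustrative data, not taken from the paper).
votes = [
    ["human", "human", "human"],
    ["human", "human", "not"],
    ["not", "not", "human"],
    ["human", "not", "human"],
]

results = [majority_vote(v) for v in votes]

# Share of items judged human-like by majority, over all items.
human_like = sum(1 for label, _ in results if label == "human") / len(votes)

# Share of items judged human-like unanimously, over all items (assumed denominator).
unanimous_human = sum(
    1 for label, unanimous in results if label == "human" and unanimous
) / len(votes)

# A model that assigns probability 0.25 to every token has perplexity of about 4.
uniform_ppl = perplexity([math.log(0.25)] * 4)
```

Lower perplexity means the model assigns higher probability to the reference responses. The majority vote turns several annotators' judgments into one label per conversation, and the unanimity flag separates clear cases from contested ones.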
- Tosin Adewumi (27 papers)
- Mofetoluwa Adeyemi (8 papers)
- Aremu Anuoluwapo (3 papers)
- Bukola Peters (1 paper)
- Happy Buzaaba (9 papers)
- Oyerinde Samuel (1 paper)
- Amina Mardiyyah Rufai (2 papers)
- Benjamin Ajibade (5 papers)
- Tajudeen Gwadabe (1 paper)
- Mory Moussou Koulibaly Traore (1 paper)
- Tunde Ajayi (2 papers)
- Shamsuddeen Muhammad (4 papers)
- Ahmed Baruwa (5 papers)
- Paul Owoicho (4 papers)
- Phylis Ngigi (1 paper)
- Orevaoghene Ahia (23 papers)
- Ruqayya Nasir (1 paper)
- Foteini Liwicki (16 papers)
- Marcus Liwicki (86 papers)
- Tolulope Ogunremi (5 papers)