
ArTST: Arabic Text and Speech Transformer (2310.16621v1)

Published 25 Oct 2023 in cs.CL, cs.AI, cs.SD, and eess.AS

Abstract: We present ArTST, a pre-trained Arabic text and speech transformer for supporting open-source speech technologies for the Arabic language. The model architecture follows the unified-modal framework, SpeechT5, that was recently released for English, and is focused on Modern Standard Arabic (MSA), with plans to extend the model to dialectal and code-switched Arabic in future editions. We pre-trained the model from scratch on MSA speech and text data, and fine-tuned it for the following tasks: Automatic Speech Recognition (ASR), Text-To-Speech synthesis (TTS), and spoken dialect identification. In our experiments comparing ArTST with SpeechT5, as well as with previously reported results on these tasks, ArTST performs on a par with or exceeds the current state of the art in all three tasks. Moreover, we find that our pre-training is conducive to generalization, which is particularly evident in the low-resource TTS task. The pre-trained model as well as the fine-tuned ASR and TTS models are released for research use.
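
Since the abstract states that ArTST follows the SpeechT5 unified-modal framework and that the fine-tuned ASR model is released for research use, a minimal sketch of loading such a SpeechT5-style ASR checkpoint with the Hugging Face transformers SpeechT5 classes is shown below. The checkpoint identifier "org/artst-asr-msa" is a placeholder assumption, not the actual release name; substitute the identifier from the ArTST release.

```python
# Minimal sketch: running ASR with a SpeechT5-style checkpoint via Hugging Face
# transformers. "org/artst-asr-msa" is a hypothetical placeholder identifier.
import torch
from transformers import SpeechT5Processor, SpeechT5ForSpeechToText

processor = SpeechT5Processor.from_pretrained("org/artst-asr-msa")       # placeholder ID
model = SpeechT5ForSpeechToText.from_pretrained("org/artst-asr-msa")     # placeholder ID

# 16 kHz mono waveform (one second of silence here as a stand-in for real audio)
waveform = torch.zeros(16000)

inputs = processor(audio=waveform.numpy(), sampling_rate=16000, return_tensors="pt")
predicted_ids = model.generate(**inputs, max_length=200)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription[0])
```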

Authors (4)
  1. Hawau Olamide Toyin (11 papers)
  2. Amirbek Djanibekov (7 papers)
  3. Ajinkya Kulkarni (18 papers)
  4. Hanan Aldarmaki (29 papers)
Citations (5)