AraMUS: Pushing the Limits of Data and Model Scale for Arabic Natural Language Processing (2306.06800v1)
Published 11 Jun 2023 in cs.CL
Abstract: Developing monolingual large pre-trained language models (PLMs) has proven very successful for handling a wide range of NLP tasks. In this work, we present AraMUS, the largest Arabic PLM to date, with 11B parameters trained on 529GB of high-quality Arabic textual data. AraMUS achieves state-of-the-art performance on a diverse set of Arabic classification and generative tasks. Moreover, AraMUS shows impressive few-shot learning abilities compared with the best existing Arabic PLMs.
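To make the few-shot usage pattern concrete, below is a minimal sketch of few-shot prompting with a text-to-text Arabic PLM using the Hugging Face `transformers` library. The checkpoint name `aramus-11b` is a hypothetical placeholder (the paper does not specify a public release), and the T5-style seq2seq interface is an assumption; a public Arabic text-to-text checkpoint such as AraT5 could be substituted to run the same pattern.

```python
# Sketch: few-shot sentiment classification with a text-to-text Arabic PLM.
# Assumptions: a T5-style seq2seq interface and a hypothetical checkpoint
# name "aramus-11b" (not an official release). Loading an 11B-parameter
# model requires tens of GB of accelerator memory.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "aramus-11b"  # hypothetical identifier
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# A few labelled examples followed by the query, packed into one prompt.
# (Review: ... Label: positive/negative)
prompt = (
    "مراجعة: الخدمة ممتازة والطعام لذيذ. التصنيف: إيجابي\n"
    "مراجعة: تجربة سيئة ولن أكررها. التصنيف: سلبي\n"
    "مراجعة: المكان جميل والأسعار مناسبة. التصنيف:"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same prompt-and-generate pattern covers both the classification and generative evaluations mentioned in the abstract; only the prompt format and decoding length change per task.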
Authors: Asaad Alghamdi, Xinyu Duan, Wei Jiang, Zhenhai Wang, Yimeng Wu, Qingrong Xia, Zhefeng Wang, Yi Zheng, Mehdi Rezagholizadeh, Baoxing Huai, Peilun Cheng, Abbas Ghaddar