AraMUS: Pushing the Limits of Data and Model Scale for Arabic Natural Language Processing (2306.06800v1)
Published 11 Jun 2023 in cs.CL
Abstract: Developing monolingual large pre-trained language models (PLMs) has proven very successful for handling a wide range of NLP tasks. In this work, we present AraMUS, the largest Arabic PLM to date, with 11B parameters trained on 529GB of high-quality Arabic textual data. AraMUS achieves state-of-the-art performance on a diverse set of Arabic classification and generative tasks. Moreover, AraMUS shows impressive few-shot learning abilities compared with the best existing Arabic PLMs.
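To make the few-shot usage pattern concrete, below is a minimal sketch of few-shot prompting with a text-to-text Arabic PLM using the Hugging Face `transformers` library. The checkpoint name `aramus-11b` is a hypothetical placeholder (the paper does not specify a public release), and the T5-style seq2seq interface is an assumption; a public Arabic text-to-text checkpoint such as AraT5 could be substituted to run the same pattern.

```python
# Sketch: few-shot sentiment classification with a text-to-text Arabic PLM.
# Assumptions: a T5-style seq2seq interface and a hypothetical checkpoint
# name "aramus-11b" (not an official release). Loading an 11B-parameter
# model requires tens of GB of accelerator memory.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "aramus-11b"  # hypothetical identifier
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# A few labelled examples followed by the query, packed into one prompt.
# (Review: ... Label: positive/negative)
prompt = (
    "مراجعة: الخدمة ممتازة والطعام لذيذ. التصنيف: إيجابي\n"
    "مراجعة: تجربة سيئة ولن أكررها. التصنيف: سلبي\n"
    "مراجعة: المكان جميل والأسعار مناسبة. التصنيف:"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same prompt-and-generate pattern covers both the classification and generative evaluations mentioned in the abstract; only the prompt format and decoding length change per task.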
Authors: Asaad Alghamdi, Xinyu Duan, Wei Jiang, Zhenhai Wang, Yimeng Wu, Qingrong Xia, Zhefeng Wang, Yi Zheng, Mehdi Rezagholizadeh, Baoxing Huai, Peilun Cheng, Abbas Ghaddar