
ALLaM: Large Language Models for Arabic and English (2407.15390v1)

Published 22 Jul 2024 in cs.CL and cs.AI

Abstract: We present ALLaM: Arabic LLM, a series of LLMs to support the ecosystem of Arabic Language Technologies (ALT). ALLaM is carefully trained considering the values of language alignment and knowledge transfer at scale. Our autoregressive decoder-only architecture models demonstrate how second-language acquisition via vocabulary expansion and pretraining on a mixture of Arabic and English text can steer a model towards a new language (Arabic) without any catastrophic forgetting in the original language (English). Furthermore, we highlight the effectiveness of using parallel/translated data to aid the process of knowledge alignment between languages. Finally, we show that extensive alignment with human preferences can significantly enhance the performance of an LLM compared to models of a larger scale with lower quality alignment. ALLaM achieves state-of-the-art performance in various Arabic benchmarks, including MMLU Arabic, ACVA, and Arabic Exams. Our aligned models improve both in Arabic and English from their base aligned models.

Authors (25)
  1. M Saiful Bari
  2. Yazeed Alnumay
  3. Norah A. Alzahrani
  4. Nouf M. Alotaibi
  5. Hisham A. Alyahya
  6. Sultan Alrashed
  7. Faisal A. Mirza
  8. Shaykhah Z. Alsubaie
  9. Hassan A. Alahmed
  10. Ghadah Alabduljabbar
  11. Raghad Alkhathran
  12. Yousef Almushayqih
  13. Raneem Alnajim
  14. Salman Alsubaihi
  15. Maryam Al Mansour
  16. Majed Alrubaian
  17. Ali Alammari
  18. Zaki Alawami
  19. Abdulmohsen Al-Thubaity
  20. Ahmed Abdelali