Hala Models: Arabic-Centric Instruction LMs

Updated 18 September 2025
  • Hala Models are a suite of Arabic-centric language models that use a translate-and-tune pipeline, FP8 compression, and slerp merging to boost instruction-following performance.
  • They span model sizes from 350M to 9B parameters and achieve state-of-the-art results on diverse Arabic benchmarks with significant efficiency gains.
  • Released resources include model checkpoints, a curated million-scale bilingual dataset, and training scripts to ensure reproducibility and advance research in underrepresented languages.

Hala models are a family of Arabic-centric instruction-following and translation LLMs designed for high performance on Arabic natural language processing tasks. Developed using a translate-and-tune pipeline, these models are optimized for Arabic by leveraging high-quality bilingual supervision, model compression strategies, and parameter merging techniques. The Hala suite spans parameter scales from 350 million to 9 billion, achieving state-of-the-art results on a range of Arabic language benchmarks and providing extensive resources—models, data, scripts—to support reproducibility and further research (Hammoud et al., 17 Sep 2025).

1. Translate-and-Tune Pipeline and Data Generation

The Hala framework is centered on a translate-and-tune pipeline. First, a high-capacity AR⟷EN (Arabic–English) teacher translation model is compressed to the FP8 format using the LLM Compressor with per-tensor dynamic scaling and post-conversion validation. This roughly doubles inference throughput while preserving translation quality. The compressed FP8 teacher then generates large-scale, high-fidelity bilingual datasets by translating English instruction data—emphasizing complex, multi-step instructions—into Arabic.
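As a rough illustration, the data-generation step can be sketched as follows. `teacher_translate` is a hypothetical stand-in for the FP8-compressed teacher, stubbed out here so the example is self-contained; it is not an API from the report.

```python
# Sketch of the data-generation step: a compressed teacher translates
# English instruction data into Arabic to build bilingual supervision.
# `teacher_translate` is a hypothetical stub for illustration only.

def teacher_translate(text: str, src: str = "en", tgt: str = "ar") -> str:
    """Placeholder for the FP8-compressed AR<->EN teacher's translation call."""
    return f"[{tgt}] {text}"  # a real pipeline would run the FP8 model here

def build_bilingual_pairs(english_instructions: list[str]) -> list[dict]:
    """Pair each English instruction with its Arabic translation, keeping
    both sides so the corpus provides bilingual supervision."""
    return [{"en": inst, "ar": teacher_translate(inst)}
            for inst in english_instructions]

corpus = build_bilingual_pairs([
    "Summarize the following paragraph.",
    "Explain each step of the proof, one at a time.",
])
print(len(corpus))  # 2
```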

A lightweight LLM, LiquidAI/LFM2-1.2B, is fine-tuned on this bilingual corpus. The fine-tuned model then translates additional high-quality English instruction sets, sourced from datasets such as Open-Orca and OPUS-100, into Arabic. This process produces a curated, million-scale instruction-following corpus tailored to the characteristics of Arabic language understanding and generation.
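The curation step is described only at a high level; the heuristics below (deduplication plus a length-ratio check between source and translation) are assumed for illustration and are not the paper's exact recipe.

```python
def curate(pairs: list[dict], min_ratio: float = 0.3,
           max_ratio: float = 3.0) -> list[dict]:
    """Keep a pair only if it is not a duplicate and its Arabic/English
    character-length ratio does not suggest a truncated or runaway
    translation. These filters are illustrative assumptions."""
    seen, kept = set(), []
    for p in pairs:
        ratio = len(p["ar"]) / max(len(p["en"]), 1)
        if p["en"] not in seen and min_ratio <= ratio <= max_ratio:
            seen.add(p["en"])
            kept.append(p)
    return kept

raw = [
    {"en": "Translate this sentence.", "ar": "ترجم هذه الجملة."},
    {"en": "Translate this sentence.", "ar": "ترجم هذه الجملة."},  # duplicate
    {"en": "Write a long essay about history.", "ar": "لا"},        # truncated
]
print(len(curate(raw)))  # 1
```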

2. Model Variants, Bases, and Slerp Merging

Hala models are released in four sizes: 350M, 700M, 1.2B, and 9B parameters. The 350M, 700M, and 1.2B models are based on the LiquidAI/LFM2 checkpoint family, while the 9B model is built on the FANAR architecture. To combine the specialized capabilities imparted by Arabic-centric training with the generalization strengths of the base models, Hala employs spherical linear interpolation (slerp) merging. The fine-tuned instruction-following checkpoint and the original base model are interpolated with parameter t = 0.5:

\mathrm{slerp}(a, b; t) = \frac{\sin((1-t)\theta)}{\sin(\theta)}\,a + \frac{\sin(t\theta)}{\sin(\theta)}\,b

where θ is the angle between the parameter vectors a and b.
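A minimal NumPy sketch of this merge, assuming (as is common in weight merging) that θ is computed from the flattened, normalized parameter vectors; the two small arrays stand in for real model weights:

```python
import numpy as np

def slerp(a: np.ndarray, b: np.ndarray, t: float = 0.5) -> np.ndarray:
    """Spherical linear interpolation between two flattened weight tensors."""
    a_unit = a / np.linalg.norm(a)
    b_unit = b / np.linalg.norm(b)
    theta = np.arccos(np.clip(np.dot(a_unit, b_unit), -1.0, 1.0))
    if np.isclose(theta, 0.0):       # nearly parallel weights: fall back to lerp
        return (1.0 - t) * a + t * b
    return (np.sin((1.0 - t) * theta) * a + np.sin(t * theta) * b) / np.sin(theta)

base = np.array([1.0, 0.0])   # stand-in for base-model weights
tuned = np.array([0.0, 1.0])  # stand-in for fine-tuned weights
merged = slerp(base, tuned, t=0.5)
print(merged)  # halfway along the arc between the two vectors
```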

This merging preserves general model capabilities while boosting Arabic instruction-following, providing a balanced solution for both generic and Arabic-specific tasks.

3. Compression to FP8 for Efficient Inference

A distinctive feature of the Hala pipeline is compression of the AR⟷EN teacher to FP8. The process uses per-tensor dynamic scaling, with translation-quality checks run after conversion to guarantee numerical stability and confirm that no quality is lost. FP8 quantization nearly doubles inference throughput relative to FP16 without perceptible loss in accuracy or fluency, enabling efficient generation of the large bilingual corpora required for instruction tuning at this scale.
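The per-tensor dynamic scaling idea can be illustrated with a simplified NumPy sketch. Only the scaling step is shown; real FP8 kernels also round mantissas to the 8-bit E4M3 format, which this sketch omits.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite magnitude in the E4M3 FP8 format

def fp8_scale(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Per-tensor dynamic scaling: one scale per tensor, chosen so the
    tensor's current max magnitude maps onto the FP8 representable range.
    (Simplified: rounding to actual 8-bit values is omitted.)"""
    scale = float(np.abs(w).max()) / FP8_E4M3_MAX
    q = np.clip(w / scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q, scale

def fp8_unscale(q: np.ndarray, scale: float) -> np.ndarray:
    """Map scaled values back to the original range."""
    return q * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4))
q, s = fp8_scale(w)
w_hat = fp8_unscale(q, s)
print(np.abs(q).max() <= FP8_E4M3_MAX)  # True: values fit the FP8 range
```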

4. Evaluation Protocols and Benchmark Performance

Hala models are evaluated using the LightEval toolkit and vLLM system across a suite of Arabic-centric benchmarks, including AlGhafa, ArabicMMLU, EXAMS, MadinahQA, AraTrust, and ArbMMLU-HT. In the “nano” model regime (≤2B parameters), Hala-1.2B delivers an average improvement of approximately +5.1 percentage points over the LiquidAI/LFM2-1.2B base. For the “small” regime (7–9B parameters), Hala-9B outperforms previous state-of-the-art models, such as QCRI/Fanar-1-9B-Instruct, across individual tasks as well as in aggregate. These results demonstrate that the combination of translate-and-tune, FP8 compression, and slerp merging yields robust improvements specifically tailored to Arabic benchmarks.
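The reported averages are unweighted means across benchmarks; the arithmetic behind a "+5.1 point" style delta is simply a difference of macro averages. The scores below are illustrative placeholders, not numbers from the report.

```python
def macro_average(scores: dict[str, float]) -> float:
    """Unweighted mean over per-benchmark scores."""
    return sum(scores.values()) / len(scores)

# Illustrative placeholder scores, not actual results from the report.
base_scores = {"AlGhafa": 60.0, "ArabicMMLU": 50.0, "EXAMS": 45.0}
hala_scores = {"AlGhafa": 65.0, "ArabicMMLU": 55.5, "EXAMS": 49.8}

delta = macro_average(hala_scores) - macro_average(base_scores)
print(round(delta, 1))  # 5.1
```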

5. Released Resources and Reproducibility

The Hala technical report publicly releases:

  • Model checkpoints at all four scales (350M, 700M, 1.2B, 9B).
  • A curated, million-scale high-fidelity Arabic instruction dataset created via translation and filtering.
  • Training and evaluation scripts, including exact LightEval command lines and task definitions, ensuring full experimental reproducibility.
  • Recipes and documentation that detail the translate-and-tune pipeline and all required procedures.

This resource suite is designed to accelerate research in Arabic NLP and lower the barrier for LLM development by providing models, data, and procedural transparency.

6. Implications for Arabic NLP and Underrepresented Languages

By constructing a pipeline capable of scaling high-quality instruction synthesis to the million-example regime—in tandem with efficient model architectures and merging techniques—the Hala models set a new standard for Arabic-centric LLMs. The provision of data, code, and training recipes enables both incremental model improvement and adaptation of the translate-and-tune methodology to other underrepresented languages. This suggests that the core strategies behind Hala—bilingual supervision at scale, efficient model compression, and controlled merging—could serve as a template for advancing NLP in low-resource settings.
