
Qalb Model: Advanced Urdu NLP

Updated 23 February 2026
  • Qalb Model is a state-of-the-art Urdu large language model addressing the underrepresentation of Urdu in NLP with focused pre-training and fine-tuning.
  • It employs a two-stage training pipeline—continued pre-training on a mixed Urdu-English corpus followed by supervised fine-tuning using LoRA for parameter efficiency.
  • The model outperforms multilingual counterparts in tasks like generation, translation, and sentiment analysis by robustly handling Urdu’s complex morphology and Nastaliq script.

Qalb Model

Qalb is a state-of-the-art Urdu LLM designed to address the chronic underrepresentation of Urdu, a language with over 230 million speakers, in contemporary NLP systems. Existing multilingual models, such as LLaMA-3.1 8B-Instruct, perform poorly on Urdu-specific tasks because they struggle with Urdu's complex inflectional morphology, its right-to-left Nastaliq script, and its rich literary and domain-specific registers. Qalb combines systematic, large-scale Urdu-focused continued pre-training with targeted instruction fine-tuning, achieving a new state of the art across core Urdu NLP benchmarks (Hassan et al., 13 Jan 2026).

1. Model Architecture and Training Pipeline

Qalb follows a two-stage adaptation pipeline based on LLaMA-3.1 8B, an open-source LLM:

  • Stage 1: Continued Pre-training is conducted on a mixed Urdu–English corpus. This stage endows the model with deep knowledge of Urdu morphology, script, and various registers, while retaining foundational English capabilities via inclusion of English data as a replay buffer.
  • Stage 2: Supervised Fine-Tuning transforms the continued pre-trained model into an Urdu instruction-following assistant using the Alif Urdu-instruct dataset.

Parameter-efficient fine-tuning is achieved via Low-Rank Adaptation (LoRA), allowing adaptation of ∼1.18B parameters (14.7% of the base) using a single NVIDIA A100 80 GB GPU.
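As an illustration of the LoRA idea mentioned above, the following is a minimal NumPy sketch of a single adapted linear layer, not the authors' training code. The toy dimensions, the scaling factor `alpha`, and the function names are assumptions chosen for illustration; only the rank-128 figure comes from the text.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha):
    """Frozen weight W plus a scaled low-rank update B @ A (standard LoRA form)."""
    r = A.shape[0]
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

def lora_param_count(d_in, d_out, r):
    """Trainable parameters added by one rank-r adapter on a d_out x d_in layer."""
    return r * (d_in + d_out)

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 32, 8              # toy sizes, not the model's real dimensions
W = rng.normal(size=(d_out, d_in))      # frozen base weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                # trainable up-projection, zero-initialised
x = rng.normal(size=(4, d_in))

# alpha=16 is an assumed value; the source does not state the scaling factor.
y = lora_forward(x, W, A, B, alpha=16)

# With B initialised to zero the adapter is a no-op, so training starts
# from the unmodified base model.
starts_as_base = np.allclose(y, x @ W.T)

# For a hypothetical 4096 x 4096 projection, a rank-128 adapter trains
# 128 * (4096 + 4096) parameters instead of 4096 * 4096.
adapter_params = lora_param_count(4096, 4096, 128)
```

Only `A` and `B` receive gradients in this scheme, which is what keeps the trainable parameter count far below the size of the base model.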

Model Training Details

Stage | Corpus / Dataset | Method | Main Hyperparameters
Continued Pre-training | 1.97B tokens (Urdu+English) | LoRA (rank 128) | LR=2×10⁻⁵ (emb: 2×10⁻⁶), bfloat16, batch=128, 7,500 steps
Supervised FT | Alif Urdu-instruct | LoRA (rank 128) | LR=5×10⁻⁵, 2 epochs, bfloat16, batch=64

This configuration retains the general reasoning and generation capabilities of the underlying LLaMA backbone while adapting it to Urdu in a parameter-efficient way (Hassan et al., 13 Jan 2026).

2. Pre-training Corpus Curation and Statistics

The pre-training corpus is constructed to maximize Urdu language coverage across formality, genre, and domain:

  • Urdu Text (1.84B tokens):
    • News archives: BBC Urdu, Jang, Dunya News, UrduPoint (~61M words)
    • Literary corpora: Rekhta, Makhzan, Islamic books
    • Specialized domains: sports, entertainment, health
    • Formal and colloquial registers: government documents, social media
  • English Text (140M tokens): Wikipedia, used to prevent catastrophic forgetting of English during adaptation.

The processed corpus resulted in 5.04M documents (~9.09GB) after multi-stage cleaning (removal of boilerplate, short texts, duplicates, junk). Urdu word-purity is 95.31%, indicating minimal cross-language contamination.
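The cleaning and purity steps described above might be sketched as follows. This is a rough illustration under stated assumptions, not the authors' pipeline: the source does not define how word-purity is computed, so here it is taken to be the fraction of words written entirely in Arabic-script characters (which also admits Arabic and Persian words). The function names, the `min_words` threshold, and the exact-match deduplication are hypothetical.

```python
import re

# Arabic-script Unicode blocks plus the zero-width non-joiner used inside Urdu words.
# Assumption: "Urdu word" is approximated as "word made only of these characters".
URDU_WORD = re.compile(r"[\u0600-\u06FF\u0750-\u077F\uFB50-\uFDFF\uFE70-\uFEFF\u200C]+")

def clean_corpus(docs, min_words=5):
    """Collapse whitespace, drop short texts, and drop exact duplicates."""
    seen, kept = set(), []
    for doc in docs:
        text = " ".join(doc.split())
        if len(text.split()) < min_words:   # hypothetical short-text threshold
            continue
        if text in seen:                    # exact-match deduplication only
            continue
        seen.add(text)
        kept.append(text)
    return kept

def urdu_word_purity(text):
    """Fraction of words written entirely in Arabic-script characters."""
    words = [w.strip(".,!?:;\"'()[]") for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    urdu = sum(1 for w in words if URDU_WORD.fullmatch(w))
    return urdu / len(words)

docs = [
    "یہ ایک سادہ مثال کا جملہ ہے",
    "یہ  ایک سادہ مثال کا جملہ ہے",   # duplicate once whitespace is collapsed
    "مختصر",                          # too short, dropped
]
cleaned = clean_corpus(docs)
purity = urdu_word_purity("یہ Python کی مثال ہے")   # 4 of 5 words are Arabic-script
```

A production pipeline would likely add boilerplate removal and near-duplicate detection, which this sketch omits.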

3. Parameter-Efficient Training Strategy

Both the continued pre-training and the fine-tuning stage inject rank-128 LoRA adapters into all linear and embedding layers:

  • Continued Pre-training: AdamW (8-bit), cosine decay (5% warmup), sequence length 2,048, bfloat16, effective batch size 128, 7,500 steps.
    • Loss fell from 1.07 to 0.77 and perplexity from 2.35 to 2.20, indicating continued learning on the Urdu data.
  • Supervised Fine-Tuning: AdamW-8bit (0.01 weight decay), linear schedule (10-step warmup), batch size 64, epochs=2, bfloat16. Prompts use the LLaMA-3 chat format, with the loss masked on user turns so that only assistant responses contribute to the training signal.
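Loss masking on user turns can be illustrated with a small sketch. It assumes the conversation has already been tokenized into per-turn ID lists; the `build_labels` helper and the token IDs are hypothetical, and a real LLaMA-3 chat template also inserts header and end-of-turn tokens that are left out here. The value -100 is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # positions with this label are skipped by PyTorch's cross-entropy

def build_labels(turns):
    """turns: list of (role, token_ids) pairs. Returns (input_ids, labels).

    Labels copy the input IDs on assistant turns and are masked elsewhere,
    so only assistant tokens contribute to the loss.
    """
    input_ids, labels = [], []
    for role, ids in turns:
        input_ids.extend(ids)
        if role == "assistant":
            labels.extend(ids)
        else:
            labels.extend([IGNORE_INDEX] * len(ids))
    return input_ids, labels

# Hypothetical token IDs for a one-exchange conversation.
conversation = [("user", [11, 12, 13]), ("assistant", [21, 22])]
input_ids, labels = build_labels(conversation)
```

The model still attends to the user tokens as context; they are simply excluded from the gradient.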

These choices reflect a balance between adaptation scale and hardware efficiency, suitable for accessible single-GPU setups.
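A quick arithmetic check ties the pre-training hyperparameters together. The sketch below assumes every sequence is fully packed to 2,048 tokens, that warmup is linear, and that the cosine schedule decays to zero; none of these details are stated in the source, and `lr_at` is a hypothetical helper rather than the schedule actually used.

```python
import math

SEQ_LEN, BATCH, STEPS = 2048, 128, 7500   # values quoted in the text
PEAK_LR, WARMUP_FRAC = 2e-5, 0.05

# Tokens processed if every position in every batch holds a real token.
tokens_seen = SEQ_LEN * BATCH * STEPS      # 1,966,080,000, about 1.97B
warmup_steps = round(WARMUP_FRAC * STEPS)  # 375 steps

def lr_at(step):
    """Linear warmup to PEAK_LR, then cosine decay to zero (assumed shape)."""
    if step < warmup_steps:
        return PEAK_LR * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, STEPS - warmup_steps)
    return 0.5 * PEAK_LR * (1.0 + math.cos(math.pi * progress))
```

Under the packing assumption, the product comes to roughly 1.97B tokens, which is consistent with about one pass over the stated corpus.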

4. Urdu NLP Benchmark Evaluation

Qalb is tested across seven Urdu-centric tasks using the Alif evaluation methodology, where GPT-4o automatically scores outputs against references in relevance, correctness, clarity, and formatting (0–10 scale). Human validation on a subset confirmed >85% judgment agreement.

Task | Qalb Score (out of 100)
Generation | 85.97
Translation | 94.41
Ethics | 90.83
Reasoning | 88.59
Classification | 96.38
Sentiment | 95.79
QA | 80.40

Weighted average: 90.34, exceeding Alif-1.0-Instruct (87.1) by 3.24 points and LLaMA-3.1 8B-Instruct (45.7) by 44.64 points. The overall score is calculated as $\mathrm{Score}_{\mathrm{weighted}} = \sum_{t \in \{\text{Gen, Trans, E, Reas, Clf, Sent, QA}\}} w_t S_t$, where $S_t$ is the score on task $t$ and $w_t$ are evaluation weights with $\sum_t w_t = 1$.
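The weighted score can be reproduced from the per-task table with a short sketch. The source does not list the individual weights, so uniform weights (1/7 per task) are assumed here; under that assumption the seven reported scores average to about 90.34, in line with the reported figure. The `weighted_score` helper is hypothetical.

```python
# Per-task scores as reported in the table above.
scores = {
    "Generation": 85.97,
    "Translation": 94.41,
    "Ethics": 90.83,
    "Reasoning": 88.59,
    "Classification": 96.38,
    "Sentiment": 95.79,
    "QA": 80.40,
}

def weighted_score(scores, weights=None):
    """Sum of w_t * S_t over tasks; defaults to uniform weights (an assumption)."""
    if weights is None:
        weights = {task: 1.0 / len(scores) for task in scores}
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[task] * scores[task] for task in scores)

overall = weighted_score(scores)

# Margins over the two baselines quoted in the text.
margin_alif = round(90.34 - 87.1, 2)
margin_llama = round(90.34 - 45.7, 2)
```

Passing a different `weights` dictionary would show how sensitive the headline number is to the weighting scheme.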

5. Morphological and Script Robustness

Qalb demonstrates substantially improved capabilities for Urdu-specific phenomena:

  • Morphological Coverage: Effective acquisition of case endings, compound-verb structures, and inflectional patterns prevalent in Urdu.
  • Script Normalization: Robust handling of the right-to-left Nastaliq script, avoiding the spurious Latin-script fragments found in the outputs of earlier baseline models.
  • Versatility: Superior handling of colloquial expressions, formal documents, and literary text, surpassing generic multilingual LLMs in fluency and fidelity.

Qualitative analysis indicates that, though Alif-1.0-Instruct slightly outperforms Qalb on raw Generation metrics, Qalb generates more concise, directly relevant, and instruction-adherent text. Manual side-by-side comparisons reveal reduced repetition, clearer logical structure, and improved alignment with user prompts.

6. Design Choices and Comparative Analysis

Qalb's gains over prior models are attributed to:

  • Systematic Corpus Engineering: Balanced coverage of domains and registers, with high Urdu word-purity (95.31%).
  • Replay Buffer to Prevent Catastrophic Forgetting: Including English Wikipedia tokens helps the model retain its English capabilities rather than regressing on non-Urdu tasks.
  • Parameter-efficient Fine-tuning: The LoRA approach allows large-scale adaptation on a single 80 GB GPU.

Compared with previous methods, Qalb addresses a broader range of Urdu generation and understanding challenges, and it offers a principled, scalable blueprint for adapting foundation models to other low-resource languages.

7. Conclusion and Implications

Qalb establishes a new standard in Urdu NLP by combining large-scale, Urdu-focused continued pre-training with instruction fine-tuning in a parameter-efficient manner. Its strong benchmark performance (weighted average 90.34), robust handling of morphology and script, and qualitative alignment with user intent demonstrate that systematic adaptation of foundation models is both practical and effective for low-resource languages. This suggests broader applicability to other linguistically complex and underrepresented languages through strategic corpus curation and efficient adaptation methods (Hassan et al., 13 Jan 2026).
