TigerCoder: Code LLMs for Bangla

Updated 14 September 2025
  • The TigerCoder-family of Code LLMs is a suite of models specialized for Bangla code generation, built through targeted fine-tuning on high-quality instruction datasets.
  • The models, including TigerCoder-1B and TigerCoder-9B, achieve 11–18% Pass@1 improvements, aided by rigorous filtering and validation of training data.
  • Their open-source release, together with the MBPP-Bangla benchmark, fosters research in localized code generation and low-resource language model development.

The TigerCoder-family of Code LLMs constitutes the first dedicated suite of LLMs designed for code generation in Bangla, addressing a substantial gap in low-resource language support for program synthesis. These models generate code in multiple programming languages from Bangla natural-language prompts and are built with explicit domain adaptation through high-quality Bangla code instruction datasets. Their release brings significant performance improvements, on the order of 11–18% at Pass@1, over general-purpose and multilingual Bangla LLMs, and all training resources are open-sourced to foster further research in localized, domain-specific LLMs (Raihan et al., 11 Sep 2025).

1. Architectural Foundations and Model Variants

The TigerCoder suite is based on fine-tuning a Bangla-capable base model (TigerLLM) using three complementary, purpose-built datasets for code generation. Two principal variants are developed: TigerCoder-1B with approximately 1 billion parameters and TigerCoder-9B with around 9 billion parameters. Rather than simply increasing model scale, TigerCoder exploits a targeted, domain-specific fine-tuning regimen, which enables the smaller model to compete against, and occasionally outperform, much larger general-purpose LLMs. Both models exhibit robust code generation across Python, Java, JavaScript, Ruby, and C++ in response to Bangla prompts, reflecting careful architectural and data engineering.
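
To make the release concrete, the checkpoints can be loaded with standard Hugging Face tooling. Below is a minimal inference sketch; the model identifier and generation settings are illustrative assumptions, not taken from the release notes:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical Hub ID used for illustration; substitute the
# identifier from the official open-source release.
model_id = "tigercoder/TigerCoder-1B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# A Bangla instruction: "Write a Python function that returns the
# sum of two numbers."
prompt = "দুটি সংখ্যার যোগফল ফেরত দেয় এমন একটি পাইথন ফাংশন লিখুন।"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```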

2. Data Collection, Curation, and Quality Assurance

The construction of high-fidelity training corpora is pivotal to TigerCoder's effectiveness:

  • Bangla-Code-Instruct-SI (Self-Instruct): 100,000 instruction–code pairs, seeded by 5,000 manually authored Bangla prompts and expanded via GPT-4o generation. Only examples that pass both syntactic checks (Python’s ast.parse) and sandboxed execution checks are retained; prompt pairs with cosine similarity above 0.95 are removed as redundant.
  • Bangla-Code-Instruct-Syn (Synthetic): Another 100,000 pairs, generated by LLMs (GPT-4o, Claude 3.5-Sonnet) from Bangla prompts and filtered with a BERTScore threshold (≥0.7) to ensure diversity.
  • Bangla-Code-Instruct-TE (Translated): 100,000 English instruction–code pairs from Evol-Instruct, translated into Bangla with NLLB while the code is left unchanged. Translation integrity is enforced via quality estimation (CometKiwi QE > 0.85) and semantic fidelity (BERTScore F1 > 0.95).

Filtering includes syntax and runtime validation, redundancy control, and expert review of translations, ensuring that code-specific nuances are preserved and that instructions are both functionally and linguistically sound (a sketch of the syntax-and-execution check appears below).
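
The sketch below assumes candidate solutions arrive as plain Python source strings; the function name, timeout value, and sandboxing depth are illustrative, not the paper's implementation:

```python
import ast
import os
import subprocess
import sys
import tempfile

def check_example(code: str, timeout_s: float = 5.0) -> bool:
    """Two-stage filter: syntactic check, then execution check.

    Returns True only if `code` parses (ast.parse) and runs to
    completion with exit code 0 inside the time limit. A production
    sandbox would also restrict filesystem, network, and memory use.
    """
    # Stage 1: syntactic validation.
    try:
        ast.parse(code)
    except SyntaxError:
        return False

    # Stage 2: execution in a separate interpreter process with a
    # strict runtime limit.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".py", delete=False, encoding="utf-8"
    ) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, path], capture_output=True, timeout=timeout_s
        )
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False
    finally:
        os.unlink(path)
```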

3. MBPP-Bangla: Benchmark Design and Evaluation Metrics

MBPP-Bangla extends the MBPP dataset for rigorous Bangla code generation evaluation. It comprises 974 programming problems, each annotated with a Bangla prompt (expert-corrected), five reference solutions across major programming languages, problem ID, topical label, and comprehensive test cases. Pass@K is the primary metric:

\text{Pass@K} = 1 - \frac{\binom{n-m}{K}}{\binom{n}{K}}

where $n$ is the number of generated programs and $m$ is the number of correct solutions among them.
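
For concreteness, a standard numerically stable implementation of this estimator (a generic formulation, not code from the paper) is:

```python
def pass_at_k(n: int, m: int, k: int) -> float:
    """Unbiased Pass@K estimator.

    n: total programs sampled for a problem
    m: number of those programs that are correct
    k: sample budget being scored
    """
    if n - m < k:
        # Fewer than k incorrect samples exist, so every k-subset
        # contains at least one correct program.
        return 1.0
    # Compute 1 - C(n-m, K) / C(n, K) as a running product to avoid
    # overflowing intermediate binomial coefficients.
    prob_all_wrong = 1.0
    for i in range(k):
        prob_all_wrong *= (n - m - i) / (n - i)
    return 1.0 - prob_all_wrong

# Example: 20 samples with 5 correct gives Pass@1 = 0.25.
print(pass_at_k(20, 5, 1))
```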

This benchmark supports granular analysis of model performance by programming domain and language, enabling direct comparison across models and languages. TigerCoder models demonstrate consistent improvements, in particular eliminating the performance drop that other models exhibit when prompted in Bangla.

4. Empirical Performance and Comparative Insights

TigerCoder-family models outperform previous systems—including Gemma-3, GPT-3.5, and the TigerLLM base—on MBPP-Bangla, with 11–18% higher Pass@1 scores. Notably, the 1B variant surpasses models with up to 27 times more parameters, while the 9B variant further improves results.

Critical contributors to this performance include:

  • A synergistic effect from combining the SI, Syn, and TE training data
  • A fine-tuning regimen focused on code generation across multiple programming languages from Bangla prompts
  • Rigorous filtering of examples for code correctness via parsing and execution

These results establish that high-quality targeted datasets combined with domain adaptation can compensate for model scaling limitations in low-resource settings.

5. Methodological Implications for Low-Resource Code Generation

The TigerCoder approach demonstrates a reproducible pathway for developing strong code LLMs in languages where digital resources are scarce. The mixed methodology—combining human-authored prompts, synthetic augmentation, and high-fidelity translation—can be generalized for other underrepresented languages. The findings stress that model scale alone does not guarantee performance; targeted data engineering and domain-centric fine-tuning drive meaningful improvements.

A plausible implication is that as localized, curated datasets become available, similar gains can be realized for other low-resource languages, reducing the digital divide in code LLM research and deployment.

6. Open-Source Publication and Community Impact

All datasets (Bangla-Code-Instruct-SI, Syn, TE), MBPP-Bangla, and TigerCoder model checkpoints (1B and 9B) are openly released. Documentation—including JSONLines records with prompt, solution, test cases, and labels—facilitates benchmarking and reproducibility. This transparency supports routine regression testing and makes TigerCoder a reference for Bangla LLM research, ensuring model and dataset accessibility for subsequent innovation and evaluation.
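
As an illustration of that record format, a single MBPP-Bangla entry might be written as follows; the field names here are assumptions for the sketch rather than the released schema:

```python
import json

# Hypothetical MBPP-Bangla record; field names are illustrative.
# The Bangla prompt reads: "Write a function that returns the
# maximum value of a list."
record = {
    "task_id": 101,
    "prompt_bn": "একটি তালিকার সর্বোচ্চ মান ফেরত দেয় এমন ফাংশন লিখুন।",
    "solutions": {
        "python": "def max_value(xs):\n    return max(xs)",
        # ... plus java, javascript, ruby, and cpp references
    },
    "topic": "lists",
    "tests": ["assert max_value([1, 3, 2]) == 3"],
}

# Append the record as one JSONLines row.
with open("mbpp_bangla.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
```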

7. Technical Implementation and Future Prospects

Hyperparameter selection for fine-tuning (see Table 6 of (Raihan et al., 11 Sep 2025)) includes a sequence length of 2048, the AdamW optimizer, cosine learning rate scheduling, a fixed batch size with gradient accumulation, and fixed random seeds. Code validation leverages sandboxed Python 3.13.0 execution with strict runtime limits. The Pass@K formula and filtering strategies are drawn directly from the benchmark and dataset preparation procedures.
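
Expressed as a Hugging Face TrainingArguments configuration, the reported setup might look like the sketch below; the learning rate, batch size, and epoch count are placeholders rather than the values in Table 6:

```python
from transformers import TrainingArguments

# Matches the reported setup: AdamW, cosine LR schedule, gradient
# accumulation, fixed seed. Numeric values marked "placeholder" are
# illustrative, not drawn from Table 6.
args = TrainingArguments(
    output_dir="tigercoder-ft",
    optim="adamw_torch",
    lr_scheduler_type="cosine",
    learning_rate=2e-5,              # placeholder
    per_device_train_batch_size=4,   # placeholder
    gradient_accumulation_steps=8,   # placeholder
    num_train_epochs=3,              # placeholder
    seed=42,
    bf16=True,
)

# The 2048-token sequence length is enforced at tokenization time:
# tokenizer(example["text"], truncation=True, max_length=2048)
```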

Future research is expected to explore:

  • Further cross-lingual adaptation or transfer learning using TigerCoder as a base
  • Expansion to additional programming languages and task types beyond simple code synthesis (e.g. debugging, translation, documentation)
  • Systematic analysis of curriculum learning approaches and their effects on low-resource code generation

The TigerCoder-family establishes an empirically grounded model suite and methodological framework for code generation in low-resource languages, providing both a technical and community-oriented foundation for next-generation code LLM development (Raihan et al., 11 Sep 2025).

References

Raihan et al. (11 Sep 2025). TigerCoder: Code LLMs for Bangla.