TigerCoder: Bangla Code LLM Suite
- TigerCoder is a dedicated Bangla code generation suite that employs transformer-based, decoder-only LLMs with 1B and 9B parameters.
- It leverages a meticulously curated 300K Bangla instruction–code corpus with advanced filtering for syntactic and semantic quality.
- Empirical evaluations on MBPP-Bangla show superior Pass@1 performance, outperforming larger multilingual baselines.
The TigerCoder family comprises the first dedicated suite of LLMs for code generation in Bangla, addressing a crucial underrepresentation of Bangla in code-centric language modeling. This suite consists of two transformer-based, decoder-only models with parameter counts of approximately 1 billion (1B) and 9 billion (9B), both derived from Bangla-specialized TigerLLM checkpoints. TigerCoder emphasizes adaptation to the programming domain through carefully curated instruction–code datasets and is evaluated with MBPP-Bangla, a benchmark specifically constructed for Bangla code generation. Empirical results demonstrate substantial performance gains over existing multilingual and general-purpose LLMs, showcasing the impact of targeted data curation in resource-constrained linguistic domains.
1. Model Architecture and Parameterization
Each TigerCoder model employs a pre-norm decoder-only Transformer architecture, parameterized as follows:
- 1B variant: 24 layers ($L = 24$), hidden dimension $d_{\text{model}} = 1024$, 16 attention heads ($h = 16$), feed-forward inner dimension $d_{\text{ff}} = 4096$, yielding $\approx$1B parameters.
- 9B variant: 32 layers ($L = 32$), hidden dimension $d_{\text{model}} = 4096$, 32 attention heads ($h = 32$), feed-forward inner dimension $d_{\text{ff}} = 16384$, yielding $\approx$9B parameters.
The total parameter count is estimated, to leading order, by

$$N \approx L\left(4\,d_{\text{model}}^{2} + 2\,d_{\text{model}}\,d_{\text{ff}}\right) = 12\,L\,d_{\text{model}}^{2} \qquad (d_{\text{ff}} = 4\,d_{\text{model}})$$

The leading term captures the self-attention and feed-forward network parameters, reflecting the scaling logic used in modern Transformer-based LLMs.
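As a sanity check, this leading-order estimate can be applied directly to the configurations above. The sketch below is illustrative rather than the authors' code; it counts only attention and feed-forward weights, so embedding parameters (vocabulary size × $d_{\text{model}}$, which can be substantial for a multilingual vocabulary) must be added on top to approach the headline totals.

```python
# Leading-order Transformer parameter estimate: per layer, self-attention
# contributes 4 * d_model^2 (Q, K, V, and output projections) and the
# feed-forward block contributes 2 * d_model * d_ff. Embeddings, biases,
# and normalization parameters are deliberately excluded.

def estimate_params(num_layers: int, d_model: int, d_ff: int) -> int:
    attention = 4 * d_model * d_model
    feed_forward = 2 * d_model * d_ff
    return num_layers * (attention + feed_forward)

for name, (layers, d_model, d_ff) in {
    "TigerCoder 1B": (24, 1024, 4096),
    "TigerCoder 9B": (32, 4096, 16384),
}.items():
    total = estimate_params(layers, d_model, d_ff)
    print(f"{name}: ~{total / 1e9:.2f}B non-embedding parameters")
```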
Both variants are finetuned via a standard maximum-likelihood cross-entropy objective:

$$\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_{\theta}\left(x_t \mid x_{<t}\right)$$

where $x_{<t}$ denotes the preceding tokens of an instruction–code sequence.
Optimization employs AdamW, with variant-specific learning rates for the 1B and 9B models, weight decay of 0.02 and 0.04 respectively, and a cosine learning-rate schedule with 10–15% warm-up steps over three epochs.
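The following PyTorch sketch shows how such a setup is typically wired together. The model and loss here are stand-ins, and the peak learning rate is a placeholder (the source does not preserve the exact values); only the weight decay, warm-up fraction, cosine schedule, and epoch count are taken from the description above.

```python
import math
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

# Hypothetical settings; only weight_decay, the warm-up fraction, and the
# cosine schedule reflect the paper. peak_lr is a placeholder value.
peak_lr, weight_decay, warmup_frac, epochs = 2e-5, 0.02, 0.10, 3
steps_per_epoch = 1_000                    # depends on corpus and batch size
total_steps = epochs * steps_per_epoch
warmup_steps = int(warmup_frac * total_steps)

model = torch.nn.Linear(16, 16)            # stand-in for the decoder-only LM
optimizer = AdamW(model.parameters(), lr=peak_lr, weight_decay=weight_decay)

def lr_lambda(step: int) -> float:
    """Linear warm-up followed by cosine decay to zero."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = LambdaLR(optimizer, lr_lambda)

for step in range(total_steps):
    # In the real setup, the loss is token-level cross-entropy over the
    # instruction-code pairs: -sum_t log p(x_t | x_<t).
    loss = model(torch.randn(4, 16)).pow(2).mean()   # dummy objective
    loss.backward()
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```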
2. Instruction–Code Corpus Construction
TigerCoder's performance is grounded in a 300,000-example Bangla instruction–code corpus, equally partitioned into three distinct 100K subsets:
- Self-Instruct (SI):
- Initiated from 5,000 Bangla prompts authored by experts, spanning algorithms, data structures, file I/O, string operations, mathematics, and basic OOP.
- The Self-Instruct pipeline, using GPT-4o, generated instruction–Python pairs, filtered through both syntactic analysis (via ast.parse) and runtime verification (Python 3.13 sandbox).
- Pairs whose sentence-level embedding cosine similarity exceeded a fixed threshold were deduplicated to enforce diversity (see the filtering sketch after this list).
- Synthetic (Syn):
- GPT-4o and Claude 3.5 were instructed in Bangla to produce novel instruction–code pairs.
- All synthetic code passed syntax checks; BERTScore thresholds (≥0.7) ensured minimal paraphrastic redundancy.
- Translated (TE):
- 100,000 high-quality English instruction–code pairs from Evol-Instruct were machine-translated into Bangla using NLLB-200, maintaining the original Python code.
- Three machine translations were generated per prompt; the best was selected using CometKiwi quality-estimation and BERTScore F1 thresholds.
Each subset underwent stringent filters for linguistic fidelity, semantic diversity, and code correctness, yielding a comprehensive resource that covers human, LLM-synthesized, and translation-derived code instructions.
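A condensed sketch of how these filters might be composed is shown below. The ast.parse check and sandboxed execution mirror the pipeline described for the Self-Instruct subset; the embedding function and the 0.9 similarity cutoff are placeholders, as the source does not preserve the exact threshold, and the model-based BERTScore/CometKiwi scorers are not reimplemented here.

```python
import ast
import os
import subprocess
import tempfile

import numpy as np

def passes_syntax(code: str) -> bool:
    """Syntactic filter: keep only code that parses cleanly."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False

def passes_runtime(code: str, timeout: float = 5.0) -> bool:
    """Runtime filter: run the snippet in a subprocess with a timeout,
    approximating the sandboxed verification described above."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run(["python3", path],
                                capture_output=True, timeout=timeout)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False
    finally:
        os.unlink(path)

def deduplicate(pairs, embed, threshold: float = 0.9):
    """Greedy near-duplicate removal: drop a pair if its instruction
    embedding is too close to one already kept (the threshold is a
    placeholder, not the paper's value)."""
    kept, kept_vecs = [], []
    for pair in pairs:
        vec = np.asarray(embed(pair["instruction"]), dtype=float)
        vec /= np.linalg.norm(vec)
        if all(float(vec @ other) < threshold for other in kept_vecs):
            kept.append(pair)
            kept_vecs.append(vec)
    return kept

# Usage: syntax and runtime checks first, then similarity-based dedup.
# filtered = [p for p in pairs
#             if passes_syntax(p["code"]) and passes_runtime(p["code"])]
# final = deduplicate(filtered, embed=my_sentence_encoder)
```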
3. Benchmarking: MBPP-Bangla and Evaluation Protocol
The MBPP-Bangla benchmark comprises 974 programming problems drawn from beginner to intermediate levels. Each problem was translated into Bangla by two independent TOEFL-certified native speakers, with adjudication by a polyglot expert, and mapped to canonical reference solutions across Python, Java, JavaScript, Ruby, and C++.
Key covered topics include:
- String manipulation
- Mathematical computations
- Data structures
- Algorithms
- File I/O
Pass@K is used as the principal metric, defined as

$$\text{Pass@}K = \mathbb{E}\left[\,1 - \frac{\binom{n-c}{K}}{\binom{n}{K}}\,\right]$$

where $n$ is the number of sampled generations per problem, $c$ is the number of correct generations, and $K$ is the shortlist size. This metric captures single-shot correctness at $K = 1$, as well as shortlist-wide and upper-bound performance at larger $K$.
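For concreteness, the standard unbiased estimator of this quantity (popularized by the HumanEval evaluation) can be computed with the numerically stable product form below; this is the conventional formulation, assumed here rather than quoted from the source.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@K estimator: 1 - C(n-c, k) / C(n, k), computed as a
    running product over the top c terms to avoid huge binomials."""
    if n - c < k:
        return 1.0  # every size-k shortlist must contain a correct sample
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Example: 20 samples per problem, 7 of which pass the unit tests.
print(pass_at_k(n=20, c=7, k=1))    # 0.35: expected single-shot success rate
print(pass_at_k(n=20, c=7, k=10))   # ~0.998: a 10-wide shortlist almost surely passes
```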
4. Empirical Results and Comparative Performance
TigerCoder models were benchmarked against several multilingual open-source (LLaMA-3.2, Gemma 3, Phi-4, Pangea) and proprietary (GPT-3.5, GPT-4o-mini, Gemini-2.5) baselines on mHumanEval-Bangla and MBPP-Bangla.
| Model | Params | mHumanEval-Bangla (Pass@1) | MBPP-Bangla (Pass@1) | Δ Pass@1 vs. Strongest Baseline |
|---|---|---|---|---|
| TigerCoder 1B | 1B | 0.69 | 0.74 | +0.04 to +0.08 |
| TigerCoder 9B | 9B | 0.75 | 0.82 | +0.11 to +0.18 |
The 1B model, despite its modest size, outperforms baselines up to 27 times larger by 4–8 percentage points on Pass@1. The 9B variant achieves Pass@1 improvements of +0.11 to +0.18 over the strongest prior models (Gemma 3 27B, TigerLLM 9B). These relative improvements are sustained at larger values of K.
5. Limitations and Known Constraints
- The instruction corpus is predominantly Python-centric, potentially limiting cross-language generalization.
- MBPP-Bangla, while covering five programming languages and topical areas, cannot comprehensively represent the full scope of real-world coding tasks.
- TigerCoder is restricted to the 1B and 9B parameter regimes; no larger or multimodal models are currently available within this family.
- Automated syntactic and semantic checks, despite their rigor, may not catch subtle semantic faults or domain-specific edge cases.
- The curation process, while thorough, remains partially reliant on automated filtering and scoring mechanisms.
A plausible implication is that further improvements could be realized via multi-language corpus expansion, increased human-in-the-loop validation, and experimentation with larger or multimodal architectures.
6. Practical Applications and Open-Source Impact
TigerCoder models enable a range of practical use cases for Bangla-speaking educators, learners, and developers:
- Localized coding assistants for Bangla-medium environments
- Automated template code generation in educational settings
- Bridging the digital literacy gap among Bangla-speaking software engineers
Datasets, benchmarks (MBPP-Bangla), and model weights are fully open-sourced under permissive licenses, supporting reproducibility and community scrutiny and facilitating analogous efforts for other low-resource languages.
7. Research Contributions and Broader Significance
The TigerCoder initiative advances the state of LLM-based code generation in low-resource languages by:
- Demonstrating that meticulously curated instruction–code datasets can compensate for smaller parameter counts, enabling 1B-scale models to surpass much larger multilingual LLMs in Bangla code generation.
- Establishing robust benchmarks (MBPP-Bangla) and screening methodologies for linguistic fidelity, diversity, and code correctness.
- Offering empirical evidence that targeted data curation represents a cost-effective strategy for high-performance language technology development in under-represented domains.
This supports the broader proposition that tailored pretraining and rigorous dataset design provide a promising path for elevating model competence in resource-scarce linguistic contexts.