
TranslateGemma: Open-Source MT Suite

Updated 15 January 2026
  • TranslateGemma is an open-source suite of MT models derived from the Gemma family, offering robust multilingual translation capabilities across diverse language pairs.
  • It employs multi-stage training pipelines—including continual pretraining, supervised fine-tuning, and RLHF—to optimize performance and maintain technical transparency.
  • It features practical language-adaptation techniques such as the Branch-and-Merge protocol to fine-tune models for low-resource and fairness-sensitive translation tasks.

TranslateGemma is an open-source suite of machine translation (MT) models derived from the Gemma family of multilingual LLMs. It encompasses both general-purpose “multilingual translation” systems and language-adaptation pipelines, ranging from scalable open MT deployment to targeted fine-tuning for low-resource and specialty domains. TranslateGemma aims to expose robust translation capabilities across dozens of languages, provide technical transparency regarding its training and evaluation, and offer practical recipes for research-driven model adaptation to new languages or application-specific fairness constraints.

1. Architectural Basis and Model Variants

TranslateGemma directly builds upon successive generations of Gemma LLMs—primarily Gemma 2 and Gemma 3. These models are Transformer decoder-only architectures whose parameter counts scale across 4B, 9B, 12B, and 27B variants. All variants employ a SentencePiece-based vocabulary—optimized for efficient subword coverage across 200+ scripts—which is preserved during fine-tuning and adaptation to maintain cross-lingual generalizability (Cui et al., 4 Feb 2025, Finkelstein et al., 13 Jan 2026).

Architectural modifications are minimal: Token embeddings are typically frozen throughout all stages of supervised fine-tuning and RLHF to prevent catastrophic forgetting, while other parameters remain fully trainable. GemmaX2-28, for instance, is a 9B-parameter decoder-only Transformer with no Mixture-of-Experts or sparse layers, explicitly matching open LLaMA-style configurations (Cui et al., 4 Feb 2025).

2. Training Pipelines and Data Recipes

TranslateGemma implements multi-stage training pipelines optimized for translation objectives:

A. Continual Pretraining

  • High-resource models (e.g., GemmaX2-28-9B) undergo continual pretraining over a mixture of monolingual corpora (e.g., CulturaX, MADLAD-400) and parallel corpora (OPUS, SMOL, GATITOS) spanning up to ~2B tokens per language (Cui et al., 4 Feb 2025).
  • The Parallel-First Monolingual-Second (PFMS) mixing strategy ensures maximal sampling from parallel data for each language, falling back to monolingual text only as necessary. The mixing weights are calculated as $r_\ell = \min\left(\frac{P_\ell^{\mathrm{par}}}{T}, 1\right), \quad m_\ell = \max(T - P_\ell^{\mathrm{par}}, 0)$, where $P_\ell^{\mathrm{par}}$ is the available parallel data and $T$ is the target token count per language (Cui et al., 4 Feb 2025).
  • Cross-entropy loss is minimized across concatenated token streams:

\mathcal{L}_{\mathrm{PFMS}} = \alpha_\ell\,\mathcal{L}_{\mathrm{par}} + (1-\alpha_\ell)\,\mathcal{L}_{\mathrm{mono}}

where $\alpha_\ell = r_\ell$ is the mixing ratio.
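The PFMS budgeting rule above can be sketched in a few lines. This is a minimal illustration of the formulas, with an illustrative function name; the actual data pipeline is not released in this form.

```python
# Sketch of the Parallel-First Monolingual-Second (PFMS) mixing rule.
# Symbols follow the text: parallel_tokens is P_par for a language,
# target_tokens is the per-language token budget T.

def pfms_mix(parallel_tokens: int, target_tokens: int) -> tuple[float, int]:
    """Return (r, m): the parallel sampling ratio r = min(P_par / T, 1)
    and the monolingual top-up m = max(T - P_par, 0), in tokens."""
    r = min(parallel_tokens / target_tokens, 1.0)
    m = max(target_tokens - parallel_tokens, 0)
    return r, m

# A high-resource language fills the budget with parallel data alone:
r, m = pfms_mix(3_000_000_000, 2_000_000_000)   # r = 1.0, m = 0
# A low-resource one falls back to monolingual text for the remainder:
r, m = pfms_mix(500_000_000, 2_000_000_000)     # r = 0.25, m = 1.5B tokens
```

The ratio $r_\ell$ then serves directly as the mixing weight $\alpha_\ell$ in the loss.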

B. Two-Stage Translation Fine-Tuning

  • Supervised Fine-Tuning (SFT): Models are trained on curated high-quality synthetic and human-translated parallel datasets for all target language pairs. Optimization uses AdaFactor (lr = 1e–4, batch = 64, 200K steps) and standard token-level cross-entropy (Finkelstein et al., 13 Jan 2026).
  • Reinforcement Learning from Human and Model Feedback (RLHF): Translation quality is further optimized using an ensemble of reward models:
    • MetricX-24-XXL-QE (DA+MQM-based regression)
    • Gemma-AutoMQM-QE (token-level MQM severities and standard MQM weights)
    • ChrF (character-F metric)
    • Naturalness Autorater (LLM-based judge)
    • Generalist reward (multilingual, multi-task post-training mixture)
  • Per-token advantages combine sequence-level and fine-grained signals:

A_t = \underbrace{\sum_{t'=t}^{T} r_{\mathrm{seq}}(t') - b}_{\text{PPO reward-to-go}} + \sum_{t'=t}^{T} r_{\mathrm{tok}}(t')

RL optimization minimizes the clipped PPO-style surrogate policy loss with entropy regularization.
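The per-token advantage combination can be sketched as a pair of suffix sums, one over the sequence-level reward (minus a baseline) and one over the token-level rewards. This is an illustrative reading of the formula, not the released training code; function and argument names are assumptions.

```python
# Sketch of the per-token advantage: each position's advantage is the
# PPO-style reward-to-go of the sequence-level reward minus a baseline b,
# plus the reward-to-go of fine-grained token-level rewards.

def per_token_advantages(r_seq, r_tok, baseline):
    """r_seq, r_tok: per-position reward lists of equal length T.
    Returns A_t = (sum_{t'>=t} r_seq[t'] - baseline) + sum_{t'>=t} r_tok[t']."""
    T = len(r_seq)
    adv = [0.0] * T
    seq_tail = 0.0
    tok_tail = 0.0
    # Accumulate suffix sums from the end of the sequence backwards.
    for t in range(T - 1, -1, -1):
        seq_tail += r_seq[t]
        tok_tail += r_tok[t]
        adv[t] = (seq_tail - baseline) + tok_tail
    return adv

# e.g. a sequence-level reward delivered on the final token, plus MQM-style
# token penalties at positions 1 and 3:
adv = per_token_advantages([0, 0, 0, 1.0], [0, -0.5, 0, -0.25], baseline=0.2)
# approximately [0.05, 0.05, 0.55, 0.55]
```

Early tokens inherit both the eventual sequence reward and all downstream token penalties, so fine-grained errors reduce the advantage of every token preceding them.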

C. Branch-and-Merge Language Adaptation (for Low-Resource or New Languages)

  • TranslateGemma offers a continual pretraining protocol based on Branch-and-Merge, which alternates target-language specialization (on pure new-language shards) with weight merging to avoid English degradation. Merging is performed via SLERP (spherical linear interpolation) along the great-circle between parameter vectors (Alexandrov et al., 2024).
  • After several odd–even branch/merge cycles and skill injection (from a multilingual instruction-tuned Gemma-2), the resulting bilingual model is instruction-finetuned.
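SLERP merging as used in Branch-and-Merge can be sketched for a single flattened parameter vector. Plain Python lists stand in for model weight tensors here; this is a minimal sketch of spherical linear interpolation, not the protocol's actual merging code.

```python
import math

# Spherical linear interpolation between two parameter vectors w_a and w_b:
# interpolate along the great circle between them rather than the chord,
# which preserves parameter norm better than plain averaging.

def slerp(w_a, w_b, t=0.5, eps=1e-8):
    dot = sum(a * b for a, b in zip(w_a, w_b))
    na = math.sqrt(sum(a * a for a in w_a))
    nb = math.sqrt(sum(b * b for b in w_b))
    cos_omega = max(-1.0, min(1.0, dot / (na * nb)))
    if cos_omega > 1.0 - eps:
        # Nearly parallel vectors: fall back to linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(w_a, w_b)]
    omega = math.acos(cos_omega)  # angle between the two vectors
    sa = math.sin((1 - t) * omega) / math.sin(omega)
    sb = math.sin(t * omega) / math.sin(omega)
    return [sa * a + sb * b for a, b in zip(w_a, w_b)]

# Midpoint of two orthogonal unit vectors lies on the unit circle:
mid = slerp([1.0, 0.0], [0.0, 1.0], t=0.5)  # ≈ [0.707, 0.707]
```

In practice the interpolation is applied per tensor between the branched checkpoint and the previous merged model.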
Model Variant       Training Recipe          Key Features
GemmaX2-28-9B       PFMS+SFT+RLHF            Multilingual, ∼10B params
TranslateGemma-4B   SFT+RLHF                 Lightweight, frozen embeddings
BgGPT-Gemma-2-27B   Branch-and-Merge+SFT     Bulgarian adaptation
GemmAr-7B-V1        SFT (monolingual only)   Arabic, no translation

3. Evaluation Benchmarks and Translation Quality Metrics

TranslateGemma evaluation comprises automatic and human assessments across diverse test suites:

A. Automatic Metrics

  • MetricX (quality estimation; lower is better)
  • COMET22 (higher is better)
  • ChrF
  • spBLEU
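For intuition about one of these metrics, chrF scores translations by character n-gram overlap. The following is a simplified pure-Python sketch (n-gram orders 1–6, beta = 2, whitespace stripped); reported scores should come from a standard implementation such as sacreBLEU's, which handles word-order n-grams and edge cases differently.

```python
from collections import Counter

# Simplified chrF: F-beta score over character n-grams (beta = 2 weights
# recall twice as heavily as precision), averaged over orders 1..6.

def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    def ngrams(text, n):
        text = text.replace(" ", "")  # chrF ignores whitespace by default
        return Counter(text[i:i + n] for i in range(len(text) - n + 1))

    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = ngrams(hypothesis, n), ngrams(reference, n)
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        if hyp or ref:
            precisions.append(overlap / max(sum(hyp.values()), 1))
            recalls.append(overlap / max(sum(ref.values()), 1))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta**2) * p * r / (beta**2 * p + r) * 100

score = chrf("hello there", "hello their")  # partial overlap, 0 < score < 100
```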

Results from WMT24++ (55 language pairs) and FLORES-200 demonstrate that TranslateGemma consistently improves over base Gemma 3/2 and outperforms all open models <10B. GemmaX2-28-9B achieves near–Google Translate and GPT-4-turbo average performance across 28 languages (Cui et al., 4 Feb 2025):

Model               WMT-24 en→xx XCOMET   FLORES en→xx spBLEU/COMET
GemmaX2-28-9B       79.37 / 74.41         39.72 / 88.35
TowerInstruct-13B   86.10 / 76.74         40.60 / 88.89
Google Translate    77.64 / 73.00         41.52 / 88.51
GPT-4-turbo         79.35 / 75.40         37.41 / 87.61

B. Human Evaluation

  • WMT25 MQM (lower is better): TranslateGemma-27B outperforms or ties base Gemma 3 in seven out of ten directions, with occasional regressions on named-entity fidelity (Finkelstein et al., 13 Jan 2026).

C. Specialized Benchmarks

  • Vistra (image-based translation): TranslateGemma retains baseline multimodal capability, with MetricX and COMET gains over base (Finkelstein et al., 13 Jan 2026).
  • Gender-fair translation (GFG challenge): Protocols mandate classification-based and coverage-weighted accuracy. TranslateGemma systems equipped with fine-tuned gender-neutral classifiers and constraint decoding achieve competitive performance on GeNTE and Neo-GATE datasets (Frenda et al., 2024).

4. Practical Deployment, Usage Patterns, and Adaptation

TranslateGemma models are open-source and available on Hugging Face under the ModelSpace organization. Usage involves conventional prompt-based translation, with the following minimal Python/Hugging Face example for GemmaX2-28-9B (Cui et al., 4 Feb 2025):

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ModelSpace/gemmax2-28-9B")
model = AutoModelForCausalLM.from_pretrained("ModelSpace/gemmax2-28-9B")

def translate(src_text, src_lang, tgt_lang, max_new_tokens=256):
    # GemmaX2 expects a plain instruction-style prompt naming both languages.
    prompt = (f"Translate this from {src_lang} to {tgt_lang}:\n"
              f"{src_lang}: {src_text}\n"
              f"{tgt_lang}:")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # Cap newly generated tokens (rather than total length) so long source
    # sentences still leave room for the translation.
    out = model.generate(**inputs, max_new_tokens=max_new_tokens,
                         eos_token_id=tokenizer.eos_token_id)
    decoded = tokenizer.decode(out[0], skip_special_tokens=True)
    # The decoded string echoes the prompt; keep only the text after the
    # target-language tag.
    return decoded.split(f"{tgt_lang}:")[-1].strip()

Example outputs reflect strong performance for both high- and mid-resource directions (“Bonjour, comment ça va ?” for English→French; “វា​មាន​តម្លៃ​ប៉ុន្មាន?” for English→Khmer).

Adaptation to new languages follows the Branch-and-Merge protocol, requiring large-scale bilingual web data and targeted instruction tuning. For fairness-oriented translation, as in the Italian gender-fair challenge, prompt engineering, decoding constraints, and fine-tuning on annotated references (e.g., Neo-GATE placeholders for neomorphemes) are recommended (Frenda et al., 2024).
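Placeholder-based references like Neo-GATE's can be realized into concrete neomorpheme forms at evaluation or decoding time. The sketch below illustrates the idea only: the `<NEO:…/>` tag format and the schwa paradigm are assumptions for illustration, not the dataset's actual annotation scheme.

```python
import re

# Illustrative Neo-GATE-style substitution: annotated references mark
# gendered endings with placeholders, which are swapped for a chosen
# neomorpheme paradigm (here, the schwa "ə") before scoring.

PARADIGM = {"sg": "\u0259", "pl": "\u0259"}  # hypothetical key -> morpheme map

def realize(annotated: str, paradigm: dict[str, str]) -> str:
    """Replace <NEO:key/> placeholders with the paradigm's morpheme."""
    return re.sub(r"<NEO:(\w+)/>", lambda m: paradigm[m.group(1)], annotated)

text = realize("Car<NEO:sg/> collega", PARADIGM)  # "Carə collega"
```

Keeping the paradigm external to the annotated reference lets the same test set score multiple neomorpheme systems.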

5. Limitations, Safety, and Known Challenges

While TranslateGemma surpasses prior open-source LLMs for MT, several limitations remain:

  • Low-resource directions exhibit a performance gap relative to large proprietary systems (e.g., en→km BLEU 39 vs. Google BLEU 65).
  • Instruction-tuning data sparsity may induce hallucinations or off-target translations.
  • Named-entity fidelity and adaptation for domain-specific registers need improvement, as indicated by regressions in ja→en and specialty domains in human MQM.
  • Gender-fair translation challenges persist in morphologically complex languages (e.g., Italian agreement, schwa neomorpheme insertion) (Frenda et al., 2024).
  • Safety filtering in general-purpose LLMs may block sensitive content in historical, legal, or victim narrative translation without clear distinction between “mention” and “use” (Tekgurler, 14 Mar 2025).

6. Impact, Research Applications, and Future Directions

TranslateGemma advances open machine translation by:

  • Enabling practical-scale, high-quality translation across >50 languages for text and image modalities, with efficiency–accuracy trade-offs that favor model deployment on limited hardware (Finkelstein et al., 13 Jan 2026).
  • Delivering a transparent adaptation procedure for low-resource languages and fairness-sensitive tasks, as demonstrated in Bulgarian (BgGPT, +3.72 average accuracy without English loss) (Alexandrov et al., 2024).
  • Supporting prompt engineering, reward shaping, and data mixing recipes validated across multi-domain and low-resource benchmarks (Finkelstein et al., 13 Jan 2026, Cui et al., 4 Feb 2025).

Ongoing research priorities include refining reward models via post-edit logs and interactive feedback, extending explicit multimodal training for joint vision–text tasks, and enhancing domain or fairness-specific translation pipelines. The modularity and open release of TranslateGemma’s source models and recipes position it as a cornerstone for multilingual, fair, and adaptive MT research.
