Gemma-2-2B Transformer Model

Updated 20 May 2026

Gemma-2-2B is a 2-billion-parameter transformer featuring 26 layers with interleaved local–global and group-query attention for efficient long-context modeling.
It employs full knowledge distillation from a larger teacher and a compositional in-context learning circuit, achieving competitive few-shot performance across benchmarks.
The model faces significant hallucination issues with symbolic triggers, prompting ongoing research in interpretability, safety, and robustness enhancements.

Gemma-2-2B is a 2-billion-parameter, open-weight transformer LLM developed by Google DeepMind as part of the Gemma 2 family, targeting efficient, high-quality text generation and understanding across a range of natural language tasks. The model introduces efficiency-oriented changes to its attention mechanism, is trained with knowledge distillation, and serves both as a core research baseline and as a foundation for specialized derivatives in code generation, in-context learning, and interpretability research.

1. Architecture and Training Regime

Gemma-2-2B implements a decoder-only transformer backbone with 26 layers, model dimension $d=2304$, inner feed-forward dimension $d_\mathrm{ff}=18\,432$, 8 attention heads of dimension 256 each, and rotary positional embeddings. Two salient architectural optimizations distinguish Gemma-2-2B from previous small LLM baselines (Team et al., 2024):

Interleaved Local–Global Attention: Layers alternate between local sliding-window attention (window $W=4096$) and global attention over the full $N=8192$ context, reducing self-attention compute and memory relative to an all-global stack while maintaining long-range modeling.
Group-Query Attention (GQA): The 8 query heads share only 4 K/V projections, halving the cost of key–value computation in the attention sublayer relative to one K/V projection per query head.
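The two optimizations above can be illustrated with a minimal sketch. It uses only the hyperparameters quoted in this section; the names (`AttentionConfig`, `can_attend`, `kv_head_for`) are hypothetical, and the choice that even-indexed layers are the local ones is an assumption, since the text only says that layers alternate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionConfig:
    # Values quoted in the text above.
    num_layers: int = 26
    num_query_heads: int = 8
    num_kv_heads: int = 4
    head_dim: int = 256
    window: int = 4096
    context: int = 8192


def is_local_layer(layer: int) -> bool:
    # Assumption: even-indexed layers are local, odd-indexed layers are global.
    return layer % 2 == 0


def can_attend(query_pos: int, key_pos: int, layer: int, cfg: AttentionConfig) -> bool:
    """Causal mask, additionally restricted to a sliding window on local layers."""
    if key_pos > query_pos:
        return False
    if is_local_layer(layer):
        return query_pos - key_pos < cfg.window
    return True


def kv_head_for(query_head: int, cfg: AttentionConfig) -> int:
    """Map a query head to the K/V head it shares under GQA."""
    group_size = cfg.num_query_heads // cfg.num_kv_heads
    return query_head // group_size


def kv_values_per_token(cfg: AttentionConfig, kv_heads: int) -> int:
    """Number of K and V scalars produced per token, summed over all layers."""
    return 2 * cfg.num_layers * kv_heads * cfg.head_dim


cfg = AttentionConfig()
gqa = kv_values_per_token(cfg, cfg.num_kv_heads)
mha = kv_values_per_token(cfg, cfg.num_query_heads)
print([kv_head_for(h, cfg) for h in range(cfg.num_query_heads)])
print(gqa, mha)
```

With 4 K/V heads the per-token key–value footprint is half of what 8 independent K/V heads would need. The sketch does not model the further saving that local layers only need keys and values for the most recent $W$ positions.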

Training departs from standard next-token prediction. Instead, Gemma-2-2B uses full knowledge distillation from a larger teacher (Gemma 7B), replacing cross-entropy over one-hot labels with a cross-entropy against the teacher's soft distribution: $L_\mathrm{distill} = -\sum_{(x_c, x)} P_T(x \mid x_c)\log P_S(x \mid x_c)$, where $P_T(x \mid x_c)$ and $P_S(x \mid x_c)$ are the probabilities that the teacher and the student assign to token $x$ given context $x_c$ (Team et al., 2024). Pretraining is performed on a 2T-token, English- and code-heavy web corpus, filtered for safety and diversity. No explicit curriculum beyond uniform shuffling is applied. Inference is conducted with low or zero temperature for determinism (Lamba et al., 9 Sep 2025).
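To make the distillation objective concrete, the following sketch evaluates the loss for a single context $x_c$ over a toy three-token vocabulary and compares it with the one-hot cross-entropy it replaces. The logits are made up for illustration, and the function names are hypothetical rather than taken from any Gemma training code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distill_loss(teacher_logits, student_logits):
    """-sum_x P_T(x|x_c) * log P_S(x|x_c) for one context x_c."""
    p_t = softmax(teacher_logits)
    p_s = softmax(student_logits)
    return -sum(t * math.log(s) for t, s in zip(p_t, p_s))


def one_hot_loss(target_index, student_logits):
    """Standard next-token cross-entropy against a one-hot label."""
    return -math.log(softmax(student_logits)[target_index])


teacher = [2.0, 1.0, 0.1]  # illustrative logits, not real model outputs
student = [1.5, 1.2, 0.3]
print(distill_loss(teacher, student))
print(distill_loss(teacher, teacher))  # lower bound: the teacher's entropy
print(one_hot_loss(0, student))
```

The loss is minimized when the student matches the teacher's distribution, at which point it equals the teacher's entropy. When the teacher puts almost all of its mass on one token, the objective reduces to the usual one-hot cross-entropy.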

2. Core Capabilities and Benchmarks

Gemma-2-2B demonstrates strong zero- and few-shot performance across academic benchmarks. It achieves 52.2% on MMLU (5-shot), 55.7% on ARC-Challenge (25-shot), 24.3% on GSM8K (5-shot), and 51.2% F1 on DROP (3-shot). Its pass@1 score on HumanEval (code) is 22.0% (Team et al., 2024). On average over several tasks, Gemma-2-2B is competitive with comparable models such as LLaMA-3 8B, and substantially outperforms its predecessor, Gemma-1 2B.

Its robustness to prompt phrasing is notable: the standard deviation of its MMLU score across prompt-format variants is 2.1%, lower than for Mistral 7B (6.9%) (Team et al., 2024).

For instruction-following, the instruction-tuned variant Gemma-2B-IT achieves a 45% win rate versus Mistral 7B v0.2 Instruct in “helpfulness” evaluations and a 60% win rate on safety prompts (Team et al., 2024).
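The pass@1 figure quoted above for HumanEval is an instance of the pass@k metric. The source does not state how many samples per problem were drawn, so the following is only a sketch of the commonly used unbiased estimator, with hypothetical function names: given $n$ samples for a problem of which $c$ pass the unit tests, it estimates the probability that at least one of $k$ samples is correct.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples of which c pass the tests."""
    if n - c < k:
        # Every size-k subset must contain at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results, k: int) -> float:
    """Average pass@k over problems; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Illustrative counts only, not real HumanEval results.
toy_results = [(10, 3), (10, 0), (10, 10)]
print(benchmark_pass_at_k(toy_results, k=1))
```

For $k=1$ the estimator reduces to $c/n$, so with a single greedy sample per problem pass@1 is simply the fraction of problems solved.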

3. Failure Modes: Symbolic Hallucination Characteristics

Despite broad competence, Gemma-2-2B exhibits persistent, high hallucination rates when presented with symbolic linguistic triggers: modifiers, named entities, numbers, negation, and exceptions. On HaluEval and TruthfulQA, averaged across question-answering (QA), multiple-choice (MCQ), and odd-one-out (OOO) formats, Gemma-2-2B produces factually incorrect (“hallucinated”) outputs 79.0% of the time (Lamba et al., 9 Sep 2025).

Hallucination rates, in QA format:

Symbolic Property	HaluEval (%)	TruthfulQA (%)
Modifiers	84.76	89.12
Named Entities	83.87	89.01
Numbers	83.16	96.00
Negation	70.00	91.67
Exceptions	100.00	94.44
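As a quick arithmetic aid for reading the table above, the sketch below computes the unweighted mean of each column and the property with the highest rate. The helper names are hypothetical. These unweighted QA-only means are not the 79.0% figure quoted earlier, which averages across formats and is presumably weighted by the number of examples.

```python
# QA-format hallucination rates (%) copied from the table above:
# (HaluEval, TruthfulQA) per symbolic property.
rates = {
    "Modifiers": (84.76, 89.12),
    "Named Entities": (83.87, 89.01),
    "Numbers": (83.16, 96.00),
    "Negation": (70.00, 91.67),
    "Exceptions": (100.00, 94.44),
}

HALUEVAL, TRUTHFULQA = 0, 1


def column_mean(column: int) -> float:
    """Unweighted mean over the five symbolic properties."""
    return sum(row[column] for row in rates.values()) / len(rates)


def worst_property(column: int) -> str:
    """Property with the highest hallucination rate in the given column."""
    return max(rates, key=lambda name: rates[name][column])


for label, column in (("HaluEval", HALUEVAL), ("TruthfulQA", TRUTHFULQA)):
    print(label, round(column_mean(column), 2), worst_property(column))
```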

While scaling to larger variants (Gemma-2-9B and Gemma-2-27B) reduces the average hallucination rate to 73.6% and 63.9%, respectively, the effect is modest for symbolic triggers. Attention patterns and activation responses in mid-to-deep transformer layers (e.g., layers 10/20 in the 2B model) are especially unstable when processing these elements (Lamba et al., 9 Sep 2025).

4. In-Context Learning and Internal Mechanisms

Gemma-2-2B leverages a compositional “contextualize-then-aggregate” in-context learning circuit motif (Bakalova et al., 31 Mar 2025). In the lower layers (1–8), attention heads contextualize representations of few-shot demonstrations through token-to-token information flow (e.g., input-to-input, output-to-output), while higher layers (9–20) aggregate these contextualized representations to yield the predicted output. Task disambiguation and classification accuracy are strongly dependent on the presence of such contextualization edges, especially for ambiguous or compositional input prompts.

Causal intervention studies demonstrate that ICL in Gemma-2-2B is neither purely parallel nor entirely sequential; rather, it emerges from a precisely orchestrated set of layer–head circuits assembling task and example information across the sequence (Bakalova et al., 31 Mar 2025).
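The distinction between contextualization and aggregation edges can be pictured with a toy model of a few-shot prompt. This is a deliberately simplified sketch and not the procedure of Bakalova et al.: it assumes one token per demonstration input and output, ignores layers and heads, and all names (`Token`, `edge_kind`, `allowed_edges`) are hypothetical. It only shows which attention edges an intervention would remove when "contextualization" is ablated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    position: int
    role: str     # "input", "output", or "query"
    example: int  # index of the demonstration; -1 for the query


def build_prompt(num_demos: int):
    """One input and one output token per demonstration, then a query token."""
    tokens = []
    for i in range(num_demos):
        tokens.append(Token(2 * i, "input", i))
        tokens.append(Token(2 * i + 1, "output", i))
    tokens.append(Token(2 * num_demos, "query", -1))
    return tokens


def edge_kind(src: Token, dst: Token) -> str:
    """Label a causal attention edge src -> dst (information flows into dst)."""
    if src.position >= dst.position:
        return "invalid"
    if dst.role == "query":
        return "aggregate"
    if src.example != dst.example and src.role == dst.role:
        return "contextualize"  # input-to-input or output-to-output
    return "other"


def allowed_edges(tokens, ablate=()):
    """All causal edges whose kind is not listed in `ablate`."""
    return [
        (s.position, d.position)
        for s in tokens
        for d in tokens
        if edge_kind(s, d) not in ("invalid", *ablate)
    ]


tokens = build_prompt(3)
print(len(allowed_edges(tokens)))
print(len(allowed_edges(tokens, ablate=("contextualize",))))
```

In an actual causal intervention, the removed edges would correspond to patched or knocked-out attention connections in specific layers and heads, and the effect would be measured on task accuracy.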

5. Specialized Derivatives and Variants

Gemma-2-2B serves as a foundation for both code-specialized derivatives and architectural adaptations:

CodeGemma-2B is a code completion model with the same architectural base but trained on up to 1T code tokens, equipped with aggressive fill-in-the-middle formatting. It achieves 79.28% pass@1 on single-line code infilling and 37.8% pass@1 on HumanEval text-to-code, at roughly twice the inference speed of comparable models such as DeepSeek Coder 2B (Team et al., 2024).
Encoder-Decoder Gemma-2B-2B is constructed by initializing both the encoder and decoder stacks from the decoder-only Gemma-2-2B weights and inserting cross-attention. It achieves a +7.4 absolute IT-score gain over the base decoder-only model at negligible latency overhead and raises finetuned SuperGLUE accuracy from 75.5 to 88.3 (Zhang et al., 8 Apr 2025).
Gemma-2b-it is the instruction-tuned variant, providing the backbone for advanced prompt-recovery pipelines such as Gemma-2b-it + Phi2, which outperforms even larger models (Mistral 7B) on token-level prompt-reconstruction semantic similarity (Chen et al., 2024).
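The fill-in-the-middle formatting mentioned for CodeGemma-2B can be sketched as a simple prompt-construction step. The control-token strings and the prefix-suffix-middle ordering below are assumptions based on the publicly documented CodeGemma format and should be checked against the tokenizer actually in use; the helper names are hypothetical.

```python
# Assumed control tokens (verify against the model's tokenizer before use).
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"


def split_at_cursor(source: str, cursor: int):
    """Split source text into the code before and after the cursor."""
    return source[:cursor], source[cursor:]


def format_fim_prompt(prefix: str, suffix: str) -> str:
    """Prefix-suffix-middle layout: the model generates the missing middle."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


source = "def add(a, b):\n    return \n"
cursor = source.index("return ") + len("return ")
prefix, suffix = split_at_cursor(source, cursor)
print(format_fim_prompt(prefix, suffix))
```

The model's completion after the final control token is then spliced back between the prefix and the suffix.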

Gemma-2-2B is also the subject of deep interpretability research, with suites such as Gemma Scope training sparse autoencoders (JumpReLU SAEs) on internal activations from every layer and site in the model to probe the representational structure, feature splitting, and cross-layer compositionality (Lieberum et al., 2024).

6. Interpretability, Calibration, and Responsible Release

Gemma-2-2B includes design features for improved safety and interpretability. All normalization is via RMSNorm, which improves training stability. Training and release practices employ multi-layer filtering, safety benchmarks (toxicity, bias, factuality), and red-teaming. Model weights, tokenizers, and interpretability tools (such as SAE weights and tracing hooks) are openly released, enabling broad empirical and mechanistic research (Team et al., 2024; Lieberum et al., 2024).
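For reference, RMSNorm in its generic form rescales each activation vector to unit root-mean-square and then applies a learned per-channel gain. The sketch below shows that generic form only; the epsilon value and the exact gain parameterization used in Gemma-2-2B are not stated in this article and are assumptions here.

```python
import math


def rms_norm(x, gain, eps=1e-6):
    """Scale x to unit root-mean-square, then apply a per-channel gain."""
    mean_square = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_square + eps)
    return [v * inv_rms * g for v, g in zip(x, gain)]


x = [3.0, 4.0]
unit_gain = [1.0, 1.0]
print(rms_norm(x, unit_gain))
print(rms_norm([10.0 * v for v in x], unit_gain))  # nearly identical output
```

Unlike LayerNorm, no mean is subtracted, so the output is (up to the epsilon term) invariant to rescaling the input, which is one reason it is associated with stable training.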

Sparse autoencoder research on Gemma-2-2B reveals that interpretable, high-sparsity decompositions of residual, MLP, and attention activations can be constructed at scale, and these features are moderately robust to finetuning or quantization (Lieberum et al., 2024).
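A JumpReLU sparse autoencoder of the kind mentioned above encodes an activation vector into features that are kept only when they exceed a learned per-feature threshold, then decodes linearly. The following is a minimal forward-pass sketch with tiny hand-picked weights; the names and values are illustrative and are not taken from the released Gemma Scope artifacts.

```python
def jump_relu(pre_acts, thresholds):
    """Keep a pre-activation only where it exceeds its per-feature threshold."""
    return [z if z > t else 0.0 for z, t in zip(pre_acts, thresholds)]


def matvec(matrix, vector):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def sae_forward(x, w_enc, b_enc, thresholds, w_dec, b_dec):
    """Encode x into sparse features, then reconstruct it."""
    pre = [z + b for z, b in zip(matvec(w_enc, x), b_enc)]
    features = jump_relu(pre, thresholds)
    recon = [r + b for r, b in zip(matvec(w_dec, features), b_dec)]
    return features, recon


# Toy dimensions: 2-d activation, 3 dictionary features (illustrative values).
x = [1.0, -0.5]
w_enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b_enc = [0.0, 0.0, 0.0]
thresholds = [0.6, 0.1, 0.6]
w_dec = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
b_dec = [0.0, 0.0]

features, recon = sae_forward(x, w_enc, b_enc, thresholds, w_dec, b_dec)
print(features, recon)
```

The third feature has a positive pre-activation (0.5) that a plain ReLU would keep, but it falls below its threshold and is zeroed, which is how the thresholds control sparsity.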

7. Current Limitations and Future Directions

Gemma-2-2B’s hallucination vulnerability to symbolic triggers is a fundamental representational weakness that persists with scaling and architectural tweaks. Intervention-based analysis indicates local representational fragility in mid-length prompts (10–30 tokens). Addressing this will require mechanistic interpretability, property-focused cross-model comparisons (e.g., with LLaMA, Mistral, GPT models), multilingual and multimodal extensions, and prompt-level ambiguity reduction (Lamba et al., 9 Sep 2025).

Ongoing work includes improving the compositionality and robustness of symbolic processing, designing more expressive cross-layer mechanisms, and using released interpretability artifacts (e.g., Gemma Scope SAEs) to discover and potentially steer or regularize problematic circuits. Additionally, the emergence of encoder-decoder adapted versions offers a path to higher finetuning performance and more favorable compute–quality trade-offs, particularly in input-heavy applications (Zhang et al., 8 Apr 2025).

Principal References:

(Team et al., 2024) Gemma 2: Improving Open LLMs at a Practical Size
(Lamba et al., 9 Sep 2025) Investigating Symbolic Triggers of Hallucination in Gemma Models Across HaluEval and TruthfulQA
(Bakalova et al., 31 Mar 2025) Contextualize-then-Aggregate: Circuits for In-Context Learning in Gemma-2 2B
(Team et al., 2024) CodeGemma: Open Code Models Based on Gemma
(Zhang et al., 8 Apr 2025) Encoder-Decoder Gemma: Improving the Quality-Efficiency Trade-Off via Adaptation
(Chen et al., 2024) Advancing Prompt Recovery in NLP: A Deep Dive into the Integration of Gemma-2b-it and Phi2 Models
(Lieberum et al., 2024) Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2
(Team et al., 2024) Gemma: Open Models Based on Gemini Research and Technology