CodeGemma 2B: Efficient Code Generation Model

Updated 17 March 2026
  • CodeGemma 2B is a transformer-based large language model with 2 billion parameters, optimized for code generation and infilling using fill-in-the-middle formatting.
  • It utilizes a causal decoder-only architecture with extensive pre-training on up to 1 trillion code tokens and employs LoRA fine-tuning for rapid domain adaptation.
  • Benchmark results highlight its competitive performance in code completion, variable name recovery, and docstring generation, making it ideal for latency-sensitive deployments.

CodeGemma 2B is a 2-billion-parameter, transformer-based LLM specialized for code generation and infilling tasks, developed as part of the CodeGemma model family originating from Google’s Gemma architecture. The model is designed to deliver fast, high-quality code completions and open-ended code generation, with particular emphasis on efficiency and adaptability in latency-sensitive scenarios. Its architecture, training regimen, and adaptation strategies enable performance competitive with much larger models, while retaining computational tractability suitable for on-device or resource-constrained deployments (Team et al., 2024).

1. Model Architecture and Core Design

CodeGemma 2B adopts a causal, decoder-only transformer architecture directly inherited from the Gemma 2B backbone, with no structural modifications such as novel layer types or activation functions. Architectural parameters in published variants (Xu et al., 2023, Team et al., 2024, Poudel et al., 2024) include:

  • Total parameters: ≈2–2.1 billion
  • Layer count: 18 (DocuMint) or 24 (GenNm/stripped-binary); code-centric variants match the Gemma 2B backbone
  • Hidden size: 2048 (GenNm/stripped-binary), or as in Gemma 2B for the CodeGemma/DocuMint variants
  • Feed-forward inner size: 4×hidden (8192, GenNm)
  • Number of attention heads: 8 (DocuMint), 32 (GenNm)
  • Context window: 4,096 tokens (GenNm), up to 8,192 tokens (DocuMint)
  • Vocabulary size: up to 256,000 tokens

CodeGemma 2B incorporates standard self-attention sublayers and position-wise feed-forward networks with GELU activations, using rotary positional embeddings to capture long-range dependencies in code. Fill-in-the-middle (FIM) formatting with dedicated control tokens supports both prefix–suffix–middle (PSM) and suffix–prefix–middle (SPM) infilling orderings (Team et al., 2024), as sketched below. The model's design targets compatibility with standard 4-bit and 8-bit quantization schemes for inference.
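The control-token layout can be illustrated with a short sketch. The sentinel strings below follow those documented for the Hugging Face release of CodeGemma, but should be checked against the tokenizer's special-token map before use:

```python
# Sketch of PSM (prefix-suffix-middle) prompt construction for FIM.
# Sentinel strings follow the documented CodeGemma release; verify them
# against tokenizer.special_tokens_map before relying on them.
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"

def psm_prompt(prefix: str, suffix: str) -> str:
    # The model generates the missing middle span after FIM_MIDDLE.
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"

prompt = psm_prompt("def mean(xs):\n    return ", " / len(xs)\n")
# Expected completion: something like "sum(xs)"
```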

2. Pre-training Data and Objectives

Pre-training uses large-scale code corpora extracted from public repositories (e.g., GitHub, StackOverflow, documentation), with stringent deduplication and filtering to prevent evaluation leakage and remove sensitive content. The principal objective is next-token prediction via causal language modeling (CLM). Training details for primary variants include (Team et al., 2024):

  • v1.0: 500 billion code tokens, 80% FIM formatting
  • v1.1: 1 trillion code tokens, 90% FIM formatting
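A minimal sketch of how a document might be converted to FIM format at the stated rates; the uniform character-level split strategy here is an assumption, since the pipeline is not published at this granularity:

```python
import random

def to_training_example(doc: str, fim_rate: float = 0.9) -> str:
    # With probability fim_rate (90% for v1.1), reformat a non-empty
    # document into FIM layout; otherwise keep plain left-to-right text.
    if random.random() >= fim_rate:
        return doc
    # Pick two random split points defining prefix / middle / suffix.
    i, j = sorted(random.sample(range(len(doc) + 1), 2))
    prefix, middle, suffix = doc[:i], doc[i:j], doc[j:]
    return f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>{middle}"
```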

Fill-in-the-middle pre-training requires the model to maximize the log-likelihood of “middle” tokens given provided prefix and suffix context. The loss is formally:

\mathcal{L}_{\text{FIM}} = -\sum_{(p, m, s)} \sum_{t=1}^{|m|} \log p_\theta(m_t \mid p, \langle \text{fim} \rangle, s, m_{<t})
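A minimal PyTorch-style sketch of this objective, assuming labels for prefix and suffix positions are set to -100 (the convention torch.nn.functional.cross_entropy uses for ignored targets), so only middle tokens contribute:

```python
import torch
import torch.nn.functional as F

def fim_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # logits: (batch, seq_len, vocab); labels: (batch, seq_len) with every
    # position outside the "middle" span masked to -100.
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = labels[:, 1:].contiguous()
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=-100,
    )
```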

Baseline CodeGemma 2B includes no instruction tuning or RLHF; its open-infilling capability derives entirely from the specialized FIM pre-training regime.

3. Fine-tuning Strategies and Adaptations

Fine-tuning methodology depends on downstream adaptation needs. Two prominent use cases have been explored:

a. Stripped-Binary Variable Name Recovery

In GenNm (Xu et al., 2023), CodeGemma-2B is fine-tuned on pairs of decompiled code bodies with associated variable-ID lists and their ground-truth renamings. The pipeline employs:

  • Dual branch models: a generative (GEN) branch trained purely with CLM loss on augmented context, and a classification (CLS) branch that adds a linear vocabulary head for the most frequent variable names, optimizing a joint loss:

L_\text{total}(U) = L_\text{CE}(U) + L_\text{CLM}(U)

where L_\text{CE}(U) is a cross-entropy loss over new variable-name positions (see the sketch after this list).

  • Symbol preference alignment: regularization with a KL-divergence penalty to bias predictions towards empirical symbol name distributions observed in developer code:

L_\text{gen}(U) = L_\text{CLM}(U) + \lambda \cdot D_\text{KL}\big(P_\text{model}(\cdot \mid x) \,\|\, P_\text{symbol}(\cdot)\big)

  • Call-graph context augmentation: prepended pseudo-blocks list names from callers/callees, maximizing semantic coverage within the model’s token window.
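The two branch objectives can be condensed into a short PyTorch-style sketch. Tensor shapes, helper names, and the weight `lam` are illustrative assumptions, not the paper's released code:

```python
import torch
import torch.nn.functional as F

def cls_joint_loss(clm_loss: torch.Tensor,
                   cls_logits: torch.Tensor,
                   cls_labels: torch.Tensor) -> torch.Tensor:
    # CLS branch: cross-entropy over the added variable-name head,
    # combined with the causal LM loss (L_total = L_CE + L_CLM).
    l_ce = F.cross_entropy(cls_logits.view(-1, cls_logits.size(-1)),
                           cls_labels.view(-1))
    return l_ce + clm_loss

def gen_aligned_loss(clm_loss: torch.Tensor,
                     name_logits: torch.Tensor,   # logits at name positions
                     symbol_prior: torch.Tensor,  # empirical name distribution
                     lam: float = 0.1) -> torch.Tensor:
    # GEN branch: L_gen = L_CLM + lam * KL(P_model || P_symbol), biasing
    # predictions toward the empirical symbol-name distribution.
    log_p_model = F.log_softmax(name_logits, dim=-1)
    kl = F.kl_div(symbol_prior.log().expand_as(log_p_model),  # log P_symbol
                  log_p_model,                                # log P_model
                  log_target=True, reduction="batchmean")
    return clm_loss + lam * kl
```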

b. Parameter-Efficient Domain Adaptation

In the docstring generation task (Poudel et al., 2024), CodeGemma 2B is fine-tuned on the DocuMint dataset using Low-Rank Adaptation (LoRA):

  • LoRA rank=64, alpha=128, dropout=0.1; only LoRA parameters (≈78.4M) are updated, leaving backbone weights frozen
  • Inputs are pairs of Python function signatures and their human-written docstrings, truncated/padded to 128 tokens

This establishes effectiveness for rapid domain adaptation with minimal compute and memory overhead.
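A minimal sketch of this configuration using the Hugging Face peft library; the target_modules list is an assumption, since the paper does not state which projections receive adapters:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Reproduces the stated hyperparameters (r=64, alpha=128, dropout=0.1).
base = AutoModelForCausalLM.from_pretrained("google/codegemma-2b")
config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)  # backbone weights stay frozen
model.print_trainable_parameters()    # on the order of tens of millions
```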

4. Benchmark Performance and Comparative Analysis

a. Code Completion and Infilling

On HumanEval Infilling and MBPP (Team et al., 2024):

| Model | HumanEval (pass@1) | MBPP (pass@1) |
|---|---|---|
| CodeGemma 2B-PT | 31.1% | 43.6% |
| CodeGemma 2B-PT v1.1 | 37.8% | 49.2% |
| Gemma 2B PT | 22.0% | 29.2% |

Inference on 1,033 single-line tasks yields 78–79% accuracy (pass@1) in under 0.6 s per completion, with multi-line infilling at ≈51%.

Compared to larger models (e.g., DeepSeek Coder, StarCoder2), CodeGemma 2B matches or exceeds their inference speed while delivering comparable or better completion quality, making it attractive for latency-sensitive deployments.

b. Variable Name Recovery in Decompilation

On stripped-binary benchmarks (Xu et al., 2023):

  • Not-in-train names (exact match): 49.0% vs. 43.1% for CodeLlama-7B (a 5.9 pp gain)
  • Rare names (training frequency <10): 17.3% → 22.8% (a 5.5 pp gain)
  • Token precision: +11.4 pp over CodeLlama-7B, which lacks name specialization
  • Even CodeLlama-34B (5× the size) trails by 5–8 pp in both token precision and recall

These results demonstrate distinct gains from context augmentation and symbol distribution alignment.

c. Natural Language Documentation

For code-to-docstring generation (DocuMint) (Poudel et al., 2024):

| Model | Accuracy | Conciseness | Clarity (Flesch–Kincaid reading ease) |
|---|---|---|---|
| CodeGemma 2B (base) | 0.516 | 0.425 | 91.69 |
| CodeGemma 2B (fine-tuned) | 0.582 | 0.521 | 58.75 |

Fine-tuning yields a 12.7% relative gain in accuracy and 22.5% in conciseness, while the Flesch–Kincaid reading-ease score shifts from 91.69 to 58.75, trading simple, verbose phrasing for denser technical prose. The fine-tuned model is competitive even with Llama 3 8B.

5. Algorithmic Pipelines and Use Cases

A representative inference workflow for variable name recovery in binaries is described in GenNm (Xu et al., 2023). The pipeline iterates over:

  1. Initial candidate name generation by the GEN branch per variable.
  2. Context propagation using call-graph to collect neighbor function variable names.
  3. Sampling additional candidates through both GEN and CLS sibling models with updated contexts.
  4. Data-flow consistent name validation via cosine-similarity of embedding representations from the token embedder.
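Schematically, the loop can be written as below; every object here (gen_model, cls_model, call_graph, embed) is a hypothetical stand-in for components described in the paper, not a published API:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def rename_variables(func, gen_model, cls_model, call_graph, embed, rounds=3):
    # 1. Initial candidate names from the generative (GEN) branch.
    names = {v: gen_model.propose(func, v) for v in func.variables}
    for _ in range(rounds):
        # 2. Context propagation: collect current names from caller/callee
        #    functions, prepended to the prompt as a pseudo-block.
        ctx = call_graph.neighbor_names(func, names)
        for v in func.variables:
            # 3. Sample fresh candidates from both GEN and CLS branches.
            cands = (gen_model.propose(func, v, ctx)
                     + cls_model.propose(func, v, ctx))
            # 4. Data-flow consistency: prefer the candidate whose embedding
            #    best matches the mean embedding of names of data-flow-
            #    related variables.
            ref = np.mean([embed(names[u])
                           for u in func.dataflow_neighbors(v)], axis=0)
            names[v] = max(cands, key=lambda c: cosine(embed(c), ref))
    return names
```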

This approach delivers robust, semantically coherent renaming suitable for software security, vulnerability detection, and program understanding.

For general code infilling, CodeGemma 2B supports fill-in-the-middle on arbitrary code spans, with prompt formats controlling the infilled region by explicit FIM sentinel tokens (Team et al., 2024).
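For reference, a minimal end-to-end infilling call with the transformers library; the model id google/codegemma-2b is gated on Hugging Face, and the sentinel layout follows the sketch in Section 1:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/codegemma-2b"  # gated; requires accepted license
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = (
    "<|fim_prefix|>def fib(n):\n    if n <= 1:\n        return n\n    return "
    "<|fim_suffix|>\n<|fim_middle|>"
)
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=32)
# Decode only the newly generated middle span.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```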

6. Deployment Considerations and Limitations

The 2B-parameter size balances memory footprint and inference speed on resource-limited hardware. Typical single-line fills complete in ∼0.5 s on a GCE g2-standard-4 instance (bfloat16), and further acceleration is possible through quantization or batching (a quantized-loading sketch follows the list below). The model is well-suited for:

  • On-device IDE integration
  • Low-latency server-side code assistance
  • Multilingual code generation across many programming languages (e.g., C-family, Java, Python, Rust)
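For quantized deployment, a typical 4-bit loading sketch with bitsandbytes; these settings are common defaults, not values reported for CodeGemma specifically:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization via bitsandbytes (requires the bitsandbytes
# and accelerate packages).
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "google/codegemma-2b",
    quantization_config=bnb,
    device_map="auto",
)
```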

Known limitations include occasional unfinished or off-target completions when the suffix context is ambiguous, potential style drift in documentation tasks, and the absence of RLHF or preference-based fine-tuning in the core release. Qualitative analysis (Poudel et al., 2024) notes typical small-LM docstring issues: inconsistent style, incorrect references, and verbosity, indicating open areas for further research in human-aligned fine-tuning.

7. Research Impact and Ongoing Directions

CodeGemma 2B’s release, as an open-source variant of the Gemma family specialized for code, has enabled new research in program understanding, decompiler analysis, and natural language documentation via code-centric LLMs. Its parameter-efficient fine-tuning and symbol preference-aware adaptation strategies set benchmarks for both architectural scaling and task-specific specialization in small LLMs (Xu et al., 2023, Team et al., 2024, Poudel et al., 2024). The model’s design and deployment are likely influencing the ongoing evolution of hardware-software co-design for code LLMs and the development of highly efficient, application-specific inference pipelines. A plausible implication is continued specialization towards domain- and context-aware modeling via architecture-agnostic methods leveraging Transformer fundamentals.
