
SRI-Coder Series: Single-Pass Code Infilling

Updated 26 January 2026
  • SRI-Coder Series is a family of large language models that use a search-and-replace infilling mechanism to generate explicit ‘search’ and ‘replace’ blocks in a single pass.
  • It leverages a curated 200K dataset from The Stack v2 with patch-style supervision, ensuring balanced, context-aware training for practical code editing.
  • The models deliver significant improvements in accuracy and efficiency with reduced inference latency and enhanced safety compared to traditional Fill-in-the-Middle approaches.

The SRI-Coder Series designates a family of LLMs for source code infilling and editing, based on the Search-and-Replace Infilling (SRI) framework. SRI-Coder models advance beyond the prevalent Fill-in-the-Middle (FIM) paradigm by operationalizing a single-pass, context-aware verification-and-editing process. This is enabled by a structured output of explicit “search” and “replace” blocks, harmonizing the instruction-following capabilities of chat-oriented models with the efficiency and practicality required in automated code completion and assisted development tasks (Zhang et al., 19 Jan 2026).

1. Formalization of Search-and-Replace Infilling

Classic Fill-in-the-Middle (FIM) infilling generates a missing code segment $m$ by maximizing conditional likelihood given the prefix and suffix context:

$$\hat m = \arg\max_{m} P(m \mid \text{prefix}, \text{suffix}) = \arg\max_{m} \prod_{t=1}^{T} P(w_t \mid \text{ctx}, w_{<t}).$$

SRI recasts infilling as an integrated “search” and “replace” operation executed in a single generative pass. Given context $C = (\text{prefix},\ \texttt{/*\ MIDDLE\ CODE\ TO\ COMPLETE\ */},\ \text{suffix})$, the model emits a sequence of the form:

y=[<<<<<<< SEARCH,ys,=======,yr,>>>>>>> REPLACE],y = [\texttt{<<<<<<< SEARCH},\,y^s,\,\texttt{=======},\,y^r,\,\texttt{>>>>>>> REPLACE}],

where $y^s$ designates the lines to be edited (the “search block”) and $y^r$ is the corrected or completed code (the “replacement block”). The formal objective is:

(y^s,y^r)=argmax(ys,yr)t=1TsP(wtsC,w<ts)t=1TrP(wtrC,ys,w<tr),(\hat y^s, \hat y^r) = \arg\max_{(y^s, y^r)} \prod_{t=1}^{T_s} P(w_t^s \mid C, w_{<t}^s) \cdot \prod_{t'=1}^{T_r} P(w_{t'}^r \mid C, y^s, w_{<t'}^r),

which is implicitly reducible to a single $\arg\max_y P(y \mid C)$. This explicit diff-style formalism enables SRI-Coder models to identify and correct context-sensitive errors while maintaining the efficiency of single-pass inference (Zhang et al., 19 Jan 2026).
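
Concretely, the context $C$ and structured output $y$ can be written out as plain strings. This is a minimal illustrative sketch; the paper's exact prompt template is not reproduced here, and the sample code is invented for illustration:

```python
# Sketch of the SRI input/output format: the context C holds the
# completion marker, and the single-pass output y is a structured diff.
MARKER = "/* MIDDLE CODE TO COMPLETE */"

prefix = "int square(int x) {\n"
suffix = "}\n"
context = prefix + MARKER + "\n" + suffix    # C = (prefix, marker, suffix)

search_block = MARKER                        # y^s: the lines to edit
replace_block = "    return x * x;"          # y^r: the completed code
y = "\n".join(["<<<<<<< SEARCH", search_block, "=======",
               replace_block, ">>>>>>> REPLACE"])
```

Applying the diff means locating `search_block` in the file and substituting `replace_block`, which is what makes the edit verifiable before it is applied.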

2. Dataset Construction and Patch-Style Supervision

The SRI-200K dataset underpins SRI-Coder pretraining. It is derived from "The Stack v2" using tree-sitter, extracting logical AST blocks such as function bodies, loops, single lines, and random spans. Curation yields 200,000 examples with a balanced 2:1:1:1 ratio across function bodies, multi-line blocks, random spans, and single lines. Fine-tuning uses a 20,000-sample subset drawn from high-star-count repositories, with the remainder employed for ablation studies.

Each sample consists of a code file featuring a single /* MIDDLE CODE TO COMPLETE */ marker. The label is a unified patch:

```
<<<<<<< SEARCH
// 10 lines of context including the marker
=======
// same 10 lines but with ground-truth code
>>>>>>> REPLACE
```

This synthetic diff provides supervised guidance for grounding and correcting non-contiguous or multi-line code edits in a unified framework, directly aligning model behavior with practical developer workflows (Zhang et al., 19 Jan 2026).
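
The actual pipeline extracts AST blocks with tree-sitter; the following sketch substitutes a simple line-span mask to show the shape of the patch-style label (`make_sample` and its span arguments are illustrative, not from the paper):

```python
MARKER = "/* MIDDLE CODE TO COMPLETE */"

def make_sample(lines, start, end, window=10):
    """Mask lines[start:end] with the marker and emit (input_file, patch)."""
    masked = lines[:start] + [MARKER] + lines[end:]
    # Context window around the marker, mirroring the 10-line window
    lo = max(0, start - window // 2)
    search = masked[lo:lo + window]          # context including the marker
    # Ground truth: the same window with the marker expanded to real code
    replace = []
    for line in search:
        if line == MARKER:
            replace.extend(lines[start:end])
        else:
            replace.append(line)
    patch = "\n".join(["<<<<<<< SEARCH", *search, "=======",
                       *replace, ">>>>>>> REPLACE"])
    return "\n".join(masked), patch
```

Example usage: `make_sample(src_lines, start=1, end=2, window=4)` masks one line of a source file and yields the masked file plus its search-and-replace label.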

3. Architectural Design and Training Procedure

SRI-Coder builds on Qwen2.5-Coder and Qwen3-Coder checkpoints, covering a wide parameter range: 0.5B, 1.5B, 3B, 7B, 14B, 32B, 30B, 235B, and 480B. No additional layers or adapters are introduced; the Transformer backbone with SwiGLU activations and RMSNorm remains unchanged. The tokenizer is inherited from Qwen-Coder.

Key training features include:

  • Data mixture: 20,000 SRI patches + 60,000 general code-assistant instructions (Glaive-Code-Assistant) + 100 safety-alignment examples.
  • Instruction-tuning by minimizing standard cross-entropy loss:

$$\mathcal{L}_{\mathrm{CE}} = -\sum_{t=1}^{T_s+T_r} \log P\left(y_t^* \mid C,\, y^*_{<t}\right),$$

with $y^*$ the ground-truth search-and-replace diff.

  • Optimization: Adam (weight decay 0.1, gradient clipping 1.0), BF16 precision, 16×A100 GPUs, initial learning rate $5 \times 10^{-5}$ with scheduled decay.
  • Prompt conventions: input marker /* MIDDLE CODE TO COMPLETE */, output delimiters <<<<<<< SEARCH, =======, and >>>>>>> REPLACE (Zhang et al., 19 Jan 2026).
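
The instruction-tuning loss above reduces to a sum of negative token log-likelihoods over the target diff. A toy sketch over per-token probabilities (real training computes this from model logits):

```python
import math

def cross_entropy(token_probs):
    """L_CE = -sum_t log P(y*_t | C, y*_<t), summed over the T_s + T_r
    tokens of the ground-truth search-and-replace diff y*."""
    return -sum(math.log(p) for p in token_probs)

# A model confident in the correct diff tokens incurs low loss.
confident = cross_entropy([0.9, 0.95, 0.99])
hesitant = cross_entropy([0.5, 0.4, 0.3])
```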

4. Evaluation Methodology and Performance Analysis

SRI-Coder models are benchmarked against base models (standard FIM), natural-language FIM variants (standard, dialogue, template prompts), and a spectrum of proprietary/open LLMs (GPT-4, Claude, Gemini, DeepSeek, Qwen3, Grok):

  • Similarity-based benchmarks: Exact Match (EM) and Edit Similarity (ES) on CrossCodeEval, RepoEval, CrossCodeLongEval.
  • Execution-based: Pass@1 and unit-test success on ExecRepoBench and SAFIM.
  • Latency: average inference time.
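
The similarity metrics can be sketched as follows; ES is approximated here with difflib's ratio, which may differ from the benchmarks' exact edit-distance implementation:

```python
import difflib

def exact_match(prediction, reference):
    """EM: 1 if the completion matches the reference exactly
    (modulo surrounding whitespace), else 0."""
    return int(prediction.strip() == reference.strip())

def edit_similarity(prediction, reference):
    """ES in [0, 1]: character-level similarity between prediction and
    reference, approximated with difflib.SequenceMatcher."""
    return difflib.SequenceMatcher(None, prediction, reference).ratio()
```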

Empirical findings:

  • SRI-Coder achieves up to +20 EM/ES over NL-FIM chat baselines.
  • SRI-Coder-32B surpasses its Qwen2.5-Coder-32B base by +46.3 EM (CrossCodeEval) and +49.3 ES (CrossCodeLongEval).
  • Single-pass SRI-Coder matches FIM latency to within a few percent, remaining significantly faster than agentic multi-step tools.
  • SRI-Coder retains general coding competencies, with no observed pass@1 drop on MBPP, HumanEval, BigCodeBench, or LiveCodeBench—a marked improvement over NL-FIM fine-tuning which causes a 5–10 point loss (Zhang et al., 19 Jan 2026).

5. Algorithmic Workflow of SRI Inference

The SRI paradigm consists of four principal steps executed in a single-shot generative manner, as shown in the following high-level pseudocode:

```python
def sri_infill(file_contents):
    # 1. Locate the marker and extract a 10-line window around it
    prefix, marker, suffix = slice_around_marker(file_contents, window=10)
    prompt = format_system_prompt() + prefix + marker + suffix

    # 2. The model generates the diff in one shot
    diff_output = model.generate(prompt)

    # 3. Parse the SEARCH and REPLACE blocks
    search_block, replace_block = parse_diff(diff_output)

    # 4. Optionally convert to a standard git patch, or apply directly
    patched_file = apply_search_replace(prefix, suffix,
                                        search_block, replace_block)
    return patched_file
```
This mechanism enables practical generation of structured diffs, facilitating downstream integration with development tools and version control ecosystems (Zhang et al., 19 Jan 2026).
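
The `parse_diff` and `apply_search_replace` helpers in the pseudocode are not specified in the source; a minimal sketch, assuming well-formed model output and, for simplicity, applying the edit to the whole file rather than to the (prefix, suffix) pair:

```python
SEARCH_HDR, SEP, REPLACE_HDR = "<<<<<<< SEARCH", "=======", ">>>>>>> REPLACE"

def parse_diff(diff_output):
    """Split the model's one-shot output into (search, replace) blocks."""
    body = diff_output.split(SEARCH_HDR, 1)[1]
    search, rest = body.split(SEP, 1)
    replace = rest.split(REPLACE_HDR, 1)[0]
    return search.strip("\n"), replace.strip("\n")

def apply_search_replace(file_contents, search_block, replace_block):
    """Apply the edit; fail loudly if the search block is not found,
    so an ungrounded diff is rejected instead of silently ignored."""
    if search_block not in file_contents:
        raise ValueError("search block not found in file")
    return file_contents.replace(search_block, replace_block, 1)
```

Failing when the search block is absent is one way to surface the grounding property of SRI: the model's proposed edit is checkable against the file before it is applied.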

6. Security Alignment, Limitations, and Prospective Directions

SRI-Coder inherits safety benefits from instruction-following chat models: the SAL benchmark attack success rate drops from approximately 100% (base, unaligned) to below 20% (Table 1), significantly improving over FIM models. Inference latency remains competitive, and SRI-Coder exhibits robust generalization on unseen coding tasks.

Limitations include the offline nature of the current evaluation, the absence of real-world IDE or user studies, and only modest gains on models at or below 1.5B parameters; this suggests that effective transfer of the SRI paradigm to lightweight architectures may require knowledge distillation or curriculum learning. The SRI-Coder series is slated for open-source release and integration into code editors for iterative community-driven enhancement (Zhang et al., 19 Jan 2026).

