Search-and-Replace Infilling (SRI)
- Search-and-Replace Infilling (SRI) is a unified framework that generalizes fill-in-the-middle to dynamic, context-aware code editing via one-pass inference.
- It reformulates code completion as a diff-style search-and-replace operation, explicitly grounding edits through a two-block (SEARCH and REPLACE) generation process.
- Leveraging instruction-tuned Transformer models and the SRI-200K dataset, the approach enhances performance and robustness while preserving overall coding competencies.
Search-and-Replace Infilling (SRI) is a code infilling framework that generalizes the traditional fill-in-the-middle (FIM) paradigm to support dynamic, context-aware editing through a single-pass inference. Unlike FIM, which is restricted to static completion, SRI structurally integrates verification and editing—the hallmarks of agentic workflows—directly into model generation. This enables explicit grounding of edits and aligns with instruction-following priors of contemporary Chat LLMs, leading to enhanced performance, robustness, and inference efficiency in code completion and editing tasks (Zhang et al., 19 Jan 2026).
1. Formal Framework of SRI
SRI reformulates the code completion problem as a search-and-replace operation over a code context $C = (P, m, S)$, where $P$ denotes the prefix, the marker $m$ is a sentinel such as `/* MIDDLE CODE TO COMPLETE */`, and $S$ denotes the suffix. During inference, the model outputs a pair $(s, r)$:
- $s$: SEARCH block, a verbatim copy (“echo”) of the region containing the marker, grounding the edit location.
- $r$: REPLACE block, the substitution for the marker, representing the desired code completion or correction.
The model optimizes $\max_\theta \, p_\theta(s, r \mid C)$. A deterministic patch operator then applies the diff-style update, functionally analogous to a `git` patch, by replacing $s$ with $r$ in $C$.
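The deterministic patch step can be sketched as a strict, single-match string substitution; this is a minimal illustration under the assumptions above, and `apply_replace` is a hypothetical helper rather than the paper's released code:

```python
def apply_replace(ctx: str, search: str, replace: str) -> str:
    """Apply the diff-style update: substitute the SEARCH snippet
    with the REPLACE snippet inside the full code context.
    Fails loudly if the SEARCH text is missing or ambiguous."""
    count = ctx.count(search)
    if count != 1:
        raise ValueError(f"SEARCH block matched {count} times; expected exactly 1")
    return ctx.replace(search, replace, 1)
```

Requiring exactly one verbatim match is what grounds the edit: an ambiguous or missing SEARCH block surfaces as an error instead of silently patching the wrong location.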
2. Training Objective and Model Architecture
SRI models are instruction-tuned using token-level cross-entropy over the concatenated SEARCH and REPLACE sequence, with no auxiliary loss or regularization:

$$\mathcal{L}(\theta) = -\sum_{t=1}^{|y|} \log p_\theta(y_t \mid y_{<t}, C),$$

where $y = (s \,\Vert\, r)$ is the full target token sequence. Architecturally, SRI-Coder variants inherit exactly the same Transformer stack (SwiGLU activations, RMSNorm) and byte-pair encoding as their respective base models (Qwen2.5-Coder and Qwen3-Coder, spanning 0.5B–480B parameters). Only the marker and diff delimiters (`/* MIDDLE CODE TO COMPLETE */`, `<<<<<<< SEARCH`, `=======`, `>>>>>>> REPLACE`) are introduced as new tokens.
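As a concrete illustration, masked token-level cross-entropy can be sketched in pure Python; the per-token log-probabilities are assumed precomputed, and this is only a pedagogical sketch of the stated objective, not the paper's implementation:

```python
import math

def sri_loss(token_logprobs, target_mask):
    """Mean negative log-likelihood over target tokens only.

    token_logprobs: log-probabilities the model assigned to the
    reference tokens (context + SEARCH + REPLACE, one per token).
    target_mask: 1 for SEARCH/REPLACE target tokens, 0 for context
    tokens, which are excluded from the loss.
    """
    losses = [-lp for lp, m in zip(token_logprobs, target_mask) if m]
    return sum(losses) / len(losses)
```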
3. Dataset Construction: SRI-200K
The SRI-200K dataset was constructed from The Stack v2 using Tree-sitter parsing, yielding 200K “middle” code segments balanced across four types: function bodies, multi-line blocks (if/for), random spans, and single lines (ratio 2:1:1:1). A 20K subset, weighted toward high-quality repositories by GitHub stars, was used for instruction tuning; the remainder is reserved for extended research. Each sample contains a full file context (truncated to 32K tokens), a 10-line edit window around the marker, and is rendered as a diff-style block for supervision:
```
<<<<<<< SEARCH
… code with /* MIDDLE CODE TO COMPLETE */ …
=======
… same code but marker replaced with ground-truth …
>>>>>>> REPLACE
```
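Parsing a generated block of this shape can be sketched as follows; the exact whitespace handling is an assumption, and the sketch assumes the generated code itself does not contain the delimiter lines:

```python
def parse_diff(diff_block: str):
    """Split a generated SEARCH/REPLACE block into its two snippets.
    Assumes each of the three delimiters occurs exactly once."""
    header, sep, footer = "<<<<<<< SEARCH", "=======", ">>>>>>> REPLACE"
    body = diff_block.split(header, 1)[1]  # drop everything before SEARCH
    body = body.rsplit(footer, 1)[0]       # drop the REPLACE footer
    search, replace = body.split(sep, 1)   # split on the ======= divider
    return search.strip("\n"), replace.strip("\n")
```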
4. Training Pipeline and Hyperparameters
SRI-Coder models are fine-tuned using Megatron-LM across 16 NVIDIA A100 (80GB) GPUs, with a context length of 32,768 tokens. Each batch mixes 20K SRI examples, 60K general instructions (sourced from Glaive-Code-Assistant), plus 100 safety prompts. Optimization uses AdamW (weight decay 0.1, gradient clip 1.0), with linear warm-up (30 steps) to the peak learning rate, followed by decay to a minimum learning rate over 853 steps. Training employs BF16 precision with a global batch size of 256 and micro-batch size 1.
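The warm-up-then-decay schedule can be sketched as a step-indexed function; `peak_lr` and `floor` are placeholders, since the exact learning-rate values are not reproduced here, and the linear decay shape is an assumption:

```python
def lr_at_step(step: int, peak_lr: float, warmup_steps: int = 30,
               total_steps: int = 853, floor: float = 0.0) -> float:
    """Linear warm-up over `warmup_steps`, then linear decay toward
    `floor` over the remaining steps (a common schedule shape; the
    paper's exact decay curve is an assumption here)."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    frac = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return floor + (peak_lr - floor) * max(0.0, 1.0 - frac)
```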
5. Algorithmic Workflow
A single SRI inference is executed as follows:
```python
def SRI_infill(code_file: str) -> str:
    # 1. Load the file; the completion marker must already be present
    ctx = load_file(code_file)
    assert "/* MIDDLE CODE TO COMPLETE */" in ctx
    # 2. Greedily generate the diff block
    diff_block = model.generate(ctx, prompt=SRI_PROMPT)
    # 3. Parse SEARCH / REPLACE regions
    search_snippet, replace_snippet = parse_diff(diff_block)
    # 4. Apply the patch
    return apply_replace(ctx, search_snippet, replace_snippet)
```
6. Empirical Evaluation and Results
SRI was benchmarked against Base FIM and Chat-FIM paradigms using similarity-based (CrossCodeEval EM, Edit Similarity) and execution-based (Pass@1 on ExecRepoBench) metrics. Selected results:
| Model | CrossCodeEval EM | Pass@1 (ExecRepoBench) |
|---|---|---|
| Qwen2.5-Coder-32B (FIM) | 57.1% | 25.7% |
| DeepSeek-V3-Base (FIM) | 61.9% | — |
| Claude-3.5-Haiku (Chat-FIM) | 23.9% | 35.6% |
| Claude-3.5-Haiku (SRI) | 44.5% | 61.8% (+26.2%) |
| SRI-Coder-32B (ours) | 57.6% (+46.3) | 61.6% (+37.1) |
SRI-Coder models fine-tuned on 20K examples matched or exceeded larger Base FIM models, and SRI tuning preserved inference latency within 1–2% of standard FIM. On MBPP, HumanEval, BigCodeBench, and LiveCodeBench, SRI-Coder exhibited negligible degradation (0–1 pt), contrasting with 3–7 pt average drops for natural-language FIM tuning. This suggests that SRI’s diff-style objective preserves general coding competencies in instruction-tuned Chat LLMs.
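For reference, the Edit Similarity metric can be sketched with a common character-level definition, 1 minus the Levenshtein distance normalized by the longer string's length; the benchmark's exact variant may differ:

```python
def edit_similarity(a: str, b: str) -> float:
    """Character-level edit similarity: 1 - Levenshtein(a, b) divided
    by the length of the longer string (a common definition; the
    benchmark's exact normalization is an assumption)."""
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))  # distances for the empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb))) # substitution
        prev = cur
    return 1.0 - prev[-1] / max(len(a), len(b))
```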
7. Limitations, Adoptions, and Extensions
Current evaluations are restricted to offline benchmarks; practical IDE integration and user studies remain pending. Smaller models (<1B) show diminished SRI gains, indicating the need for knowledge distillation or curriculum strategies. Multi-file edits and richer agentic workflows built directly on the SRI format are plausible directions for future code-assistant development. The SRI-200K dataset and SRI-Coder checkpoints are available under open-source licenses to support broad adoption and further research (Zhang et al., 19 Jan 2026).