
FineInstructions: Code Completion Paradigm

Updated 3 July 2026
  • FineInstructions is an instruction-aware fill-in-the-middle paradigm that integrates explicit developer instructions with prefix and suffix code context to improve infill accuracy.
  • The approach employs a dedicated instruction token that delimits the developer's intent explicitly, so the model can tell natural-language guidance apart from the surrounding code.
  • Empirical results show that IFIM improves Pass@1 by about 9–10 percentage points for Deepseek-Coder on the IHumanEval and IRME benchmarks (about 2–5 points for Qwen2.5-Coder), while maintaining or improving performance when no instruction is given.

Instruction-Aware Fill-in-the-Middle (IFIM) Paradigm for Code Completion

Instruction-aware Fill-in-the-Middle (IFIM) is a code completion paradigm designed to bridge the gap between a developer’s natural language instructions and effective “fill-in-the-middle” (FIM) modeling in LLMs. Traditional FIM approaches leverage code context—prefix and suffix—to predict missing segments but often fail to integrate explicit developer intent, especially when code context is ambiguous. IFIM addresses this shortfall by structurally incorporating instruction spans, allowing models to significantly improve their responsiveness to developer guidance while maintaining robust baseline infill performance in the absence of instructions (Sun et al., 29 Sep 2025).

1. Formal Objective and Model Structure

1.1 Conventional FIM

The standard FIM training objective splits each code completion instance into three contiguous token spans:

  • $P$: Prefix (tokens before the edit point)
  • $M$: Middle (the region to be predicted)
  • $S$: Suffix (tokens after the edit point)

The model is trained to maximize the conditional likelihood of the middle span given its context, i.e., to minimize

$$\mathcal{L}_\mathrm{FIM} = -\mathbb{E}_{(P, M, S) \sim \mathcal{D}} [\log p_\theta(M \mid P, S)]$$

where $\mathcal{D}$ is the code-infilling dataset.
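
In practice, this conditional objective is usually implemented by laying the spans out as one token sequence and excluding every non-middle position from the loss. The sketch below is illustrative rather than the paper's code: it assumes integer token IDs, hypothetical sentinel IDs `PRE_ID`, `SUF_ID`, `MID_ID` in a prefix-suffix-middle layout, and the `-100` ignore label that PyTorch's `CrossEntropyLoss` skips by default and that Hugging Face causal LMs (which shift labels internally) also use.

```python
from typing import List, Tuple

IGNORE_INDEX = -100  # label value skipped by the loss (PyTorch / Hugging Face convention)

# Hypothetical sentinel token IDs; real models define their own FIM special tokens.
PRE_ID, SUF_ID, MID_ID = 1, 2, 3


def build_fim_example(prefix: List[int], middle: List[int],
                      suffix: List[int]) -> Tuple[List[int], List[int]]:
    """Lay out one FIM sample (prefix-suffix-middle order) and its training labels.

    Only the middle tokens carry real labels, so summing the per-token
    cross-entropy over labelled positions gives -log p(M | P, S) for this sample.
    """
    context = [PRE_ID] + prefix + [SUF_ID] + suffix + [MID_ID]
    input_ids = context + middle
    labels = [IGNORE_INDEX] * len(context) + middle
    return input_ids, labels


ids, labels = build_fim_example([10, 11], [20, 21, 22], [30])
print(ids)     # [1, 10, 11, 2, 30, 3, 20, 21, 22]
print(labels)  # [-100, -100, -100, -100, -100, -100, 20, 21, 22]
```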

1.2 IFIM Extension

IFIM extends this to quadruplets by introducing an explicit instruction span $I$:

$$\mathcal{L}_\mathrm{IFIM} = -\mathbb{E}_{(P, I, S, M) \sim \mathcal{D}} [\log p_\theta(M \mid P, I, S)]$$

Combining IFIM and FIM samples in training yields

$$\mathcal{L} = \alpha\,\mathcal{L}_\mathrm{IFIM} + (1-\alpha)\,\mathcal{L}_\mathrm{FIM}, \quad \alpha \in [0,1]$$

where $\alpha$ is the share of IFIM samples. A dedicated instruction token, <INS>, delimits $I$; because it reuses an existing vocabulary entry, the architecture stays unchanged and no output head has to be reinitialized.
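
One way to realize the weighted objective is to keep the instruction on a fraction α of the training samples and strip it from the rest, so the dataset-average loss takes the form α·L_IFIM + (1 - α)·L_FIM. This is a sketch under assumptions: the dict keys and the helper `mix_ifim_fim` are hypothetical, and the source does not say whether samples were mixed by fixed proportion, per batch, or by random draw.

```python
import random
from typing import Dict, List

Sample = Dict[str, str]  # hypothetical keys: "prefix", "instruction", "suffix", "middle"


def mix_ifim_fim(samples: List[Sample], alpha: float, seed: int = 0) -> List[Sample]:
    """Keep the instruction on a fraction alpha of samples (IFIM); strip it elsewhere (FIM)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)
    ifim_ids = set(order[:round(alpha * len(samples))])
    mixed = []
    for idx, sample in enumerate(samples):
        if idx in ifim_ids:
            mixed.append(dict(sample))  # (P, I, S, M): contributes to L_IFIM
        else:
            # (P, S, M): the instruction is dropped, so it contributes to L_FIM
            mixed.append({k: v for k, v in sample.items() if k != "instruction"})
    return mixed


data = [{"prefix": "p", "instruction": "i", "suffix": "s", "middle": "m"}] * 8
print(sum("instruction" in s for s in mix_ifim_fim(data, alpha=0.25)))  # 2
```

With `alpha=1.0` every sample keeps its instruction, matching the pure-IFIM setting.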

2. Data Generation Pipeline

2.1 Sourcing and Preparation

  • Middle span $M$: 1–3 contiguous code lines sampled from open-source repositories.
  • Surrounding context: the remaining lines become $P$ (prefix) and $S$ (suffix).
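
The span-sampling step might look like the following sketch. Drawing the span length and start line uniformly is an assumption (the source only states that 1–3 contiguous lines are taken), and the helper `split_pms` is hypothetical; a real pipeline would likely also skip blank or trivial spans.

```python
import random
from typing import Tuple


def split_pms(source: str, rng: random.Random) -> Tuple[str, str, str]:
    """Pick 1-3 contiguous lines as the middle M; lines before/after become P and S."""
    lines = source.splitlines(keepends=True)
    if not lines:
        raise ValueError("empty source")
    span = rng.randint(1, min(3, len(lines)))   # middle length in lines (inclusive bounds)
    start = rng.randint(0, len(lines) - span)   # index of the first middle line
    prefix = "".join(lines[:start])
    middle = "".join(lines[start:start + span])
    suffix = "".join(lines[start + span:])
    return prefix, middle, suffix


code = "def add(a, b):\n    total = a + b\n    return total\n"
p, m, s = split_pms(code, random.Random(0))
assert p + m + s == code  # the split is lossless
```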

2.2 Automated Instruction Synthesis

  • GPT-4o receives the code with $M$ wrapped in <explain>…</explain> tags and is asked to generate a single, concise, intent-focused sentence (≈10 tokens on average) describing the purpose of $M$.
  • Only one-sentence instructions are retained, and samples overlapping public benchmarks (HumanEval, MBPP) are removed.
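
A rough sketch of the two steps above: wrapping $M$ in <explain> tags within its context, then keeping only one-sentence replies. The task wording `TASK`, the helpers, and the regex-based sentence check are illustrative assumptions rather than the authors' pipeline; the GPT-4o call itself and the benchmark decontamination step are not shown.

```python
import re
from typing import Optional

# Hypothetical task wording; the authors' exact GPT-4o prompt is not reproduced here.
TASK = ("In one concise sentence, state the intent of the code "
        "between <explain> and </explain>.")


def build_prompt(prefix: str, middle: str, suffix: str) -> str:
    """Mark the middle span M with <explain> tags inside its surrounding code."""
    return f"{TASK}\n\n{prefix}<explain>{middle}</explain>{suffix}"


def single_sentence(reply: str) -> Optional[str]:
    """Return the stripped reply if it is exactly one sentence, else None (sample dropped)."""
    text = reply.strip()
    # Naive split: a sentence ends at '.', '!' or '?' followed by whitespace.
    parts = [p for p in re.split(r"(?<=[.!?])\s+", text) if p]
    return text if len(parts) == 1 else None
```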

Final dataset scale: 122,900 samples, 70% Python, average instruction ≈10 tokens.

3. Model Architecture, Input Ordering, and Training

  • Base LLMs: Deepseek-Coder (6.7B; default FIM mode PMS) and Qwen2.5-Coder (7B; default FIM mode PSM).
  • The <INS> token marks the instruction boundary; it reuses low-frequency vocabulary entries, so the architecture is left unchanged.
  • Empirically, input orderings that place the instruction before the middle (“I-before-M”) perform best:
    • Deepseek: <PRE> P <INS> I <MID> S <SUF>, with M generated after <SUF>; since <MID> marks the hole, the logical order is P, I, M, S (“PIMS”).
    • Qwen2.5: <PRE> P <SUF> S <INS> I <MID>, with M generated after <MID> (“PSIM”).
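
The Deepseek (PIMS) and Qwen2.5 (PSIM) orderings can be expressed as prompt templates. This sketch assumes the generic sentinel strings <PRE>, <SUF>, <MID>, <INS> are used literally and that spans are concatenated without separators; real tokenizers substitute their own special tokens, so treat it as a layout illustration only.

```python
# Generic sentinel names used in this section (assumed); each model family's tokenizer
# defines its own FIM special tokens, and IFIM adds an <INS> token.
PRE, SUF, MID, INS = "<PRE>", "<SUF>", "<MID>", "<INS>"


def format_pims(p: str, i: str, s: str) -> str:
    """Deepseek-style PIMS: <MID> marks the hole between I and S; M is generated after <SUF>."""
    return f"{PRE}{p}{INS}{i}{MID}{s}{SUF}"


def format_psim(p: str, i: str, s: str) -> str:
    """Qwen2.5-style PSIM: the instruction sits just before <MID>; M is generated after it."""
    return f"{PRE}{p}{SUF}{s}{INS}{i}{MID}"


prefix = "def area(r):\n    "
suffix = "\n\nprint(area(2))\n"
instruction = "Return the area of a circle of radius r."
print(format_pims(prefix, instruction, suffix))
print(format_psim(prefix, instruction, suffix))
```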

Hyperparameters

  • Framework: Hugging Face Transformers + PyTorch, 2×A6000 GPUs.
  • Optimizer: Adafactor with 15 warmup steps and linear learning-rate decay.
  • Deepseek-Coder: context 1216 tokens; batch formed via token accumulation.
  • Qwen2.5-Coder: context 1216 tokens.
  • Training: 2 epochs (≈100,000 steps). $\alpha = 1$ (100% IFIM) is optimal for instruction following; 25–75% mixtures were explored in ablations.
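
The warmup-plus-linear-decay schedule can be written as a small function. It reproduces the shape of `get_linear_schedule_with_warmup` in Hugging Face Transformers, using the 15 warmup steps and roughly 100,000 total steps listed above; the base learning rate `BASE_LR` is a hypothetical placeholder, not the value used by the authors.

```python
def linear_warmup_decay(step: int, base_lr: float, warmup_steps: int = 15,
                        total_steps: int = 100_000) -> float:
    """Learning rate at `step`: linear ramp up to base_lr, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


BASE_LR = 1e-5  # hypothetical value for illustration only
for step in (0, 15, 50_000, 100_000):
    print(step, linear_warmup_decay(step, BASE_LR))
```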

4. Evaluation Protocol, Benchmarks, and Results

4.1 Benchmarks & Metrics

  • IHumanEval: 312 Python infilling problems with docstrings removed.
  • IRepoMasterEval (IRME): 256 file-level code-infilling tasks, with context truncated to 20 lines on each side.
  • Primary metric: Pass@1 (proportion of completions passing all unit tests).
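
With one greedy completion per problem, Pass@1 is simply the fraction of problems whose completion passes every unit test. When several completions are sampled per problem, the unbiased pass@k estimator published alongside HumanEval is the usual choice; the sketch below implements that estimator, which for k = 1 reduces to the passing fraction. The source does not state which decoding the IFIM evaluation used, so the greedy example is an assumption.

```python
from math import comb
from typing import List


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples drawn, c of them pass all tests."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: List[List[bool]], k: int = 1) -> float:
    """Average over problems; results[j][t] is True if sample t of problem j passed all tests."""
    return sum(pass_at_k(len(r), sum(r), k) for r in results) / len(results)


# One greedy completion per problem: Pass@1 reduces to the fraction of problems solved.
print(mean_pass_at_k([[True], [False], [True], [True]]))  # 0.75
```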

4.2 Empirical Results

Model           Setting    IHumanEval (Pass@1)   IRME (Pass@1)
Deepseek-base   w/ ins.    84.6%                 10.9%
Deepseek-IFIM   w/ ins.    93.6%                 21.1%
Deepseek-base   w/o ins.   68.6%                 7.4%
Deepseek-IFIM   w/o ins.   78.2%                 16.0%
Qwen2.5-base    w/ ins.    91.0%                 18.4%
Qwen2.5-IFIM    w/ ins.    95.8%                 20.3%
Qwen2.5-base    w/o ins.   76.0%                 10.2%
Qwen2.5-IFIM    w/o ins.   76.3%                 13.3%

  • With instructions provided, IFIM improves Pass@1 by +4.8 to +9.0 percentage points on IHumanEval and +1.9 to +10.2 on IRME.
  • IFIM also raises performance when no instruction is given (e.g., Deepseek: +9.6 points on IHumanEval, +8.6 on IRME).
  • “I-before-M” ordering outperforms alternatives by 3–5 points.
  • Mixing in standard FIM samples ($\alpha$ between 0.25 and 0.75) can better preserve baseline infilling when no instructions are given, but pure IFIM ($\alpha = 1$) is optimal for instruction following.
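
The percentage-point gains can be recomputed directly from the results table; the short script below subtracts each base score from the corresponding IFIM score.

```python
from typing import Dict, Tuple

# Pass@1 (%) copied from the results table: (IHumanEval, IRME)
TABLE: Dict[Tuple[str, str], Dict[str, Tuple[float, float]]] = {
    ("Deepseek", "w/ ins."):  {"base": (84.6, 10.9), "IFIM": (93.6, 21.1)},
    ("Deepseek", "w/o ins."): {"base": (68.6, 7.4),  "IFIM": (78.2, 16.0)},
    ("Qwen2.5", "w/ ins."):   {"base": (91.0, 18.4), "IFIM": (95.8, 20.3)},
    ("Qwen2.5", "w/o ins."):  {"base": (76.0, 10.2), "IFIM": (76.3, 13.3)},
}


def point_gains(table):
    """Percentage-point gain of IFIM over the base model per benchmark, rounded to 0.1."""
    return {
        key: tuple(round(ifim - base, 1) for base, ifim in zip(row["base"], row["IFIM"]))
        for key, row in table.items()
    }


for (model, setting), (he, irme) in point_gains(TABLE).items():
    print(f"{model:9s} {setting:9s} IHumanEval {he:+.1f}  IRME {irme:+.1f}")
```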

CFIM ablation: Inlining instructions as comments (CFIM) severely degrades performance (e.g., Deepseek: 4.3% on IRME), underscoring the need for an explicit <INS>-delimited instruction span.
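
For contrast with an <INS>-delimited span, a CFIM prompt can be sketched as follows: the instruction is appended to the prefix as an ordinary comment, so nothing distinguishes it from comments already in the code. The comment placement, the prefix-suffix-middle layout, and the generic sentinel strings are assumptions about the ablation's format, not details given in the source.

```python
# Generic sentinel strings (assumed); real models use their own FIM special tokens.
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"


def cfim_prompt(prefix: str, instruction: str, suffix: str) -> str:
    """CFIM: the instruction is appended to the prefix as an ordinary Python comment.

    No dedicated token separates it from the code, so the model sees it as
    just another comment rather than as an explicit instruction span.
    """
    return f"{PRE}{prefix}# {instruction}\n{SUF}{suffix}{MID}"


print(cfim_prompt("def area(r):\n", "Return the area of a circle of radius r.", "\n"))
```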

5. Design Implications, Limitations, and Best Practices

  • Effective IFIM-derived “FineInstructions” should be a single, clear sentence (5–15 tokens), describing what to achieve in the missing code (not how).
  • Use an explicit delimiter (e.g., <INS> or an IDE-friendly #! ...) for instructions, so they can be removed cleanly after completion.
  • The IFIM dataset as built is Python-focused; testing cross-language robustness and scaling to larger model sizes (30B+) are essential next steps.
  • Harvesting “wild” data (e.g., inline developer comments, logs) is promising but requires robust filtering and intent extraction.
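
The delimiter and length guidelines can be illustrated with a small editor-side helper. It assumes a hypothetical convention in which the developer writes `#! <instruction>` on its own line at the edit point; the helper splits the buffer into prefix, instruction, and suffix, dropping the marker line, and a whitespace word count stands in for the 5–15 token guideline since the real count depends on the model's tokenizer.

```python
import re
from typing import Optional, Tuple

# Hypothetical convention: "#! <instruction>" on its own line marks the edit point.
# Requiring whitespace after "#!" keeps shebang lines such as "#!/usr/bin/env python" from matching.
MARKER = re.compile(r"^[ \t]*#![ \t]+(.+?)[ \t]*$", re.MULTILINE)


def extract_instruction(buffer: str) -> Optional[Tuple[str, str, str]]:
    """Split a buffer at the first marker line into (prefix, instruction, suffix).

    The marker line is dropped from the code, so no cleanup is needed after completion.
    """
    match = MARKER.search(buffer)
    if match is None:
        return None
    end = match.end()
    if buffer.startswith("\n", end):
        end += 1  # drop the marker line's newline as well
    return buffer[:match.start()], match.group(1), buffer[end:]


def within_length_guideline(instruction: str, lo: int = 5, hi: int = 15) -> bool:
    """Approximate the 5-15 token guideline by counting whitespace-separated words."""
    return lo <= len(instruction.split()) <= hi


buf = "import math\n\ndef area(r):\n    #! Return the area of a circle of radius r.\n\nprint(area(2))\n"
prefix, instruction, suffix = extract_instruction(buf)
print(repr(instruction), within_length_guideline(instruction))
```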

6. Impact and Prospects

IFIM provides a backward-compatible, instruction-aware extension of standard FIM pretraining for code LLMs (Sun et al., 29 Sep 2025). It delivers substantial gains in following finely specified developer intent (up to about 10 Pass@1 points for Deepseek-Coder), while preserving or improving the model's performance in vanilla infilling scenarios that lack explicit instructions. The approach reconciles the trade-off, historically imposed by standard instruction tuning, between infilling competence and instruction adherence in code completion systems, and the authors report a new state of the art on both synthetic and real-world programming benchmarks.
