Fill-in-the-Middle Code Completion

Updated 2 July 2026

Fill-in-the-Middle Code Completion is a paradigm that generates missing code snippets using both prefix and suffix contexts, aligning with real-world developer workflows.
It employs syntax-aware and AST-based techniques to select complete and semantically coherent code spans, ensuring high-quality infilled code segments.
Advanced training and decoding strategies, including structured FIM and horizon-length prediction, significantly boost the accuracy and efficiency of code completion.

Fill-in-the-middle (FIM) code completion is a paradigm in code intelligence where a model is required to generate a plausible, contextually coherent code fragment given both a left (prefix) and right (suffix) context. FIM explicitly models infilling, rather than only left-to-right (L2R) prediction, and is now foundational in code LLM pretraining, inference, evaluation, and alignment. The dominance of FIM in modern code completion systems is supported by both empirical performance advantages and increased alignment with real-world developer workflows.

1. Definition, Motivation, and Task Formulations

FIM completion is formally defined over a code sequence $x = (x_0, \ldots, x_L)$, partitioned into a prefix, a contiguous middle span, and a suffix:

$\mathrm{prefix} = (x_0, \ldots, x_{i-1})$
$\mathrm{middle} = (x_i, \ldots, x_j)$
$\mathrm{suffix} = (x_{j+1}, \ldots, x_L)$

The completion task is to generate $\mathrm{middle}$ given both $\mathrm{prefix}$ and $\mathrm{suffix}$, yielding the code $\mathrm{prefix} \,\|\, \mathrm{middle} \,\|\, \mathrm{suffix}$, where $\|$ denotes concatenation. This setting supports editing, insertion, and completion in the middle of existing code, which matches how developers work in IDEs: editing partial functions, refactoring, or filling in code blocks (Gong et al., 2024).
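A minimal sketch of this split, treating $x$ as a Python list of tokens (single characters here, for readability). The function name `split_fim` is illustrative and not taken from any cited system; indices are inclusive to match the definition of the middle as $(x_i, \ldots, x_j)$.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_fim(x: Sequence[T], i: int, j: int) -> Tuple[List[T], List[T], List[T]]:
    """Split x into prefix x[0..i-1], middle x[i..j], and suffix x[j+1..L].

    Both i and j are inclusive indices into x, as in the definition above,
    so the middle always contains at least one element.
    """
    if not 0 <= i <= j < len(x):
        raise ValueError("expected 0 <= i <= j < len(x)")
    return list(x[:i]), list(x[i : j + 1]), list(x[j + 1 :])


code = list("def add(a, b):\n    return a + b\n")
prefix, middle, suffix = split_fim(code, 19, 30)
assert prefix + middle + suffix == code  # prefix || middle || suffix recovers x
print(repr("".join(middle)))  # 'return a + b'
```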

FIM input formatting typically employs special sentinel (boundary) tokens, e.g., $\langle\text{PRE}\rangle$, $\langle\text{SUF}\rangle$, and $\langle\text{MID}\rangle$, and may use either of two orderings:

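Prefix-Suffix-Middle (PSM): $\langle\text{PRE}\rangle\,\mathrm{prefix}\,\langle\text{SUF}\rangle\,\mathrm{suffix}\,\langle\text{MID}\rangle\,\mathrm{middle}$
Suffix-Prefix-Middle (SPM): $\langle\text{SUF}\rangle\,\mathrm{suffix}\,\langle\text{PRE}\rangle\,\mathrm{prefix}\,\langle\text{MID}\rangle\,\mathrm{middle}$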
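The two orderings can be sketched as string templates. The sentinel spellings `<PRE>`, `<SUF>`, and `<MID>` are placeholders assumed for illustration; each model family defines its own special tokens, and some use slightly different SPM layouts. During training the middle is appended after the MID sentinel; at inference it is left empty and the model generates from that point.

```python
# Placeholder sentinel strings (assumed); real tokenizers reserve their own special tokens.
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"


def format_psm(prefix: str, suffix: str, middle: str = "") -> str:
    """Prefix-Suffix-Middle: prefix, then suffix, then the middle to be predicted."""
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"


def format_spm(prefix: str, suffix: str, middle: str = "") -> str:
    """Suffix-Prefix-Middle: suffix first, so the middle directly continues the prefix."""
    return f"{SUF}{suffix}{PRE}{prefix}{MID}{middle}"


# Training example (middle included) vs. inference prompt (middle left for the model).
print(format_psm("def add(a, b):\n    ", "\n", "return a + b"))
print(format_spm("def add(a, b):\n    ", "\n"))
```

In the SPM layout the prefix text sits immediately before the generated middle, so the middle reads as a plain left-to-right continuation of the prefix.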

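The model is trained with an autoregressive causal objective to maximize $\log P(\mathrm{middle} \mid \mathrm{prefix}, \mathrm{suffix}) = \sum_{t=i}^{j} \log P(x_t \mid \mathrm{prefix}, \mathrm{suffix}, x_i, \ldots, x_{t-1})$ (Jiang et al., 2024).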
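Because the middle comes last in both orderings, an ordinary causal language model can compute this objective by summing the per-token log-probabilities at the positions after the middle sentinel. The sketch below assumes the model's per-position log-probabilities are already available as a list (entry t is the log-probability of token t given tokens 0..t-1) and reuses a placeholder `<MID>` sentinel; the function names are hypothetical. Restricting the loss to the middle is one option; other setups may also apply the loss to the prefix and suffix tokens.

```python
from typing import List, Sequence


def middle_mask(tokens: Sequence[str], mid_token: str = "<MID>") -> List[bool]:
    """Mark the positions after the MID sentinel, i.e. the middle tokens x_i..x_j."""
    k = list(tokens).index(mid_token)  # raises ValueError if the sentinel is missing
    return [t > k for t in range(len(tokens))]


def middle_nll(token_logprobs: Sequence[float], mask: Sequence[bool]) -> float:
    """Negative log-likelihood of the middle: -sum of log P(x_t | prefix, suffix, x_i..x_{t-1})."""
    if len(token_logprobs) != len(mask):
        raise ValueError("token_logprobs and mask must have the same length")
    return -sum(lp for lp, keep in zip(token_logprobs, mask) if keep)


# PSM-ordered toy sequence; the log-probabilities are made-up numbers for illustration.
tokens = ["<PRE>", "a", "<SUF>", "c", "<MID>", "b1", "b2"]
logprobs = [0.0, -1.2, -0.4, -2.0, -0.1, -0.5, -0.25]
print(middle_mask(tokens))  # [False, False, False, False, False, True, True]
print(middle_nll(logprobs, middle_mask(tokens)))  # 0.75
```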

2. Syntax-Aware and Structure-Guided Span Selection

A central limitation of standard (random-span) FIM is that it often yields incoherent or unrepresentative training examples, as the masked span may cut across arbitrary token or character boundaries. Syntax-aware Fill-in-the-Middle addresses this by selecting the middle span via abstract syntax tree (AST) analysis to ensure that the infill region is a complete, well-formed syntactic construct (e.g., function body, loop, conditional expression).

Methods:

AST-FIM: Masking complete AST nodes or aligned multi-node spans—a mixture of node- and range-based selection ensures that all infill regions are strictly localizable subtrees (Gong et al., 30 May 2025).
SFIM (Structured FIM in aiXcoder-7B): Selects internal AST nodes (neither the root nor a leaf) and aligns span boundaries to line endings, so that the infill region corresponds to a complete code snippet (Jiang et al., 2024).
SAFIM Benchmark: All evaluation spans are derived from AST-based, semantically critical code elements (algorithmic blocks, control-flow conditions, API calls) rather than from random masking (Gong et al., 2024).
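A rough sketch of SFIM-style span selection for Python source, using the standard-library `ast` module (Python 3.8+ for `end_lineno`). It treats a statement node as internal when it has at least one child node, never picks the module root, and expands the chosen node to whole lines. This only approximates the selection rules described above; it is not the aiXcoder-7B or AST-FIM implementation, and the function names are hypothetical.

```python
import ast
import random
from typing import List, Optional, Tuple


def candidate_spans(source: str) -> List[Tuple[int, int]]:
    """1-based, inclusive line spans of internal statement nodes (not the root, not leaves)."""
    tree = ast.parse(source)
    spans = set()
    for node in ast.walk(tree):
        # The root is an ast.Module, which is not an ast.stmt, so it is skipped here.
        if not isinstance(node, ast.stmt):
            continue
        # A node with no child nodes (e.g. `pass`) is a leaf and is skipped.
        if next(ast.iter_child_nodes(node), None) is None:
            continue
        spans.add((node.lineno, node.end_lineno))
    return sorted(spans)


def sample_structured_fim(
    source: str, rng: random.Random
) -> Optional[Tuple[str, str, str]]:
    """Pick one candidate span and return (prefix, middle, suffix) aligned to full lines.

    Assumes ordinary newline-terminated source, so str.splitlines agrees with
    the line numbers reported by ast.
    """
    spans = candidate_spans(source)
    if not spans:
        return None
    start, end = rng.choice(spans)
    lines = source.splitlines(keepends=True)
    prefix = "".join(lines[: start - 1])
    middle = "".join(lines[start - 1 : end])
    suffix = "".join(lines[end:])
    return prefix, middle, suffix


src = "def add(a, b):\n    return a + b\n"
print(candidate_spans(src))  # [(1, 2), (2, 2)]
```

Expanding to whole lines means a middle may contain extra code when several statements share a line; that trade-off mirrors the line-ending alignment described for SFIM.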

This structure-aware span selection is empirically shown to drive improvements in syntax validity and exact-match metrics, particularly on benchmarks targeting semantic code blocks (Jiang et al., 2024, Gong et al., 30 May 2025).

3. Training Objectives and Multi-Objective Pretraining

FIM is implemented as a multi-mode, multi-objective pretraining strategy:

Next-Token Prediction (NTP): Standard L2R causal objective $\sum_{t=1}^{L} \log P(x_t \mid x_0, \ldots, x_{t-1})$.
FIM Random-Span Objective: The sequence is split into prefix, middle, and suffix at boundaries chosen uniformly at random.
Structured FIM (SFIM): Syntax-guided selection of the middle span, with both PSM and SPM input orderings; SFIM receives the largest sampling weight for code.
AST-FIM Pretraining: Mixture of single-node and aligned-span AST masking, combined with L2R data (Gong et al., 30 May 2025).

Sampling probabilities are typically skewed to favor syntax-aware FIM for code (e.g., 70% SFIM in aiXcoder-7B), ensuring models see mostly structurally valid infilling tasks (Jiang et al., 2024).
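Per-example objective sampling can be sketched with `random.choices`. Only the 70% SFIM share comes from the text above; the split of the remainder between random-span FIM and NTP, and the 50/50 PSM/SPM choice, are placeholder values assumed for illustration.

```python
import random
from typing import Dict

# Mixture weights for code data. 0.7 for SFIM follows the aiXcoder-7B example;
# the other two weights are assumed placeholders.
OBJECTIVE_WEIGHTS: Dict[str, float] = {"sfim": 0.7, "random_fim": 0.15, "ntp": 0.15}


def sample_objective(rng: random.Random, weights: Dict[str, float] = OBJECTIVE_WEIGHTS) -> str:
    """Draw the training objective for one code example."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def sample_ordering(rng: random.Random, p_psm: float = 0.5) -> str:
    """For FIM objectives, choose PSM or SPM input ordering (p_psm is an assumed value)."""
    return "psm" if rng.random() < p_psm else "spm"


rng = random.Random(0)
draws = [sample_objective(rng) for _ in range(10_000)]
print(draws.count("sfim") / len(draws))  # roughly 0.7
```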

Recent Enhancements:

Horizon-Length Prediction (HLP): Attach an auxiliary head to regress the fraction of the remaining middle at each generation step, providing planning and stop-point awareness. This helps eliminate reliance on dataset-specific truncation rules and improves suffix boundary alignment (Ding et al., 2024).
Curriculum and Context FIM: Oversample “hard” AST node types where acceptance is low, and inject relevant in-file or cross-file context via static analysis or retrieval, boosting acceptance rates of small and medium models (Sagtani et al., 2024).
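A sketch of how HLP training targets might be built. It assumes the target at the step that predicts middle token $t$ (0-based, middle length $M$) is $(M - t)/M$, the fraction of the middle still to be generated, and it assumes an L1 regression loss added to the token loss with a hypothetical weight; the exact normalization, loss, and weighting used by Ding et al. (2024) may differ. All function names are illustrative.

```python
from typing import List, Sequence


def hlp_targets(middle_len: int) -> List[float]:
    """Fraction of the middle remaining (including the current token) at each middle step."""
    if middle_len <= 0:
        raise ValueError("middle must contain at least one token")
    return [(middle_len - t) / middle_len for t in range(middle_len)]


def hlp_aux_loss(predictions: Sequence[float], targets: Sequence[float]) -> float:
    """Mean absolute error between the auxiliary head's outputs and the targets."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("predictions and targets must be non-empty and equal length")
    return sum(abs(p - y) for p, y in zip(predictions, targets)) / len(targets)


def total_loss(token_loss: float, aux_loss: float, aux_weight: float = 0.1) -> float:
    """Token-prediction loss plus the weighted horizon loss (aux_weight is hypothetical)."""
    return token_loss + aux_weight * aux_loss


print(hlp_targets(4))  # [1.0, 0.75, 0.5, 0.25]
```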

4. Inference Mechanisms and Post-Processing

At inference time, several classes of decoding and post-processing are employed:

Greedy or temperature-driven decoding of the target span, conditioned on observed context.
Self-infilling with non-monotonic decoding: Introduce interruption (switch to generating the suffix when confidence drops) and looping (alternately regenerating the suffix and the left-to-right middle) to improve regularity and boundary control (Zheng et al., 2023).
Syntax-Aware Constrained Decoding: Incrementally parse the generated snippet (with Earley’s algorithm extended for right/left quotienting), rejecting token extensions that cannot produce a legal combined AST with the prefix and suffix (Melcer et al., 2024).
Overlap removal and syntax-aware truncation: Especially for random-span or instruction-tuned outputs, strip the longest fragments of the raw output that repeat the end of the prefix or the start of the suffix, removing over-generation (Ahmad et al., 24 May 2025, Gong et al., 2024).
Instruction-Aware Infilling (IFIM): Incorporate explicit developer intent (instructions/comments) in the infilling prompt, training models to condition explicitly on a natural language instruction together with prefix and suffix (Sun et al., 29 Sep 2025).
Edit-Oriented FIM (SRI): Recast infilling as a search-and-replace edit performed in a single inference pass, dropping the assumption that the surrounding context is correct and making infilling robust to buggy context (Zhang et al., 19 Jan 2026).
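Overlap removal can be sketched at the character level: drop the longest leading fragment of the output that repeats the end of the prefix, then the longest trailing fragment that repeats the start of the suffix. The function name and the `min_overlap` threshold are assumptions, not details from the cited work; the threshold is a safeguard so that a short coincidental match, such as a shared newline, is not stripped.

```python
def strip_overlaps(prefix: str, generated: str, suffix: str, min_overlap: int = 8) -> str:
    """Remove text at the edges of `generated` that duplicates the surrounding context."""
    out = generated
    # Leading repeat: longest k >= min_overlap such that the prefix ends with out[:k].
    for k in range(min(len(prefix), len(out)), min_overlap - 1, -1):
        if k > 0 and prefix.endswith(out[:k]):
            out = out[k:]
            break
    # Trailing repeat: longest k >= min_overlap such that the suffix starts with out[-k:].
    for k in range(min(len(suffix), len(out)), min_overlap - 1, -1):
        if k > 0 and suffix.startswith(out[-k:]):
            out = out[:-k]
            break
    return out


prefix = "def add(a, b):\n"
suffix = "\n\nprint(add(1, 2))\n"
raw = "    return a + b\n\nprint(add(1, 2))\n"  # the model re-emitted the suffix
print(repr(strip_overlaps(prefix, raw, suffix)))  # '    return a + b'
```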

5. Empirical Performance and Benchmarking

Comprehensive benchmarking frameworks such as SAFIM and Real-FIM-Eval provide large-scale, syntax-aware, multilingual, and decontaminated testbeds for FIM evaluation (Gong et al., 2024, Gong et al., 30 May 2025). Reported results are summarized below; metrics and baselines differ by row, so entries are not directly comparable:

Model / Method	FIM Task Type	Syntax-Aware Infills	Reported Result (Pass@1, EM, or Gain)
aiXcoder-7B	Random & SFIM	Yes	79.3% (SantaCoder FIM)
AST-FIM (8B, 1–2T)	AST, Block, Control	Yes	+5–8 pts over Rand-FIM (SAFIM, Real-FIM-Eval)
SynthCoder-8B	AST/Heuristic, DPO	Yes	49.8% aiXcoder EM, +17.8% over base
StructureCoder	Granular AST/DPO	Yes	+1–2% pass@1 gains
HLP-FIM	Planning FIM	No (aux head)	+5–24% rel. on FIM tasks
IFIM	Instruction-aware	Yes (when provided)	84.6% $\rightarrow$ 93.6% on HumanEval-infilling
SRI	Edit+FIM	Yes (SRI process)	+26–46 pts on CrossCodeEval, SAFIM vs. FIM

Performance gains from FIM pretraining hold across model scales and training paradigms: models with structure-aware masking, horizon planning, DPO alignment, or instruction/context augmentation consistently outperform random-FIM or L2R-only baselines, often by 5–25% relative depending on the task (Gong et al., 30 May 2025, Ding et al., 2024, Yu et al., 21 Aug 2025, Sagtani et al., 2024, Ren et al., 27 Aug 2025, Sun et al., 29 Sep 2025).

6. Operational Trade-Offs, Limitations, and Routing

FIM-based completion presents several practical trade-offs:

Latency versus quality: Large base models yield better accuracy but are too slow for real-time IDE completion; curriculum/context training and MoE architectures allow small models to approach larger-model performance at low latency (Sagtani et al., 2024).
Boundary placement: Standard FIM training lacks explicit horizon awareness, requiring rule-based truncation; HLP and syntax-aware post-processing alleviate this but add complexity (Ding et al., 2024, Gong et al., 2024).
Syntax errors: Unconstrained generation yields 25–30% more syntax-invalid infills; constrained decoding recovers nearly all of this gap at modest runtime cost (Melcer et al., 2024).
Model routing: Syntax-aware routing (SynConfRoute) combines token-level confidence and online syntax validation, escalating only syntax-invalid or low-confidence completions to larger models. This reduces accelerator usage by 58% while increasing pass@1 by 6–31% over confidence-only routing, and never rejects a correct local completion (Thangarajah et al., 6 May 2026).
Alignment and security: Classic FIM “base” models are vulnerable to prompt injection; edit-oriented variants (SRI) preserve instruction-following priors and maintain high accuracy and safety post-alignment, with minimal inference overhead (Zhang et al., 19 Jan 2026).
Generalization and transfer: Gains from structure-aware and instruction-augmented FIM persist across languages and repository-level settings, but require adequate AST support and high-quality training data (Gong et al., 30 May 2025, Gong et al., 2024, Sun et al., 29 Sep 2025).
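A simplified routing rule in the spirit of the syntax-aware routing described above: accept the local model's infill only if the assembled file parses and its confidence clears a threshold, otherwise escalate. The geometric-mean token probability as the confidence score, the 0.5 threshold, the function names, and the use of Python's `ast.parse` as the syntax check are all assumptions for illustration; this is not the SynConfRoute algorithm.

```python
import ast
import math
from typing import Sequence


def parses(prefix: str, middle: str, suffix: str) -> bool:
    """Syntax check for Python: does prefix + middle + suffix parse as a module?"""
    try:
        ast.parse(prefix + middle + suffix)
    except (SyntaxError, ValueError):  # ValueError: e.g. null bytes on older Pythons
        return False
    return True


def should_escalate(
    prefix: str,
    middle: str,
    suffix: str,
    token_logprobs: Sequence[float],
    min_confidence: float = 0.5,
) -> bool:
    """Return True if the completion should be sent to the larger model."""
    if not token_logprobs or not parses(prefix, middle, suffix):
        return True
    # Geometric-mean probability of the generated tokens.
    confidence = math.exp(sum(token_logprobs) / len(token_logprobs))
    return confidence < min_confidence


prefix, suffix = "def add(a, b):\n", "\n"
print(should_escalate(prefix, "    return a + b", suffix, [-0.1, -0.2]))  # False
print(should_escalate(prefix, "    return a +", suffix, [-0.1, -0.2]))  # True (syntax error)
print(should_escalate(prefix, "    return a + b", suffix, [-2.0, -2.0]))  # True (low confidence)
```

In an IDE the rest of the file is often mid-edit and may not parse even with a correct middle, so a practical syntax check would need to be more local than a whole-file parse; this sketch ignores that.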

7. Key Insights and Recommendations

Syntax-aware and AST-based masking strategies should replace random-span FIM training for both pretraining and benchmark construction, as they yield more coherent infill instances and align with actual developer edits (Jiang et al., 2024, Gong et al., 30 May 2025).
Curriculum learning targeting difficult AST node types and context retrieval for cross-file completion significantly benefit low-capacity models, reducing the gap to much larger models with minimal latency impacts (Sagtani et al., 2024).
Horizon prediction or planned-length auxiliary losses provide explicit lookahead, addressing the open problem of boundary alignment in open-domain FIM; these additions have negligible training and inference cost (Ding et al., 2024).
Instruction-aware FIM manages the trade-off between developer intent-following and pure code infilling, enabling models to leverage comments/instructions without degrading core FIM capabilities (Sun et al., 29 Sep 2025).
Direct Preference Optimization over granular, structure-aligned infill regions, combined with adversarial negative sampling (especially to suppress prefix/suffix repetition), improves alignment and reduces pathological completion behaviors (Yu et al., 21 Aug 2025, Ren et al., 27 Aug 2025).
Syntax-aware routing of completions in distributed pipelines enables cost-efficient, local-first deployments without accuracy trade-offs or risk of silent rejection of correct outputs (Thangarajah et al., 6 May 2026).

These characteristics make FIM the dominant paradigm for context-aware code completion, and continued advances in syntax awareness, planning, and alignment will shape the capabilities of next-generation code LLMs.