
Fill-In-the-Middle (FIM) Modeling

Updated 25 March 2026
  • Fill-In-the-Middle (FIM) is a neural language modeling paradigm that predicts missing interior spans using both prefix and suffix contexts across diverse applications.
  • It leverages specialized tokenization, AST-based masking, and KV-cache optimizations to enhance performance and efficiency in code completion and document editing.
  • FIM integrates instruction-aware techniques and boundary planning to improve infilling precision in domains such as natural language processing, protein design, and structured code generation.

Fill-In-the-Middle (FIM) is a neural language modeling paradigm that generalizes classic left-to-right generative objectives by training a model to predict and generate a contiguous span removed from the interior of a sequence, conditioned on both its left (“prefix”) and right (“suffix”) contexts. In code, natural language, and even protein sequence modeling, FIM provides a direct mechanism to solve infilling tasks—such as code completion, document editing, or reasoning step augmentation—where future and past context are simultaneously salient. FIM is now integral to state-of-the-art code LLMs and is implemented at scale for both synthetic and real-world workflows.

1. Formal Definition and Training Objective

Let a sequence $X = (x_1, \dots, x_n)$ be partitioned by indices $1 \leq a < b \leq n$ into three spans:

  • Prefix $P = (x_1, \dots, x_a)$
  • Middle $M = (x_{a+1}, \dots, x_b)$ (the “hole” to infill)
  • Suffix $S = (x_{b+1}, \dots, x_n)$

The FIM modeling task is to learn the conditional distribution

$$P_\theta(M \mid P, S)$$

where $\theta$ denotes the model parameters. The typical data transformation for a decoder-only transformer inserts sentinel tokens to demarcate the spans, yielding the prompt

$$\langle\mathrm{PRE}\rangle\,P\,\langle\mathrm{SUF}\rangle\,S\,\langle\mathrm{MID}\rangle$$

from which the model autoregressively generates $M$ token by token. Training minimizes the cross-entropy loss over $M$, which is equivalent to maximizing the expected log-likelihood:

$$\max_\theta\; \mathbb{E}_{(P, M, S) \sim \mathcal{D}} \left[\log P_\theta(M \mid P, S)\right]$$

where $\mathcal{D}$ is the corpus of $(P, M, S)$ triplets. Interleaving FIM-structured and ordinary left-to-right (L2R) training retains both autoregressive sequence modeling and infilling capabilities (Bavarian et al., 2022, Guo et al., 2024).
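
The transformation described above can be sketched in a few lines of Python. This is a minimal illustration, not any paper's reference implementation: the sentinel strings, the `<EOT>` end-of-middle marker, and the function name `make_fim_example` are assumptions introduced here. Following the text, the loss mask covers only the middle span (plus the assumed end marker); some training recipes instead compute the loss over the whole sequence.

```python
# Sentinel strings are illustrative; real tokenizers reserve dedicated token ids.
PRE, SUF, MID, EOT = "<PRE>", "<SUF>", "<MID>", "<EOT>"  # <EOT> is an assumption


def make_fim_example(tokens, a, b):
    """Build a PSM training sequence and loss mask from one token list.

    Uses the section's 1-based convention with 1 <= a < b <= n:
    P = x_1..x_a, M = x_{a+1}..x_b, S = x_{b+1}..x_n.
    """
    n = len(tokens)
    if not 1 <= a < b <= n:
        raise ValueError("indices must satisfy 1 <= a < b <= n")
    prefix, middle, suffix = tokens[:a], tokens[a:b], tokens[b:]
    prompt = [PRE, *prefix, SUF, *suffix, MID]
    target = [*middle, EOT]
    # 1 marks positions that contribute to the cross-entropy loss.
    loss_mask = [0] * len(prompt) + [1] * len(target)
    return prompt + target, loss_mask


if __name__ == "__main__":
    seq, mask = make_fim_example(list("abcdef"), a=2, b=4)
    print(seq)
    print(mask)
```

Masking the prompt positions keeps the objective aligned with $\log P_\theta(M \mid P, S)$; the prefix and suffix are conditioned on but not predicted.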

2. Architectures, Prompt Formats, and Operational Regimes

FIM is natively implemented in decoder-only transformer architectures. Prompt engineering is critical for span demarcation and cache management:

  • Tokenization/Delimiters: FIM utilizes special tokens (e.g., <PRE>, <SUF>, <MID>) to mark prefix, suffix, and the start of the middle span (Guo et al., 2024, Bavarian et al., 2022).
  • Prompt Rearrangement: The dominant format is Prefix-Suffix-Middle (PSM), but Suffix-Prefix-Middle (SPM) is also used for inference/serving efficiency (Bavarian et al., 2022, Guo et al., 28 May 2025). A 50/50 PSM+SPM mix provides broad compatibility.
  • KV-cache Reuse: The EFIM prompt rearrangement maximizes reuse of the key-value (KV) cache by placing only user-updated increments after static contexts. In addition, fragment-tokenization retraining resolves subtoken generation at arbitrary boundaries. Together these reduce latency by up to 52% and raise throughput by up to 98% without loss of infilling performance (Guo et al., 28 May 2025).
  • Instruction Augmentation: The Instruction-Aware FIM (IFIM) framework extends the input with a structured instruction $I$, turning each example into a quadruple $(P, I, S, M)$ and resulting in the prompt

$$\langle\mathrm{PRE}\rangle\,P\,\langle\mathrm{SUF}\rangle\,S\,\langle\mathrm{INS}\rangle\,I\,\langle\mathrm{MID}\rangle$$

which trains the model to incorporate developer intent (Sun et al., 29 Sep 2025).
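
To make the prompt layouts above concrete, the following sketch builds PSM, SPM, and IFIM prompts as plain strings. The sentinel spellings and function names are hypothetical, and the SPM layout shown is a simplified ordering (suffix first, then prefix); actual sentinel placement for SPM differs between papers and model families, so treat it as an assumption rather than a specification.

```python
import random


def psm_prompt(prefix: str, suffix: str) -> str:
    """Prefix-Suffix-Middle: the model continues after <MID>."""
    return f"<PRE>{prefix}<SUF>{suffix}<MID>"


def spm_prompt(prefix: str, suffix: str) -> str:
    """Simplified Suffix-Prefix-Middle ordering (assumed layout)."""
    return f"<SUF>{suffix}<PRE>{prefix}<MID>"


def ifim_prompt(prefix: str, suffix: str, instruction: str) -> str:
    """Instruction-aware layout: <PRE> P <SUF> S <INS> I <MID>."""
    return f"<PRE>{prefix}<SUF>{suffix}<INS>{instruction}<MID>"


def mixed_prompt(prefix: str, suffix: str, rng: random.Random,
                 spm_rate: float = 0.5) -> str:
    """Pick SPM with probability spm_rate, else PSM (a 50/50 mix by default)."""
    if rng.random() < spm_rate:
        return spm_prompt(prefix, suffix)
    return psm_prompt(prefix, suffix)


if __name__ == "__main__":
    print(ifim_prompt("def area(r):\n    ", "\n", "return the area of a circle"))
```

In the simplified SPM ordering the suffix comes first, so an unchanged suffix stays a stable leading segment while the prefix grows as the user types. That is one intuition for why suffix-first layouts can help serving efficiency.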

3. Specialized FIM Strategies and Domain Adaptations

FIM has evolved with structural and contextual enhancements across multiple tasks:

  • Structure-Aware FIM: Masking entire Abstract Syntax Tree (AST) subtrees, as opposed to random tokens or characters, aligns masked spans with semantically meaningful code constructs. This structurally coherent masking (AST-FIM) delivers Pass@1 gains of up to 7 points over random-character FIM on standard code infilling benchmarks and matches human editing patterns (Gong et al., 30 May 2025).
  • Curriculum and Code Context: Incorporating context and hard-to-complete code patterns (curriculum learning) enhances FIM performance, especially for smaller models. Statistics from fine-tuning on curriculum and context-rich datasets report improvements in Pass@1, Prefix Match, and edit similarity on multi-line infilling and CCEval (Sagtani et al., 2024).
  • Instruction-Conditioned FIM: IFIM achieves Pass@1 gains of roughly 9 to 12 percentage points on instruction-guided infilling (e.g., Deepseek-Coder: 84.6% to 93.6% on IHumanEval), with no loss, and in some cases an improvement, of core FIM capabilities when instructions are absent. Physically separated instruction tokens (not comments) are critical for accurate instruction following (Sun et al., 29 Sep 2025).
  • Horizon Planning: By augmenting the next-token loss with a horizon-length regression objective (HLP), models internalize the “distance-to-suffix” at each infilling step, boosting alignment with infilling boundaries and improving repository-level and file-level pass rates by up to 24% relative, obviating the need for heuristic post-processing (Ding et al., 2024).
  • Byte-Level Decoding: Precise handling of mid-token boundaries in random-span infilling is resolved by exact byte-level marginalization over all tokenizations, yielding absolute pass rate gains of ~18% over token-level decoding (Phan et al., 2024).
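
As an illustration of the structure-aware masking idea in the list above, the sketch below uses Python's standard `ast` module to pick a random statement or expression node and cut the source at that node's boundaries. It is a toy version under stated assumptions (Python source only, `\n` line endings, uniform sampling over nodes) and is not the sampling procedure of the AST-FIM paper; the function name `ast_fim_split` is hypothetical.

```python
import ast
import random


def ast_fim_split(source: str, rng: random.Random):
    """Split source into (prefix, middle, suffix) where middle is one AST subtree."""
    data = source.encode("utf-8")  # ast column offsets are UTF-8 byte offsets
    # Byte offset at which each line starts (assumes "\n" line endings).
    line_starts = [0]
    for i, byte in enumerate(data):
        if byte == 0x0A:
            line_starts.append(i + 1)

    # Statements and expressions carry lineno/col_offset and their end positions.
    nodes = [n for n in ast.walk(ast.parse(source))
             if isinstance(n, (ast.stmt, ast.expr))]
    node = rng.choice(nodes)

    begin = line_starts[node.lineno - 1] + node.col_offset
    end = line_starts[node.end_lineno - 1] + node.end_col_offset
    return (data[:begin].decode("utf-8"),
            data[begin:end].decode("utf-8"),
            data[end:].decode("utf-8"))


if __name__ == "__main__":
    src = "def add(a, b):\n    total = a + b\n    return total\n"
    prefix, middle, suffix = ast_fim_split(src, random.Random(0))
    print(repr(middle))
```

Because every cut lands on a node boundary, the masked span is always a complete construct (an expression, a statement, or a whole function body), unlike random character spans, which can split identifiers or literals.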

4. Evaluation Protocols and Benchmarks

FIM evaluation metrics center on syntax, semantics, and boundary control:

5. Empirical Findings and Best Practices

A cross-paper synthesis yields these high-level insights:

  • Infilling does not degrade L2R: FIM pretraining at moderate rates ($\leq 0.5$) does not harm left-to-right performance and is “free” in terms of perplexity and sample quality on L2R tasks (Bavarian et al., 2022, Guo et al., 2024, Gong et al., 30 May 2025).
  • Boundary Awareness Is Central: Post-processing of generated output (to remove extraneous lines or ensure alignment with suffix) is necessary for random-span infilling, but superfluous for line-aligned tasks when FIM is trained with explicit span boundaries (Ahmad et al., 24 May 2025, Ding et al., 2024).
  • FIM is critical for context-sensitive code completion: Models lacking FIM objectives underperform even when scaling up, and data quality in pretraining (syntax/alignment, AST-aware masks) outweighs raw parameter count (Gong et al., 2024, Gong et al., 30 May 2025).
  • AST-based masking and curriculum: Realistic structure masking converges faster and achieves higher accuracy than random spans; curriculum and context add synergy (Sagtani et al., 2024, Ren et al., 27 Aug 2025).
  • Instruction integration: IFIM, with explicit special-token instructions, closes the gap between code LLMs and natural developer workflows, far outperforming comment-based or inline instruction schemes (Sun et al., 29 Sep 2025).
  • Domain transfer: FIM supports protein design (recovering mid-chain amino acids; ProtFIM matches or outperforms larger CLM/PLM baselines; Lee et al., 2023), math reasoning step expansion (MathFimer lifts benchmark scores by up to 8 percentage points; Yan et al., 17 Feb 2025), and general text tasks (FiLM; Shen et al., 2023).

| Model/Paper | Domain | FIM Variant | Notable Result(s) |
|---|---|---|---|
| DeepSeek-Coder (Guo et al., 2024) | Code | PSM (50%) | SOTA open-source infilling |
| IFIM (Sun et al., 29 Sep 2025) | Code | Instruction-aware FIM | +9 to +12 pp Pass@1 |
| AST-FIM (Gong et al., 30 May 2025) | Code | AST-structure masking | +4–7 pts Pass@1 over Rand-FIM |
| ProtFIM (Lee et al., 2023) | Protein | [PRE]/[SUF]/[MID] FIM | Outperforms 2–30x CLMs |
| FiLM (Shen et al., 2023) | Text | Any-order masked infilling | +5–14 ROUGE-PPL gap vs AR |
| EFIM (Guo et al., 28 May 2025) | Code | KV-cache-optimized FIM | -52% latency, +98% throughput |
| MathFimer (Yan et al., 17 Feb 2025) | Math reasoning | Step-infill in solution chain | Up to +8 pp on GSM8K/MATH |
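
The finding that a moderate FIM rate leaves L2R quality intact suggests a simple data-pipeline pattern: transform each document into FIM form with some probability and leave the rest untouched. The sketch below assumes token lists as input, a uniformly sampled span, and the hypothetical name `maybe_fim`; real pipelines typically work on token ids and may sample spans differently.

```python
import random


def maybe_fim(tokens, rng: random.Random, fim_rate: float = 0.5):
    """With probability fim_rate emit a PSM-ordered sequence, else the L2R original."""
    n = len(tokens)
    if n < 2 or rng.random() >= fim_rate:
        return list(tokens)  # ordinary left-to-right example
    # Sample 1 <= a < b <= n so that prefix and middle are both non-empty.
    a = rng.randint(1, n - 1)
    b = rng.randint(a + 1, n)
    prefix, middle, suffix = tokens[:a], tokens[a:b], tokens[b:]
    return ["<PRE>", *prefix, "<SUF>", *suffix, "<MID>", *middle]


if __name__ == "__main__":
    rng = random.Random(0)
    docs = [list("abcdefgh") for _ in range(1000)]
    transformed = sum(maybe_fim(d, rng)[0] == "<PRE>" for d in docs)
    print(f"FIM fraction: {transformed / len(docs):.2f}")
```

Setting `fim_rate=0.0` recovers pure L2R training and `fim_rate=1.0` transforms every document, which makes the rate easy to sweep in an ablation.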

6. Limitations, Extensions, and Future Directions

FIM is robust but has known boundaries and open research directions:

  • Contextual Repair: Standard FIM cannot correct errors in the conditioning context (prefix/suffix). Methods like SRI (Search-and-Replace Infilling) internalize editing/verification cycles, enabling bug-fixing in the context at FIM-level latency (Zhang et al., 19 Jan 2026).
  • Syntax Guarantee: Unconstrained decoders still admit syntax errors. Left/right quotient-based constrained decoding using context-sensitive grammars can raise syntactic correctness from 65% to 89.5% in Python FIM, with minor inference overhead (Melcer et al., 2024).
  • Subtoken and Byte Handling: Fragment-tokenization and byte-level marginalization remove pitfalls near token boundaries, markedly improving random-span fill (Guo et al., 28 May 2025, Phan et al., 2024).
  • Post-processing: Needed only for random/partial-line tasks; high-quality FIM + supervised fine-tuning yields models that learn exact output boundaries (Ahmad et al., 24 May 2025).
  • Scaling Laws: FiLM and AST-FIM show that the gap between infilling and autoregressive performance shrinks at scale and with code-structure alignment, suggesting further gains with increased compute or bidirectional generation (Shen et al., 2023, Gong et al., 30 May 2025).
  • Expanded Curriculum, Context, and Instructions: Combining structural, context-aware, and instruction-rich examples is essential for high infilling accuracy, persistence, and human alignment (Sagtani et al., 2024, Sun et al., 29 Sep 2025, Ren et al., 27 Aug 2025).
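
The post-processing and syntax points in the list above can be illustrated with two small helpers. Both are hypothetical heuristics written for this summary, not methods from the cited papers: `trim_suffix_overlap` removes a generated tail that merely repeats the start of the suffix, and `parses_as_python` checks after the fact whether the stitched result is valid Python (a much weaker tool than constrained decoding, which prevents invalid output during generation).

```python
import ast


def trim_suffix_overlap(generated: str, suffix: str, min_overlap: int = 4) -> str:
    """Drop the longest tail of `generated` that duplicates the start of `suffix`.

    min_overlap guards against trimming coincidental short matches
    such as a single newline; the default of 4 is an arbitrary choice.
    """
    longest = min(len(generated), len(suffix))
    for k in range(longest, max(min_overlap, 1) - 1, -1):
        if generated.endswith(suffix[:k]):
            return generated[:-k]
    return generated


def parses_as_python(prefix: str, middle: str, suffix: str) -> bool:
    """Return True if prefix + middle + suffix is syntactically valid Python."""
    try:
        ast.parse(prefix + middle + suffix)
    except SyntaxError:
        return False
    return True


if __name__ == "__main__":
    prefix = "def add(a, b):\n    "
    suffix = "\n\nprint(add(1, 2))\n"
    raw = "return a + b\n\nprint(add(1, 2))"  # model ran on into the suffix
    middle = trim_suffix_overlap(raw, suffix)
    print(repr(middle), parses_as_python(prefix, middle, suffix))
```

Heuristics like these are what boundary-aware training aims to make unnecessary: a model that stops at the right place needs no trimming step.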

7. Impact, Applications, and Best Practices

FIM is now standard in foundation code LLMs, code assistants, and editing tools. Key best practices distilled from the literature include:

Fill-In-the-Middle thus provides a general, extensible, and empirically validated paradigm for sequence infilling across domains, with structural, efficiency, and instruction-following enhancements emerging as the main determinants of state-of-the-art performance (Sun et al., 29 Sep 2025, Ding et al., 2024, Gong et al., 30 May 2025, Ren et al., 2024, Sagtani et al., 2024, Zhang et al., 19 Jan 2026, Ren et al., 27 Aug 2025, Gong et al., 2024, Guo et al., 28 May 2025, Ahmad et al., 24 May 2025, Phan et al., 2024, Melcer et al., 2024, Lee et al., 2023, Shen et al., 2023, Yan et al., 17 Feb 2025).
