Reasoning-as-Infilling

Updated 27 October 2025

Reasoning-as-Infilling is a paradigm that reinterprets reasoning as the process of adaptively filling in missing or intermediate details using bidirectional contextual cues.
It leverages techniques such as incremental approximation, segment-aware self-attention, and gradient-based refinement to improve coherence in diverse domains like theorem proving and code generation.
This approach enables practical improvements in text and code generation, offering enhanced data efficiency, logical consistency, and multimodal reasoning capabilities.

Reasoning-as-Infilling is a paradigm that conceptualizes reasoning processes as the adaptive “infilling” of missing, intermediate, or indeterminate parts of a solution, proof, document, narrative, or latent representation. In this framework, automated systems produce and refine partial results—be they text, code, tokens, logical steps, or latent variables—while incrementally integrating context from both observed and unobserved elements. Reasoning-as-infilling departs from strictly sequential, left-to-right, or one-shot approaches by making use of surrounding contextual cues, intermediate updates, or structured templates to “fill the gaps” in complex inferential tasks across domains such as theorem proving, language modeling, code completion, vision, and knowledge representation.

1. Foundational Principles

The reasoning-as-infilling approach draws upon several foundational insights:

Incremental Approximation: Reasoning is formulated as an incremental, resource-aware process in which partial results are progressively refined as more computation or information becomes available (Horvitz et al., 2013).
Bidirectional Context Integration: Unlike strictly left-to-right or unidirectional reasoning, infilling leverages information available in both preceding and succeeding context to make more globally coherent inferences (Zhu et al., 2019, Fried et al., 2022).
Partial Evidence and Belief Updates: Bayesian and decision-theoretic principles are used to update beliefs about a conclusion or hypothesis given partial progress, enabling timely decisions even before full resolution (Horvitz et al., 2013).
Non-monotonic Generation: Infilling accommodates non-monotonic reasoning processes where outputs are repeatedly revised, refined, or synchronized between different parts of the solution (Zheng et al., 2023).

This generalization allows for the modeling of reasoning as an “infilling” process within diverse architectures, including flexible algorithms, self-attention Transformers with segment-aware position encoding, masked diffusion models, and iterative latent refinement modules.

2. Algorithmic Formulations and Modeling

A wide variety of algorithmic formulations have been developed under the reasoning-as-infilling paradigm:

Flexible Inference and Decision-Theoretic Metareasoning

In the context of resource-limited theorem proving, flexible inference algorithms supply partial results that can incrementally improve with added computation. The decision-theoretic net expected value of computation (NEVC) quantifies when to halt reasoning versus continue deliberation:

$\mathrm{NEVC}(S_i, I, r) = u_o(\mathrm{TT}(I)) \cdot p(\mathrm{TT}(I){\mid}S_i, I, r) - u_o(I) - u_i(r)$

where $u_o$ is object-level utility, $u_i$ is computation cost, and $p$ is the updated confidence in the target theorem (Horvitz et al., 2013).

Self-Attention with Segment-Aware Infilling

Modern text and code infilling models use self-attention over the entire contextual window, augmented with segment-aware positional encodings for distinguishing multiple blanks or infilling sites. The segment position is computed as:

$\mathrm{pos} = \mathrm{seg\_id} \times \mathrm{base} + \mathrm{offset\_id}$

and sinusoidal or learned embeddings are derived accordingly (Zhu et al., 2019, Fried et al., 2022).

Gradient-Based and Masked Diffusion Infilling

Gradient search-based infilling treats missing tokens or spans as continuous embeddings, which are iteratively updated to minimize the negative log-likelihood over the reconstructed sequence, alternating between optimization (O-step) and discretization (P-step) (Liu et al., 2019). In masked diffusion LLMs, reasoning tokens and answer tokens occupy separate masked segments, and multi-token entropy decoding (MED) infills them adaptively based on entropy thresholds:

$\mathrm{KL}(p_\theta(x^A|x_{\text{unmasked}}, c), \prod_{i \in A}p_\theta(x^i|x_{\text{unmasked}}, c)) \leq \sum_{i \in A} H(x^i|x_{\text{unmasked}}, c)$

(Horvitz et al., 22 Oct 2025).

In structurally complex tasks, such as CSPs or many-step reasoning, iterative latent variable refinement is applied. The state evolves via:

$z^{(n+1)} = f(z^{(n)}, x)$

for reflective representation learning, and—after converging—via self-refinement where $x$ is dropped:

$z^{(t+1)} = f_s(z^{(t)}, 0)$

(Deng et al., 9 Oct 2025).

3. Applications Across Domains

Theorem Proving and Mathematical Reasoning

Incomplete proofs are used to update beliefs about mathematical truth, supporting timely action under resource constraints (Horvitz et al., 2013).
Fill-in-the-middle (FIM) expansion of intermediate steps in math problem–solving chains improves accuracy by constructing richer, more granular solution traces (Yan et al., 17 Feb 2025).
Masked diffusion models structure outputs into reasoning and answer slots, enabling controlled uncertainty estimation, early exit, and post-hoc sampling of alternative reasoning traces (Horvitz et al., 22 Oct 2025).

Text and Code Generation

Bidirectional infilling enables models to fill arbitrary gaps in documents or code, essential for editing, synthesis, and repair (Zhu et al., 2019, Fried et al., 2022).
Character- and line-level constraints (e.g., FIM-SE) address inherent pitfalls of token-level infilling, eliminating fragmentary outputs and sub-token errors (Ren et al., 27 May 2024).
Self-infilling with non-monotonic and looping mechanisms allows iterative refinement of generated code and reasoning steps, improving logical consistency and regularity (Zheng et al., 2023).

Ontology Completion and Commonsense Induction

Interpolation operators "fill in" plausible properties of intermediate concepts in ontologies (e.g., inferring that the “Zebra” is a herbivore given “Rabbit” and “Giraffe” are) based on feature-sharing or geometric convex hulls (Ibáñez-García et al., 2020).
Model-theoretic and geometric semantics rigorously formalize the inductive step, integrating infilling with classical deductive ontological reasoning.

Multimodal and Visual Reasoning

Visual narrative infilling generates missing steps in procedural or story sequences given incomplete image or keyframe context by leveraging the overlap and dependencies in neighboring content (Chandu et al., 2020, Himakunthala et al., 2023).
Video diffusion models (RaMViD) “infill” future or intermediate frames from sparse observations, learning temporally coherent generative dynamics from partial conditioning (Höppe et al., 2022).

Tool-Integrated Reasoning

Reasoning chains are “infused” with external tool outputs, where the LLM interleaves natural language with executable steps (e.g., code, symbolic math) and infills the solution trace with tool-generated, deterministic results. Efficiency is quantified using metrics like Performance-Aware Cost (PAC) and Area Under the Performance-Cost Curve (AUC-PCC) (Zhao et al., 21 Aug 2025).

4. Theoretical and Computational Insights

Reasoning-as-infilling surfaces several salient theoretical properties:

Data Efficiency and Locality: Step-by-step infilling achieves greater data efficiency and reduced prediction bias when training data is locally structured—i.e., when only local clusters of variables co-occur—by leveraging intermediate variables as scaffolds that bridge non-adjacent dependencies and reduce the “reasoning gap” (Prystawski et al., 2023).
Latent Space Complexity: Satisfying the dense constraints inherent in reasoning tasks (e.g., Sudoku) requires infilling over exponentially large, interdependent latent spaces, which can be addressed by iterative alignment and refinement modules (Deng et al., 9 Oct 2025).
Computational Trade-offs: For masked diffusion LMs, parallel decoding brings efficiency gains (MED can achieve up to 2.7× speed-up) while explicitly structuring reasoning and answer slots enables early stopping based on entropy thresholds (Horvitz et al., 22 Oct 2025).
Model-Agnosticism: Gradient-based inference methods for infilling can be applied to a wide spectrum of generative architectures without retraining, further demonstrating the generality of the reasoning-as-infilling paradigm (Liu et al., 2019).

5. Evaluation, Benchmarks, and Practical Impact

Numerous empirical evaluations have established the utility and impact of reasoning-as-infilling:

Text and Code Benchmarks: Improved BLEU, perplexity, and pass@k rates across text infilling, code synthesis, and step-by-step reasoning tasks (Zhu et al., 2019, Fried et al., 2022, Zheng et al., 2023, Ren et al., 27 May 2024).
Math and Logic Tasks: Fine-tuning on posterior reasoning traces generated via infilling matches (and sometimes exceeds) the performance obtained by fine-tuning on human-written chains (Yan et al., 17 Feb 2025, Horvitz et al., 22 Oct 2025).
Video and Speech: In visual procedures and slot-filling for speechLLMs, infilling-based reasoning yields higher narrative coherence, improved METEOR, ROUGE $_\ell$ , or F1 scores, and greater robustness in multi-modal completion tasks (Chandu et al., 2020, Himakunthala et al., 2023, Hacioglu et al., 22 Oct 2025).
Tool-Integrated and Causal Reasoning: PAC and AUC-PCC metrics demonstrate reductions in redundant computation and “overthinking” when infilling leverages tools to inject deterministically computed reasoning steps (Zhao et al., 21 Aug 2025). In complex causal reasoning, dense latent refinement delivers gains with an order of magnitude fewer parameters than previous models (Deng et al., 9 Oct 2025).

6. Challenges and Future Directions

Key open challenges and research directions include:

Logical Consistency: Ensuring that infilled reasoning steps are not only fluent but also logically sound and domain-aligned, potentially requiring integration with logic modules or consistency constraints (Zhu et al., 2019, Zheng et al., 2023).
Evaluation Metrics: Developing metrics that capture not just surface-level similarity but also logical validity, step completeness, and robustness of the infilled reasoning chain or solution structure (Himakunthala et al., 2023).
Scalability in Structured Domains: Efficiently extending infilling techniques to large or heterogeneous spaces such as dense ontologies, high-resolution video, or massive latent spaces (Ibáñez-García et al., 2020, Höppe et al., 2022, Deng et al., 9 Oct 2025).
Adaptive Decoding and Efficiency: Further advances in adaptive decoding (e.g., entropy-aware, hybrid sampling) are required for balancing inference speed with accuracy when using general infilling architectures (Horvitz et al., 22 Oct 2025).
Hybrid Multimodal Reasoning: Cross-modal applications of reasoning-as-infilling, combining vision, speech, and text, remain an emerging and fertile area (Chandu et al., 2020, Hacioglu et al., 22 Oct 2025).
Generalization and Domain-Specificity: Tailoring infilling and step expansion methods to preserve both general language capabilities and sophisticated domain-specific reasoning in hybrid or multi-domain LLMs (Hacioglu et al., 22 Oct 2025).

7. Representative Mathematical Formulations and Patterns

The following concretize the reasoning-as-infilling paradigm:

Formulation	Description	Reference
$\mathrm{NEVC}(S_i, I, r)$	Net expected value of computation for flexible inference	(Horvitz et al., 2013)
$\mathrm{pos} = \mathrm{seg\_id} \times \mathrm{base} + \mathrm{offset\_id}$	Segment-aware position encoding in infilling	(Zhu et al., 2019)
$L_{\text{NLL}}(x, y^*)$	Gradient-based NLL optimization for missing tokens	(Liu et al., 2019)
$p(x, y) = \int p(z) p_g(x, y\|z) \mathbb{I}\{S(z)=1\} dz$	Joint distribution in causal selection mechanisms	(Deng et al., 9 Oct 2025)
$\hat{q}_S(Y_i=y_i\|Y_j=y_j) = \frac{1}{M} \sum_{k=1}^M q(Y_i=y_i \| Y_S = y_S^{(k)}, Y_j=y_j)$	Scaffolded generation estimator	(Prystawski et al., 2023)
$p(\text{prefix, suffix, middle}) = p(\text{prefix}) p(\text{suffix}\|\text{prefix}) p(\text{middle}\|\text{prefix}, \text{suffix})$	Joint FIM objective	(Zheng et al., 2023)

These patterns, techniques, and empirical findings delineate reasoning-as-infilling as a unifying paradigm that operationalizes stepwise, context-aware reasoning through the lens of flexible, non-linear, and partially observable computation. This approach now underpins advances across natural language, code, mathematical, multimodal, and logical reasoning domains.