
Flow Matching: Inference-Time Scaling

Updated 23 October 2025
  • The paper demonstrates architectural and algorithmic innovations—such as SDE-based sampling, interpolant conversion, and noise injection—that enable dynamic inference-time scaling in flow matching models.
  • It presents comprehensive empirical results showing monotonic improvements in sample quality and performance across domains like image generation and protein design with increased compute.
  • The paper also discusses theoretical trade-offs between maintaining FM’s efficient, straight-line transport and introducing stochasticity to boost diversity, outlining key challenges in adaptive resource allocation.

Flow matching inference-time scaling refers to methods that enable the computational budget for inference in flow matching models to be efficiently increased at test time, whether to improve sample quality, enhance posterior coverage, or trade off accuracy for efficiency in diverse scientific and engineering contexts. Unlike diffusion-based methods, where inference-time compute scaling (by increasing particle count or search budget) has become routine, flow matching was initially perceived as providing a fixed, deterministic, and highly efficient generative process. Recent advances have demonstrated that the flow matching framework can be adapted to support efficient inference-time scaling through architectural, algorithmic, and mathematical innovations that preserve key properties (such as injectivity, unconstrained architectures, and straight-line transport) while allowing sample diversity, compute-aware reward maximization, or allocation of additional trajectories at test time. This article surveys the foundational concepts, primary techniques, empirical findings, and theoretical insights underpinning inference-time compute scaling in flow matching models, with a technical focus suitable for researchers and practitioners.

1. Fundamentals of Flow Matching for Generative Modeling and Inference

Flow matching (FM) is an approach for generative modeling and simulation-based inference based on training a network to regress a velocity field $v_t(x)$ so that, along a prescribed probability path (often linear interpolation or optimal transport), samples are deterministically transported from a simple base distribution (e.g., a standard Gaussian prior) towards a complex target distribution (e.g., empirical data or a Bayesian posterior). This is formalized via an ordinary differential equation (ODE):

$$\frac{d \psi_t(x)}{dt} = v_t(\psi_t(x)), \qquad \psi_0(x) = x_0,$$

where $x_0 \sim p_0$, and the flow is defined to satisfy $\psi_1(x) \sim p_1$.
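The deterministic sampling process above can be sketched with a plain forward-Euler integrator. This is a minimal illustration, not a production solver: `v` stands in for the learned velocity field, and the toy velocity used in the check is the conditional straight-line field $(x_1 - x)/(1 - t)$, which transports any point to a fixed target $x_1$ by $t = 1$.

```python
import numpy as np

def sample_flow(v, x0, n_steps=50):
    """Transport base samples to the target by integrating dx/dt = v(t, x)
    from t=0 to t=1 with forward Euler."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        x = x + dt * v(i * dt, x)  # one Euler step along the learned field
    return x

# Toy check: along the straight-line path x_t = (1 - t) x_0 + t x_1, the
# velocity that carries any point x toward a fixed x_1 is (x1 - x) / (1 - t).
x1 = np.array([2.0, -1.0])
v = lambda t, x: (x1 - x) / (1.0 - t)
out = sample_flow(v, np.zeros(2))
```

For this particular field, Euler integration lands exactly on the target, which reflects the appeal of straight-line transport: very few steps suffice when the trajectory is linear.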

The core training objective is to regress $v_t(x)$ to the true "velocity" $u_t(x)$ of the path $(p_t)$, minimizing a squared-error loss over time $t$ and interpolant states $x_t$. For simulation-based inference, flow matching posterior estimation (FMPE) conditions the vector field on observed data $x$:

$$\mathcal{L}_{\text{FMPE}} = \mathbb{E}_{t,\theta_1,x,\theta_t} \left\| v_{t,x}(\theta_t) - u_t(\theta_t \mid \theta_1) \right\|^2.$$

Efficient density computation is available via the change-of-variables formula:

$$q(\theta \mid x) = q_0(\theta)\, \exp \left[ -\int_0^1 \operatorname{div} v_{t,x}(\theta_t)\, dt \right].$$

Key advantages of FM over discrete normalizing flows include architectural flexibility (no invertibility constraints), tractable density evaluation, rapid training, and scalability to high-dimensional problems (Dax et al., 2023).
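The change-of-variables formula can be evaluated by integrating the divergence term alongside the flow itself. The sketch below assumes an exact divergence function is available; in high dimensions one would typically substitute a Hutchinson trace estimator. The example field and its divergence are hypothetical, chosen so the integral is exact.

```python
import numpy as np

def log_density_via_flow(v, div_v, theta0, log_q0, n_steps=200):
    """Jointly integrate the flow ODE and the divergence integral so that
    log q(theta_1) = log q0(theta_0) - int_0^1 div v_t(theta_t) dt.
    `div_v(t, x)` is assumed to return the exact divergence of v at (t, x)."""
    x = np.array(theta0, dtype=float)
    logq = log_q0(x)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        logq -= dt * div_v(t, x)  # accumulate the -div term along the path
        x = x + dt * v(t, x)      # Euler step of the flow
    return x, logq

# Contracting field v(t, x) = -x has constant divergence -d, so the log-density
# of the pushed-forward point increases by exactly d over unit time.
v = lambda t, x: -x
div_v = lambda t, x: -float(x.size)
theta1, logq = log_density_via_flow(v, div_v, np.ones(2), lambda x: 0.0)
```

Because the divergence is constant here, the quadrature is exact and `logq` equals the dimension `d = 2` up to floating-point rounding.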

2. Techniques for Inference-Time Compute Scaling in Flow Matching

Traditional flow matching models implement a deterministic generative dynamic, precluding direct sample diversity or compute scaling at inference. Several approaches have generalized this, allowing for test-time compute scaling:

  • SDE-Based Generation: By augmenting the ODE with a diffusion term, deterministic ODE sampling is replaced by stochastic SDE sampling, introducing sample diversity at each time step. The SDE is of the form

$$dx_t = u_t(x_t)\, dt + g_t\, dW_t,$$

with drift $u_t$ and diffusion coefficient $g_t$ (Kim et al., 25 Mar 2025). The marginal at each $t$ can match that of the original flow regardless of $g_t$, enabling inference-time particle sampling analogous to particle-based techniques in diffusion models.

  • Interpolant Conversion: Linear and variance-preserving (VP) interpolants define the structure of the probability path $x_t = \alpha_t x_0 + \sigma_t x_1$. At inference, converting from a linear to a VP interpolant via a time and scale transformation broadens the search space for high-reward samples without retraining, yielding larger sample diversity and improved reward maximization (Kim et al., 25 Mar 2025).
  • Noise Injection and Orthogonalization: Preserving straight-line (linear) FM sampling while injecting controlled noise orthogonal to the model's learned score direction introduces additional stochasticity with minimal disruption to the deterministic flow. The DMFM-ODE variant, for example, applies noise projected orthogonally to the score, with a decaying schedule $\alpha(t)$ to maintain efficiency and trajectory alignment (Stecklov et al., 20 Oct 2025).
  • Budget-Adaptive Resource Allocation: Rollover Budget Forcing (RBF) dynamically allocates function evaluations (NFEs) across time steps, focusing compute where it is empirically "useful" (i.e., where particle filtering or search is most likely to discover high-reward samples), rolling over unused compute to future time steps (Kim et al., 25 Mar 2025).
  • Noise Search (NS) and Two-Stage Refinement: An initial random search (RS) is conducted over source particles, then each trajectory is iteratively refined via a noise search (NS) procedure along the time dimension, enabling consistent stepwise quality improvements as compute is increased (Stecklov et al., 20 Oct 2025).
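The rollover idea behind budget-adaptive allocation can be shown schematically. This is a sketch of the allocation pattern only, not the RBF algorithm from the paper: `useful(t)` stands in for whatever signal (e.g., whether particle search improved the reward at step t) decides where compute pays off.

```python
def rollover_budget(step_budgets, useful):
    """Each time step gets a base NFE budget; steps judged not 'useful' spend
    a single evaluation to advance and roll the remainder forward, so compute
    concentrates where search is expected to find high-reward samples."""
    carried = 0
    spent = []
    for t, base in enumerate(step_budgets):
        available = base + carried
        if useful(t):
            spent.append(available)  # spend everything where search pays off
            carried = 0
        else:
            spent.append(1)          # minimal single evaluation to advance
            carried = available - 1
    return spent

# Hypothetical schedule: search only pays off in the second half of sampling.
spent = rollover_budget([4, 4, 4, 4], useful=lambda t: t >= 2)
```

Note the invariant: the total spent never exceeds the total budget; early unused evaluations are simply deferred, not discarded.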

These techniques can be combined (e.g., VP-SDE plus RBF, or DMFM-ODE plus RS+NS), supporting monotonic improvements in sample quality as a function of test-time compute.
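A minimal sketch of how SDE sampling combines with verifier-guided particle selection follows. All names here (`u`, `g`, `reward`, the toy drift and target) are illustrative assumptions, not any paper's exact setup; the point is that the number of particles is the test-time compute knob.

```python
import numpy as np

rng = np.random.default_rng(0)

def sde_sample(u, g, x0, n_steps=100):
    """Euler-Maruyama integration of dx = u_t(x) dt + g_t dW_t. With g > 0,
    repeated calls yield distinct samples from the same trained model."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x + dt * u(i * dt, x) + g(i * dt) * np.sqrt(dt) * noise
    return x

def best_of_n(u, g, x0, reward, n_particles=8):
    """Inference-time scaling knob: draw n stochastic trajectories and keep
    the one the external verifier (`reward`) scores highest."""
    candidates = [sde_sample(u, g, x0) for _ in range(n_particles)]
    return max(candidates, key=reward)

# Hypothetical setup: the drift pulls toward a target; the verifier rewards
# proximity, so more particles means a better chance of a high-reward sample.
target = np.array([1.0, 1.0])
u = lambda t, x: target - x
g = lambda t: 0.3
reward = lambda x: -float(np.linalg.norm(x - target))
best = best_of_n(u, g, np.zeros(2), reward, n_particles=16)
```

Raising `n_particles` (or `n_steps`) trades compute for quality, which is exactly the monotonic scaling behavior the empirical studies below report.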

3. Empirical Evidence and Applications

Empirical studies demonstrate consistent benefits of inference-time compute scaling in flow matching:

  • Image Generation: On ImageNet 256×256 (using SiT-XL/2), applying DMFM-ODE plus RS/NS at increasing compute budgets leads to monotonic gains in FID, Inception Score, and DINO classification accuracy. Maximum improvements are observed for verifier-guided RS+NS, overtaking both random search and previous best-in-class methods (Stecklov et al., 20 Oct 2025).
  • Protein Design: For unconditional protein generation (FoldFlow2), inference-time scaling as compute (number of sampled trajectories) increases yields improved scTM-scores, lower RMSD, and higher fractions of designable structures, with the top results exceeding TM-score 0.9 at 8× compute (Stecklov et al., 20 Oct 2025). This confirms applicability to scientific domains.
  • Text-to-Image and Counting Tasks: Inference-time scaling via VP-SDE and RBF achieves higher compositional alignment and quantity-aware generation versus both linear-SDE and ODE baselines on compositional text-to-image tasks (Kim et al., 25 Mar 2025).

All reviewed approaches report monotonic sample-quality improvements as a function of inference-time compute, in both visual and scientific domains.

4. Theoretical Considerations and Trade-offs

Flow matching inference-time scaling fundamentally differs from diffusion or autoregressive test-time scaling. Key distinctions include:

  • Linearity versus Diversity Trade-off: Modifying the sampling path (e.g., non-linear VP interpolant) increases diversity but often sacrifices FM’s straight-line property, which underpins its computational efficiency. Noise injection orthogonal to the learned score aims to achieve diversity without departing from straight transport (Stecklov et al., 20 Oct 2025). In contrast, SDE-based approaches (as in (Kim et al., 25 Mar 2025)) accept loss of straight-line efficiency for sample spread.
  • Computational Efficiency: The principal advantage of FM is that, when the path is preserved, sample generation remains more efficient than diffusion methods—typically requiring fewer steps. Approaches that preserve linearity (such as DMFM-ODE) maintain this efficiency across compute scales.
  • Reward- or Verifier-Guided Search: Methods integrating verifier functions (external quality or alignment metrics) guide both initial random search and trajectory refinement, focusing resources on high-likelihood or high-reward output regions—key for tasks such as conditional design or compositional text-to-image (Stecklov et al., 20 Oct 2025, Kim et al., 25 Mar 2025).
  • Limiting Factors: Adding compute (NFEs, particles) beyond a point can plateau in benefit, or even degrade performance if noise injection or adaptive allocation mechanisms are not matched to the model architecture or problem geometry. Retaining well-conditioned, straight flows helps mitigate diminishing returns.

A plausible implication is that optimal inference-time scaling for FM will require further adaptation of training procedures, verifier integration, and possibly new trajectory design heuristics.
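The orthogonal noise injection discussed above can be sketched in a few lines. This is an illustration of the projection idea only, assuming a score estimate is available; the decaying weight `alpha_t` is a hypothetical schedule, not the one used in the cited work.

```python
import numpy as np

rng = np.random.default_rng(1)

def orthogonal_noise(score, alpha_t):
    """Sample Gaussian noise and remove its component along the (estimated)
    score direction, so the perturbation spreads samples sideways rather
    than pushing them along, or against, the learned transport direction."""
    s = score / (np.linalg.norm(score) + 1e-12)  # unit score direction
    eps = rng.standard_normal(score.shape)
    eps_orth = eps - np.dot(eps, s) * s          # project out the score component
    return alpha_t * eps_orth

# The injected perturbation is orthogonal to the score by construction.
score = np.array([3.0, 4.0])
z = orthogonal_noise(score, alpha_t=0.1)
```

Because the score component is projected out exactly, the perturbation adds diversity while leaving the component of motion along the learned flow untouched, which is how this family of methods aims to keep straight-line efficiency.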

5. Broader Impact and Future Research Directions

The methodologies for inference-time scaling in flow matching expand the practical utility of these models in several domains:

  • Scientific Modeling: Protein design, cell trajectory modeling, and physical simulation tasks benefit from test-time scalability combined with high fidelity and structural interpretability (Stecklov et al., 20 Oct 2025).
  • Language and Vision: Monotonic improvements in sample quality as compute increases at test time align FM with best practices for LLMs and diffusion models, facilitating multi-modal, reward-driven generation (Kim et al., 25 Mar 2025).
  • General-Purpose Generative Modeling: The introduction of SDE-based generation, trajectory design strategies, and budget allocation mechanisms allows FM to approach the expressiveness and flexibility of score-based and autoregressive methods but with enhanced efficiency.
  • Open Problems: Future work may study architectures tailored for inference-time scaling (e.g., models optimized for noise/stochastic trajectory conditioning), deeper integration of verifier/reward functions within the generative process, and further improvements in adaptive compute allocation paradigms.

6. Comparative Summary

The state of inference-time compute scaling in flow matching is now marked by several practical methods and empirical successes:

| Technique | Maintains FM Linearity | Sample Diversity | Test-Time Compute Scaling | Domain Demonstrated |
|---|---|---|---|---|
| SDE-based Generation | × | ✓ | ✓ | Vision, text-to-image (Kim et al., 25 Mar 2025) |
| Interpolant Conversion | × (with VP path) | ✓ | ✓ | Vision (Kim et al., 25 Mar 2025) |
| DMFM-ODE Noise Injection | ✓ (orthogonal) | ✓ | ✓ | Vision, protein (Stecklov et al., 20 Oct 2025) |
| Budget Forcing (RBF) | ✓ (with linear path) | ✓ | ✓ | Vision, text-to-image (Kim et al., 25 Mar 2025) |
| Verifier-Guided RS+NS | ✓ | ✓ | ✓ | Vision, protein (Stecklov et al., 20 Oct 2025) |

Empirical results confirm that these approaches deliver monotonic improvements in target metrics when increasing inference-time compute, while either preserving or intelligently trading off FM’s distinctive efficiency.

7. Conclusion

Inference-time scaling for flow matching models leverages noise-injected ODE/SDE sampling, interpolant conversion, resource-aware trajectory search, and budget allocation to systematically improve sample quality or better match complex target objectives as computation is increased post-training. The newest strategies achieve this while preserving the efficiency advantages that distinguish FM from diffusion models, and successfully extend the paradigm to diverse and scientifically critical domains. The ongoing development of these techniques positions flow matching as a competitive, scalable, and widely applicable framework for generative modeling in both machine learning and scientific computing (Kim et al., 25 Mar 2025, Stecklov et al., 20 Oct 2025).
