Stochastic Interpolant Training

Updated 30 June 2026

Stochastic interpolant training is a framework that learns continuous, measure-preserving bridges between probability distributions by minimizing quadratic losses on time-dependent velocity and score fields.
It employs flexible parameterizations like Bézier curves to enforce boundary and monotonicity constraints, enhancing both model expressivity and computational efficiency.
Applications range from generative modeling and scientific forecasting to covariance estimation and manifold learning, with strong theoretical guarantees on error bounds and bias-variance trade-offs.

Stochastic interpolant training defines a principled and highly flexible paradigm for learning continuous, measure-preserving bridges between probability distributions, with applications spanning generative modeling, statistical estimation, scientific forecasting, and multitask learning. Central to this approach is the specification and learning of stochastic interpolation processes that generalize and unify the time evolution of flow- and diffusion-based models through the minimization of simple quadratic objectives. Theoretical developments, parameterization schemes, bias-variance trade-offs, geometric extensions, and empirical advances establish stochastic interpolant training as a foundational mechanism across modern probabilistic modeling.

1. Mathematical Foundations of Stochastic Interpolant Training

Let $p_0$ and $p_1$ denote source and target distributions on a Hilbert space $H$, typically $\mathbb{R}^d$. A stochastic interpolant (SI) is a family of random variables of the form

$x_t = \alpha(t) x_0 + \beta(t) x_1 + \gamma(t) z, \quad t \in [0,1]$

where $x_0 \sim p_0$, $x_1 \sim p_1$, $z \sim \mathcal{N}(0, I)$ is independent of $(x_0, x_1)$, and the schedules $\alpha, \beta, \gamma$ are continuous functions with boundary conditions $\alpha(0) = 1$, $\alpha(1) = 0$, $\beta(0) = 0$, $\beta(1) = 1$, $\gamma(0) = \gamma(1) = 0$. This framework admits both deterministic (ODE-based) and stochastic (SDE-based) generative processes, unified via the continuity or Fokker–Planck equations governing the evolution of the time-marginal density $p_t$ (Albergo et al., 2023, Zhou et al., 30 Sep 2025).
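
A minimal sketch of drawing interpolant states, assuming the illustrative schedules $\alpha(t) = 1 - t$, $\beta(t) = t$, $\gamma(t) = \sqrt{2t(1-t)}$ (one admissible choice that satisfies the usual endpoint conditions, not one prescribed by the text) and NumPy. The function names and the toy endpoint distributions are hypothetical.

```python
import numpy as np

# Assumed schedules: alpha(0)=1, alpha(1)=0, beta(0)=0, beta(1)=1, gamma(0)=gamma(1)=0.
def alpha(t):
    return 1.0 - t

def beta(t):
    return t

def gamma(t):
    return np.sqrt(2.0 * t * (1.0 - t))

def interpolant(t, x0, x1, z):
    """Return x_t = alpha(t) x0 + beta(t) x1 + gamma(t) z for scalar or per-sample t."""
    t = np.asarray(t, dtype=float)
    # Per-sample times get a trailing axis so they broadcast over the feature axis.
    t = t[..., None] if t.ndim == 1 else t
    return alpha(t) * x0 + beta(t) * x1 + gamma(t) * z

rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 2))            # toy source samples
x1 = rng.normal(loc=3.0, size=(4, 2))   # toy target samples
z = rng.normal(size=(4, 2))             # independent Gaussian noise

x_start = interpolant(0.0, x0, x1, z)            # equals x0
x_end = interpolant(1.0, x0, x1, z)              # equals x1
x_mid = interpolant(np.full(4, 0.5), x0, x1, z)  # noisy midpoint
```

At the endpoints the noise term vanishes, so the interpolant reproduces the source and target samples exactly; in between it mixes both with Gaussian noise.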

The core training objective is to estimate time-dependent velocity and/or score fields by minimizing quadratic (mean-squared error) losses derived from conditional expectations or implicit score-matching. In the two-marginal case, the velocity admits the closed form

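$b(t, x) = \mathbb{E}[\dot{x}_t \mid x_t = x] = \mathbb{E}[\dot{\alpha}(t) x_0 + \dot{\beta}(t) x_1 + \dot{\gamma}(t) z \mid x_t = x]$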

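and the score $s(t, x) = \nabla \log p_t(x)$ of the time-marginal density $p_t$ is related via functional identities, e.g.,

$s(t, x) = -\gamma(t)^{-1}\, \mathbb{E}[z \mid x_t = x]$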

with precise decompositions depending on the choice of schedules and domains (Li et al., 26 Sep 2025, Ma et al., 2024). The SI theory extends naturally to multimarginal settings using simplex coordinates, and to operator-parameterized cases where the interpolation coefficients $\alpha, \beta$ are linear operators or matrices, enabling channel-wise or spatially structured interpolants (Negrel et al., 6 Aug 2025, Albergo et al., 2023).
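
As a worked illustration in an assumed toy setting (not taken from the cited papers), take $\alpha(t) = 1 - t$, $\beta(t) = t$, $\gamma(t) = 0$ with independent endpoints $x_0 \sim \mathcal{N}(0, I)$ and $x_1 \sim \mathcal{N}(\mu, I)$. Then $x_t = (1-t) x_0 + t x_1$ and $x_1 - x_0$ are jointly Gaussian with $\mathbb{E}[x_t] = t\mu$, $\mathrm{Cov}(x_t) = ((1-t)^2 + t^2) I$, and $\mathrm{Cov}(x_1 - x_0, x_t) = (2t - 1) I$, so Gaussian conditioning gives the velocity in closed form:

$\mathbb{E}[x_1 - x_0 \mid x_t = x] = \mu + \frac{2t - 1}{(1-t)^2 + t^2}\,(x - t\mu)$

At $t = 1/2$ the correction term vanishes and the velocity is the constant mean shift $\mu$; near $t = 0$ it contracts samples toward the moving mean $t\mu$, and near $t = 1$ it expands them again.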

2. Parameterization and Scheduler Design

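The parameterization of interpolant schedules is central to balancing flexibility, expressiveness, and the enforcement of boundary and monotonicity constraints. BézierFlow introduces a Bézier-curve-based scheduler scheme, representing each schedule as a degree-$n$ Bézier curve:

$s(t) = \sum_{i=0}^{n} B_{i,n}(t)\, c_i$

with the Bernstein basis $B_{i,n}(t) = \binom{n}{i} t^i (1-t)^{n-i}$ and boundary-constrained control points $c_0$ and $c_n$. Monotonicity of the signal-to-noise ratio (SNR) is enforced by an increasing sequence of interior control points via a cumulative softmax parameterization:

$c_i = \sum_{j=1}^{i} w_j, \quad i = 1, \dots, n-1$

where, for unconstrained parameters $\theta \in \mathbb{R}^n$,

$w_j = \frac{\exp(\theta_j)}{\sum_{k=1}^{n} \exp(\theta_k)}$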

Smoothness (differentiability) follows from the analyticity of Bézier polynomials (Min et al., 15 Dec 2025).

This parameterization broadens the search space well beyond discrete ODE timesteps, allows explicit control over trajectory shape (e.g., monotonic SNR, convexity), and is efficiently trained and deployed.
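
The sketch below illustrates the general idea of a monotone Bézier schedule, assuming a schedule that rises from 0 to 1 and control points built from a cumulative softmax over unconstrained parameters `theta`. The function names and parameter values are hypothetical, and the code is not the BézierFlow implementation.

```python
import numpy as np
from math import comb

def control_points(theta):
    """Map unconstrained theta (length n) to increasing control points c_0..c_n in [0, 1]."""
    w = np.exp(theta - np.max(theta))
    w = w / w.sum()                                  # softmax: positive increments summing to 1
    return np.concatenate([[0.0], np.cumsum(w)])     # c_0 = 0 and c_n = 1 (up to rounding)

def bezier(t, c):
    """Evaluate sum_i B_{i,n}(t) c_i with the Bernstein basis B_{i,n}."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    n = len(c) - 1
    basis = np.stack(
        [comb(n, i) * t**i * (1.0 - t) ** (n - i) for i in range(n + 1)], axis=-1
    )
    return basis @ c

theta = np.array([0.3, -1.0, 2.0, 0.5])   # hypothetical learnable parameters (degree n = 4)
c = control_points(theta)
ts = np.linspace(0.0, 1.0, 101)
s = bezier(ts, c)                          # schedule values on a time grid
```

Because the derivative of a Bézier curve is itself a Bézier curve whose control points are the scaled differences $n (c_{i+1} - c_i)$, strictly increasing control points guarantee a strictly increasing schedule for any value of `theta`.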

3. Training Objectives and Algorithmic Procedures

Stochastic interpolant training, regardless of parameterization, is built on quadratic losses that admit unbiased estimation via Monte Carlo sampling of joint endpoints and interpolant states. The prototypical objective for learning a parameterized velocity field $b_\theta$ takes the form:

$\mathcal{L}(\theta) = \mathbb{E}_{t, x_0, x_1, z}\left[ \| b_\theta(t, x_t) - \dot{x}_t \|^2 \right]$

where $\dot{x}_t = \dot{\alpha}(t) x_0 + \dot{\beta}(t) x_1 + \dot{\gamma}(t) z$ is the time derivative of the interpolant, evaluated at sampled $(x_0, x_1, z)$ and $t \in [0, 1]$. Score matching is handled analogously. In multimarginal or operator-valued settings, the drift fields are learned for all relevant indices or operator pairs, which keeps training task-agnostic (Negrel et al., 6 Aug 2025, Albergo et al., 2023).
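
A minimal Monte Carlo estimate of the quadratic velocity loss, assuming the illustrative schedules $\alpha(t) = 1 - t$, $\beta(t) = t$, $\gamma(t) = t(1-t)$ and a toy coupling in which the target is a shifted copy of the source. The candidate velocity fields are plain Python callables standing in for a neural network, and all names are hypothetical.

```python
import numpy as np

def velocity_loss(b, x0, x1, z, t):
    """Batch estimate of E||b(t, x_t) - dx_t/dt||^2 for alpha=1-t, beta=t, gamma=t(1-t)."""
    t = t[:, None]
    x_t = (1.0 - t) * x0 + t * x1 + t * (1.0 - t) * z
    target = x1 - x0 + (1.0 - 2.0 * t) * z       # time derivative of the interpolant
    return np.mean(np.sum((b(t, x_t) - target) ** 2, axis=1))

rng = np.random.default_rng(0)
n, d = 2048, 2
shift = np.array([3.0, -1.0])
x0 = rng.normal(size=(n, d))     # source samples
x1 = x0 + shift                  # toy coupling: target is a shifted copy of the source
z = rng.normal(size=(n, d))      # interpolant noise
t = rng.uniform(size=n)          # times drawn uniformly on [0, 1]

# Two candidate velocity fields: identically zero, and the constant mean shift.
loss_zero = velocity_loss(lambda t, x: np.zeros_like(x), x0, x1, z, t)
loss_shift = velocity_loss(lambda t, x: np.broadcast_to(shift, x.shape), x0, x1, z, t)
```

The constant-shift field scores far better than the zero field, but its loss does not reach zero: the regression target contains the noise term $(1 - 2t) z$, so the loss has a positive floor and its minimizer is the conditional expectation rather than any single sampled target.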

A summary of algorithmic choices appears in the following table:

Aspect	Approach	References
Scheduler parameterization	Bézier curves, polynomials, operator paths	(Min et al., 15 Dec 2025, Negrel et al., 6 Aug 2025)
Loss function	Quadratic regression, score matching	(Ma et al., 2024, Zhou et al., 30 Sep 2025)
Target process	ODE / SDE, including post hoc diffusion reweighting	(Min et al., 15 Dec 2025, Chen et al., 2024)
Architectural backbones	U-Net, Transformer, MLP, convolutional nets	(Ma et al., 2024, Horowitz et al., 22 Oct 2025)

Optimization is often performed with AdamW or RMSProp, and typical batch sizes range from 30 to 1024. Per-iteration cost is dominated by network forward passes or, in kernelized forms, by linear algebra in the feature dimension (Coeurdoux et al., 23 Feb 2026). Gradient clipping, time-embedding techniques (FiLM layers, sinusoidal encoding), and antithetic noise sampling further stabilize and regularize training.
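
Two of the stabilization devices mentioned above can be sketched in a few lines. The sinusoidal embedding follows the common Transformer-style construction, and the antithetic sampler simply pairs each noise draw with its negation; both are generic illustrations with hypothetical names and defaults, not the exact recipes of the cited works.

```python
import numpy as np

def sinusoidal_embedding(t, dim=8, max_period=10_000.0):
    """Encode times t as [sin(w_k t), cos(w_k t)] features; dim is assumed even."""
    half = dim // 2
    freqs = np.exp(-np.log(max_period) * np.arange(half) / half)
    angles = np.asarray(t, dtype=float)[:, None] * freqs[None, :]
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

def antithetic_noise(rng, pairs, dim):
    """Draw `pairs` Gaussian vectors and append their negations (z, -z)."""
    z = rng.normal(size=(pairs, dim))
    return np.concatenate([z, -z], axis=0)

rng = np.random.default_rng(0)
emb = sinusoidal_embedding(np.array([0.0, 0.5, 1.0]))   # shape (3, 8)
noise = antithetic_noise(rng, pairs=16, dim=2)          # shape (32, 2)
```

Pairing each draw with its negation cancels odd-order noise contributions within the batch, which is one simple way to lower the variance of the loss estimate.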

4. Empirical Properties and Theoretical Guarantees

Stochastic interpolant training admits provable guarantees on memorization, bias-variance control, and risk. In the finite-sample case, deterministic generation reproduces elements of the empirical distribution exactly, while stochastic generation yields training samples corrupted by Gaussian noise. Estimator error regimes interpolate between overfitting (memorization), balanced smoothing, and underfitting (output drift) (Li et al., 26 Sep 2025). When generalized to parameterized velocity fields, the approximation error controls empirical risk, as in covariance shrinkage via stochastic interpolants (Chalvidal et al., 5 Jun 2026). The associated theoretical risk bounds are stated relative to the irreducible (oracle) risk of the optimal interpolant (Chalvidal et al., 5 Jun 2026).
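
The memorization behaviour of deterministic generation can be reproduced in a toy setting. The sketch assumes a Gaussian source, a linear schedule without noise ($x_t = (1-t) x_0 + t x_1$), and a target that is uniform on a small training set, for which the exact velocity is $(\mathbb{E}[x_1 \mid x_t = x] - x)/(1-t)$ with softmax posterior weights. The setup and names are illustrative and not taken from the cited analysis.

```python
import numpy as np

def empirical_velocity(t, x, train):
    """Exact velocity for x_t = (1-t) x0 + t x1, x0 ~ N(0, I), x1 uniform on `train`."""
    diff = x[:, None, :] - t * train[None, :, :]            # (batch, n_train, dim)
    logits = -np.sum(diff**2, axis=2) / (2.0 * (1.0 - t) ** 2)
    logits -= logits.max(axis=1, keepdims=True)             # stabilize the softmax
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)                       # posterior P(x1 = y_i | x_t = x)
    x1_mean = w @ train                                     # E[x1 | x_t = x]
    return (x1_mean - x) / (1.0 - t)

def generate(train, n_samples, steps, rng):
    """Integrate the ODE with uniform Euler steps, starting from Gaussian noise."""
    x = rng.normal(size=(n_samples, train.shape[1]))
    h = 1.0 / steps
    for k in range(steps):
        x = x + h * empirical_velocity(k * h, x, train)
    return x

rng = np.random.default_rng(0)
train = np.array([[-2.0, 0.0], [0.0, 3.0], [2.5, -1.0]])   # tiny "training set"
samples = generate(train, n_samples=64, steps=200, rng=rng)
# Distance from each generated sample to its closest training point.
nearest = np.min(np.linalg.norm(samples[:, None, :] - train[None, :, :], axis=2), axis=1)
```

Every generated sample lands on a training point up to numerical precision, because the posterior weights become one-hot as $t \to 1$ and the final Euler step maps each state onto the corresponding posterior mean.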

For Riemannian manifolds, the continuity and Fokker–Planck PDEs are shown to govern the flow of marginals, with sampling on the manifold efficiently realized via embedding SDEs, leveraging ambient Euclidean methods but projecting to the tangent bundle at each step (Wu et al., 22 Apr 2025).
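
The "ambient step, then project" idea can be illustrated on the unit sphere. This is a simple projection-plus-normalization scheme written under stated assumptions (unit sphere embedded in $\mathbb{R}^3$, a hypothetical drift pulling toward the north pole, retraction by renormalization); it is not the scheme of the cited paper.

```python
import numpy as np

def project_to_tangent(x, v):
    """Remove the component of v along x, assuming x lies on the unit sphere."""
    return v - np.sum(x * v, axis=-1, keepdims=True) * x

def sphere_step(x, drift, h, sigma, rng):
    """One Euler-Maruyama-style step in ambient space, projected and renormalized."""
    noise = rng.normal(size=x.shape)
    step = h * project_to_tangent(x, drift) + sigma * np.sqrt(h) * project_to_tangent(x, noise)
    y = x + step
    return y / np.linalg.norm(y, axis=-1, keepdims=True)   # retract back onto the sphere

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 3))
x /= np.linalg.norm(x, axis=1, keepdims=True)   # random starting points on the sphere
z_before = x[:, 2].mean()
north = np.array([0.0, 0.0, 1.0])
for _ in range(100):
    x = sphere_step(x, drift=north - x, h=0.01, sigma=0.1, rng=rng)
z_after = x[:, 2].mean()
```

The iterates stay on the sphere by construction, and the mean height increases as the projected drift moves the points toward the pole.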

Kernelized stochastic interpolants replace neural drifts by feature-based regressions, rendering the entire generative process training-free and linear in the feature dimension, and supporting pathwise KL-divergence control via optimal diffusion scheduling (Coeurdoux et al., 23 Feb 2026).
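
To illustrate how a feature-based regression can stand in for gradient-based training, the sketch below fits a drift by closed-form ridge regression on a hand-picked feature map $\phi(t, x) = [1, t, x]$. The feature map, the ridge parameter, and the toy shifted coupling are all assumptions made for illustration; the cited method's features and solver may differ.

```python
import numpy as np

def features(t, x):
    """Hypothetical feature map phi(t, x) = [1, t, x]."""
    return np.concatenate([np.ones((len(t), 1)), t[:, None], x], axis=1)

def fit_drift(t, x_t, target, ridge=1e-6):
    """Closed-form ridge regression: W minimizes ||phi W - target||^2 + ridge ||W||^2."""
    phi = features(t, x_t)
    gram = phi.T @ phi + ridge * np.eye(phi.shape[1])   # small (p, p) system, p = feature dim
    return np.linalg.solve(gram, phi.T @ target)

rng = np.random.default_rng(0)
n, d = 4096, 2
shift = np.array([1.5, -0.5])
x0 = rng.normal(size=(n, d))
x1 = x0 + shift                      # toy coupling with a constant true velocity
t = rng.uniform(size=n)
x_t = (1.0 - t[:, None]) * x0 + t[:, None] * x1

W = fit_drift(t, x_t, x1 - x0)       # regression targets are dx_t/dt = x1 - x0
pred = features(t, x_t) @ W          # fitted drift evaluated on the training states
```

In this toy case the true drift is the constant `shift`, which lies in the span of the features, so the linear solve recovers it without any iterative optimization.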

5. Applications: Generative Modeling, Forecasting, and Beyond

Stochastic interpolant training serves as a universal backbone for both conditional and unconditional generative modeling:

Few-Step Generation: BézierFlow demonstrates substantial improvement in sample quality for diffusion and flow models restricted to a small number of function evaluations (NFEs), reducing FID from 50.30 to 9.55 (NFE=4, CIFAR-10 EDM) compared to baseline schedulers and matching or surpassing distillation methods at a fraction of the computational cost (Min et al., 15 Dec 2025).
Physical System Emulation: SI-based generative models outperform DDPMs and FNOs in deterministic error, spectral reconstruction, and probabilistic calibration (CRPS, SSR) on fluid PDEs and climate models, enabling 2–5 step accurate forecasting and ensemble uncertainty quantification (Zhou et al., 30 Sep 2025).
Covariance Estimation: SI-based shrinkage methods surpass Ledoit–Wolf and Wasserstein-OT shrinkage in both theoretical and fMRI covariance estimation, with the bias-variance trade-off adjustable by scheduling, coupling, and early stopping (Chalvidal et al., 5 Jun 2026).
Latent Variable Models: Latent Stochastic Interpolants construct ELBOs directly in continuous time, admitting arbitrary priors and enabling end-to-end optimization of encoder, decoder, and latent bridge for computationally efficient and expressive image generation (Singh et al., 2 Jun 2025).
Operator and Multimarginal Interpolants: Frameworks accommodating vector, matrix, or operator time variables support multitask learning, inpainting, channel-adaptive denoising, posterior sampling, and structured transport on the simplex, with a single trained drift field (Negrel et al., 6 Aug 2025, Albergo et al., 2023).
Manifold Learning: The Riemannian Neural Geodesic Interpolant bridges densities on non-Euclidean spaces along geodesics, with rigorous PDE and SDE constructions, specialized neural approximators, and error quantifications (Wu et al., 22 Apr 2025).
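
The few-step regime in the first item can be made concrete with a one-dimensional toy problem in which the exact velocity is known. The sketch assumes independent endpoints $x_0 \sim \mathcal{N}(0, 1)$, $x_1 \sim \mathcal{N}(\mu, 1)$ and the linear schedule $x_t = (1-t) x_0 + t x_1$, for which Gaussian conditioning gives $b(t, x) = \mu + \frac{2t-1}{(1-t)^2 + t^2}(x - t\mu)$. It uses plain uniform Euler steps and does not implement BézierFlow or any learned scheduler.

```python
import numpy as np

def gaussian_velocity(t, x, mu):
    """Exact b(t, x) for x_t = (1-t) x0 + t x1 with independent N(0, 1) and N(mu, 1) endpoints."""
    var_t = (1.0 - t) ** 2 + t**2
    return mu + (2.0 * t - 1.0) / var_t * (x - t * mu)

def euler_sample(x0, mu, nfe):
    """Integrate dx/dt = b(t, x) from t = 0 to t = 1 with `nfe` uniform Euler steps."""
    x, h = x0.copy(), 1.0 / nfe
    for k in range(nfe):
        x = x + h * gaussian_velocity(k * h, x, mu)
    return x

rng = np.random.default_rng(0)
x0 = rng.normal(size=20_000)
mu = 3.0
few = euler_sample(x0, mu, nfe=4)     # few-step sampler
many = euler_sample(x0, mu, nfe=64)   # finer discretization for comparison
```

With four uniform steps the update is linear in the deviation from the mean, and the per-step factors $0.75 \cdot 0.8 \cdot 1 \cdot 1.2 = 0.72$ give `few = mu + 0.72 * x0`: the mean is correct, but the standard deviation is about 0.72 instead of 1. With 64 steps the spread is close to the target. This discretization error at low NFE is the kind of gap that schedule design aims to close.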

6. Connections to Kernel Methods, Optimization, and Generalization

The SI training perspective reveals deep ties to kernel machines and the implicit geometry of learning:

Path-Kernel View: For neural networks trained with (stochastic) gradient descent, expected outputs arise as dynamic kernel machines, with test predictions aggregating stored tangent feature memories. Generalization is characterized by the RKHS and the null-space of the path kernel (Guo et al., 14 Mar 2026).
Kernelized SI: Training-free SI methods estimate the drift as a linear combination of feature gradients, yielding generative models amenable to ensembling, cross-domain transfer, and linear algebra solvers for high-dimensional applications, subject to KL risk bounds informed by the choice of feature map and diffusion schedule (Coeurdoux et al., 23 Feb 2026).
Bias–Variance and Scheduling: Regularization can be decomposed into schedule design, choice of coupling/correlation between endpoints, and early stopping in drift training—each enabling explicit control over model bias and variance, and measurable impact on sample quality and statistical risk (Chalvidal et al., 5 Jun 2026).
Generalization and Extrapolation: The SI path-kernel structure determines which directions generalize; test points whose tangent features are orthogonal to all training arcs remain unpredictable, establishing sharp conditions for model extrapolation (Guo et al., 14 Mar 2026).

7. Limitations, Best Practices, and Open Directions

A number of pragmatic and conceptual factors govern the effective deployment of stochastic interpolant training:

Small training sets induce overfitting or memorization in both deterministic and stochastic SI models; achieving true sample diversity generally requires large datasets or well-calibrated noise schedules (Li et al., 26 Sep 2025).
For practical robustness, the selection of noise schedules $\gamma(t)$, schedule parameterizations (Bézier, polynomials, operator paths), and coupling structures (independent, OT, neural) must be tuned to the modality of interest.
Zero-shot adaptation, conditional generation, and sequential inference are facilitated by operator-based or multimarginal SI frameworks capable of handling arbitrary interpolation paths post hoc, providing task-agnostic training and broad application domains (Negrel et al., 6 Aug 2025, Albergo et al., 2023).

Open research avenues include further extensions to non-Euclidean and multimodal data, scalable kernelized variants tailored to extreme dimensions, and analysis of generalization limits in high-complexity or data-poor regimes.

Stochastic interpolant training thus provides a unifying abstraction for continuous, sample-efficient, and expressive probabilistic modeling, rooted in quadratic regression and dynamical transport, for learning bridges between complex distributions (Albergo et al., 2023, Negrel et al., 6 Aug 2025, Min et al., 15 Dec 2025, Guo et al., 14 Mar 2026, Wu et al., 22 Apr 2025, Chalvidal et al., 5 Jun 2026, Singh et al., 2 Jun 2025, Zhou et al., 30 Sep 2025, Chen et al., 2024, Coeurdoux et al., 23 Feb 2026, Ma et al., 2024, Li et al., 26 Sep 2025, Horowitz et al., 22 Oct 2025).