Stochastic Interpolant (SI) Schedulers

Updated 22 December 2025

Stochastic Interpolant (SI) schedulers are formal mechanisms that define smooth, time-indexed trajectories between tractable prior distributions and complex target distributions using both deterministic and stochastic integration.
They optimize numerical performance by minimizing the drift's Lipschitz constant, which enables larger time steps and reduces discretization error, crucial for high-dimensional and non-Gaussian settings.
Adaptive parameterizations, including Bézier curves and operator-valued extensions, allow flexible scheduler design, efficient sampler transfer, and improved performance in multi-modal and conditional generative tasks.

A Stochastic Interpolant (SI) scheduler is a formal mechanism for designing time-indexed interpolation paths between probability distributions, typically from a tractable prior such as a standard Gaussian to a complex target. Such paths underlie modern flow-based and diffusion generative models. The SI framework specifies how to couple source and target distributions via smooth interpolation coefficients that control both deterministic transport and stochastic blurring, yielding a unifying generalization of the sampling trajectories used in diffusion and flow models. The choice of interpolation “schedule” strongly affects numerical stability, convergence rate, and sample quality, especially in high-dimensional and non-Gaussian settings.

1. Mathematical Formulation of Stochastic Interpolant Schedulers

Consider two probability measures $\rho_0,\rho_1$ on $\mathbb{R}^d$ and a coupling $\nu(dx_0, dx_1)$ between them. Let $z\sim \mathcal{N}(0,I_d)$ be an independent standard Gaussian. An SI scheduler defines a smooth interpolation (in the scalar case for standard generative modeling applications):

$x_t = \alpha(t)x_1 + \beta(t)z, \quad t \in [0,1]$

with boundary conditions $\alpha(0)=0$, $\beta(0)=1$, $\alpha(1)=1$, $\beta(1)=0$ and derivatives $\dot{\alpha}, \dot{\beta} \in C^1([0,1])$. The time evolution of the marginal law $\rho_t$ of $x_t$ is induced either by deterministic integration (probability-flow ODE) or stochastic integration (SDE) with the drift

$b(x,t;\alpha) = \mathbb{E}[\dot{x}_t \,|\, x_t=x]$

and, possibly, a diffusion coefficient $\sigma(t)$. The SI scheduler is the parameterization $(\alpha,\beta)$ (or their vector/operator-valued analogs) that prescribes the full trajectory from the prior to the target distribution (Chen et al., 1 Sep 2025, Negrel et al., 6 Aug 2025, George et al., 1 Feb 2025).
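As a minimal sketch of the interpolant defined above, the following NumPy snippet forms $x_t$ for a linear schedule and compares its empirical variance with the exact value $\alpha^2 M + \beta^2$ for a scalar Gaussian target. The linear schedule, the target variance `M`, and all function names are illustrative assumptions, not taken from the cited works.

```python
import numpy as np

def linear_schedule(t):
    """Return (alpha, beta): the weights on the target sample x_1 and the noise z."""
    return t, 1.0 - t

def sample_interpolant(x1, z, t, schedule=linear_schedule):
    """Form x_t = alpha(t) * x1 + beta(t) * z for a given schedule."""
    alpha, beta = schedule(t)
    return alpha * x1 + beta * z

rng = np.random.default_rng(0)
M = 9.0                                          # illustrative target variance
x1 = np.sqrt(M) * rng.standard_normal(200_000)   # samples of x_1 ~ N(0, M)
z = rng.standard_normal(200_000)                 # independent samples of z ~ N(0, 1)

for t in (0.0, 0.5, 1.0):
    alpha, beta = linear_schedule(t)
    xt = sample_interpolant(x1, z, t)
    # empirical variance next to the exact value alpha^2 * M + beta^2
    print(t, xt.var(), alpha**2 * M + beta**2)
```

At $t=0$ the interpolant reduces to the noise sample and at $t=1$ to the target sample, which is what the boundary conditions require.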

2. Statistical Equivalence vs. Numerical Performance

A core theoretical insight is that, when tuning the SDE diffusion variance optimally for a given $(\alpha, \beta)$ schedule, all scalar SI schedulers achieve identical statistical efficiency: the minimal path-space Kullback-Leibler divergence (KL) between the true dynamics and its numerical approximation depends only on the variance of score estimation error aggregated along the scalar interpolation ratio, not on the path shape. Explicitly, after optimal $\sigma^*(t)$ tuning,

$\int_0^1 \mathbb{E}_{x_t \sim \rho_t} \left[ \frac{\|v(x_t,t;\alpha)-v_0(x_t,t)\|^2}{2\sigma^2(t)} \right] dt$

achieves the same minimized KL for all scalar $(\alpha, \beta)$. This statistical invariance does not, however, translate to equal numerical efficiency: the regularity of the drift field $b(x,t;\alpha)$ and its Lipschitz constant vary dramatically with $(\alpha,\beta)$, directly affecting time-step constraints and discretization error for ODE/SDE solvers (Chen et al., 1 Sep 2025).

3. Lipschitz-Guided Schedule Design

Empirical and theoretical results indicate that minimizing the time-averaged squared Lipschitz constant of the drift field,

$J[\alpha]=\int_{0}^{1} \mathbb{E}_{x \sim \rho_t}\|\nabla_x b(x,t;\alpha)\|^2 \,dt$

makes larger numerical time steps feasible: a smaller $J[\alpha]$ means an explicit integrator needs fewer steps to meet a fixed error tolerance. For Gaussian targets, the optimal choice of $(\alpha, \beta)$ arises from minimizing $J[\alpha]$, yielding, for $x_1\sim \mathcal{N}(0,M)$:

$\alpha(t)=\sqrt{\frac{M^t-1}{M-1}},\quad \beta(t)=\sqrt{\frac{M-M^t}{M-1}}$

which exponentially improves drift regularity over linear schedules ($J \sim \frac{1}{4}\log^2 M$ vs. $J \sim \sqrt{M}$ for linear interpolation). For Gaussian mixtures, tailored schedules mitigate mode collapse by moderating the initial drift amplitude, enabling both modes to be captured with coarse integration and few steps (Chen et al., 1 Sep 2025).
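The scaling quoted above can be checked numerically in the scalar Gaussian case. There the drift is linear in $x$, $b(x,t) = \tfrac{1}{2}\,\frac{d}{dt}\log \mathrm{Var}(x_t)\, x$, so $J$ reduces to a one-dimensional integral. The sketch below compares the linear schedule with a schedule whose variance interpolates geometrically, $\mathrm{Var}(x_t) = M^t$; the value of `M`, the grid size, and the function names are assumptions chosen for illustration.

```python
import numpy as np

def var_linear(t, M):
    """Var(x_t) for the linear schedule: weight t on x_1 ~ N(0, M), weight 1 - t on z."""
    return t**2 * M + (1.0 - t) ** 2

def var_geometric(t, M):
    """Var(x_t) for a schedule whose variance interpolates geometrically from 1 to M."""
    return M**t

def averaged_sq_lipschitz(variance, M, n=20_000):
    """Approximate J = int_0^1 (d/dx b)^2 dt, using b(x, t) = 0.5 * d/dt log Var(x_t) * x."""
    edges = np.linspace(0.0, 1.0, n + 1)
    h = 1.0 / n
    dlogv = np.diff(np.log(variance(edges, M))) / h   # d/dt log Var(x_t) on each cell
    grad_b = 0.5 * dlogv                              # spatial gradient of the linear drift
    return np.sum(grad_b**2) * h

M = 1.0e4
J_lin = averaged_sq_lipschitz(var_linear, M)
J_geo = averaged_sq_lipschitz(var_geometric, M)
print(J_lin, J_geo, np.log(M) ** 2 / 4)   # J_geo should match (log M)^2 / 4
```

For the geometric profile the drift gradient is constant in time, which is why the integral collapses to $\frac{1}{4}\log^2 M$; the linear schedule concentrates a large gradient near $t \approx 1/\sqrt{M}$.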

4. Practical Construction and Nonuniform Discretization

Accurate and efficient sampling demands careful alignment of time-step allocation with local properties of the SI schedule. For both ODE and SDE integrators, the leading discretization error term is amplified by inverse powers of the latent “blurring scale” $\gamma(t)$ of the SI interpolation. Minimizing global error requires non-uniform step allocation proportional to $\gamma_k^2$ (where $\gamma_k = \inf_{t\in [t_k,t_{k+1}]}\gamma(t)$). Thus, the SI scheduler construction for $N$ steps takes the form:

For $\gamma^2(t) = a t (1-t)$ (“linear” case), set $h_k \propto \gamma_k^2$, building an exponentially decaying grid centered at $t=1/2$.
For variance-preserving SI (e.g., $\gamma(t)=\sqrt{1-t^2}$), a geometric grid in $t$ is optimal.

These schedules are derived by balancing higher-order error amplification near endpoints, resulting in step allocations that uniformize the leading local error terms (Liu et al., 13 Feb 2025, Liu et al., 10 Aug 2025). Theoretical complexity gains are substantiated by experiments: on low- and high-dimensional targets, nonuniform SI schedulers outperform uniform time grids by an order of magnitude in convergence per step.
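One way to sketch the “linear” case is to read $h_k \propto \gamma_k^2$ in the continuum limit, $dt/ds \propto t(1-t)$, which is solved by a grid that is uniform in $\operatorname{logit}(t)$. This reading, the endpoint cutoff `delta`, and the function name are assumptions made for illustration and are not the closed-form rules of the cited papers.

```python
import numpy as np

def logit_uniform_grid(n_steps, delta=1e-3):
    """Time grid on [delta, 1 - delta] that is uniform in logit(t).

    Uniform spacing in s = log(t / (1 - t)) gives dt = t (1 - t) ds, so the
    step sizes are approximately proportional to gamma^2(t) = a t (1 - t).
    """
    s_min = np.log(delta / (1.0 - delta))
    s = np.linspace(s_min, -s_min, n_steps + 1)
    return 1.0 / (1.0 + np.exp(-s))          # map back to t with the sigmoid

grid = logit_uniform_grid(20)
h = np.diff(grid)                            # step sizes h_k
mid = 0.5 * (grid[:-1] + grid[1:])           # cell midpoints
ratio = h / (mid * (1.0 - mid))              # roughly constant if h_k is proportional to t(1-t)
print(np.round(h, 4))
print(ratio.min(), ratio.max())
```

The steps are largest around $t = 1/2$ and shrink geometrically toward both endpoints, where $\gamma(t)$ vanishes and the error amplification is strongest.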

5. Bézier, Operator, and Adaptive Parameterizations

Parameterization of the scheduler trajectory is crucial for both expressiveness and implementation. BézierFlow (Min et al., 15 Dec 2025) proposes SI schedulers built from degree-$n$ Bézier curves in $\alpha$ and $\sigma$; control point monotonicity and endpoint constraints enforce key SI desiderata ($C^2$ smoothness, monotonic signal-to-noise ratio). This approach expands the effective search space from discrete time-point selection to continuous polynomial curves, resulting in trajectories that retain differentiability and generalize to unseen step counts, datasets, or solver schemes. In operator-valued extensions (Negrel et al., 6 Aug 2025), SI schedulers can bridge arbitrary (possibly structured) spaces, enabling inpainting, conditional generation, multiscale modeling, and other zero-shot transfer tasks.
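The role of the control points can be illustrated with a plain Bernstein-polynomial evaluation. The snippet below is a generic sketch, not BézierFlow's implementation: the cubic degree and the control-point values are hypothetical (in a learned scheduler they would be optimized), and the names `signal` and `noise` stand in for the two scheduler coefficients.

```python
import numpy as np
from math import comb

def bezier(control, t):
    """Evaluate a 1-D Bezier curve with the given control points at times t in [0, 1]."""
    control = np.asarray(control, dtype=float)
    t = np.asarray(t, dtype=float)
    n = len(control) - 1
    # Bernstein basis B_{i,n}(t) = C(n, i) t^i (1 - t)^(n - i), stacked on the last axis
    basis = np.stack(
        [comb(n, i) * t**i * (1.0 - t) ** (n - i) for i in range(n + 1)], axis=-1
    )
    return basis @ control

# Hypothetical cubic control points: fixed endpoints, monotone interior values.
signal_ctrl = [0.0, 0.1, 0.6, 1.0]    # increasing, so the signal coefficient increases
noise_ctrl = [1.0, 0.9, 0.4, 0.0]     # decreasing, so the noise coefficient decreases

t = np.linspace(0.0, 1.0, 101)
signal = bezier(signal_ctrl, t)
noise = bezier(noise_ctrl, t)
print(signal[[0, 50, 100]], noise[[0, 50, 100]])
```

Because a Bézier curve passes through its first and last control points and its derivative is a Bézier curve in the control-point differences, fixing the endpoints and ordering the control points monotonically is enough to obtain the boundary conditions and a monotone signal-to-noise ratio.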

Adaptive compute scaling via SI schedulers is exemplified by DA-SIP (Chun et al., 25 Nov 2025), which combines SI policy generation with runtime difficulty classification to dynamically select the time horizon, solver order, and SDE/ODE integration mode per task phase, achieving significant empirical compute reductions.

6. Schedule Transfer and Practical Sampler Construction

A notable property of the SI scheduler formalism is the existence of explicit algebraic “transfer formulas” that map a drift function trained for one (“canonical”) schedule to the drift of any other, eliminating retraining overhead. Specifically, for a drift $b^\dagger(x,t)$ trained on a reference linear schedule and an arbitrary $(\alpha,\beta)$,

$b(x,t;\alpha) = \frac{\dot{\alpha}(t)}{\alpha(t)}x + \left(\dot{\beta}(t)-\frac{\dot{\alpha}(t)\beta(t)}{\alpha(t)}\right)[(1-t^\dagger) b^\dagger(x^\dagger,t^\dagger) + x^\dagger]$

with $t^\dagger=1/(1+\alpha(t)/\beta(t))$ and $x^\dagger = x/(\alpha(t)+\beta(t))$, a rescaling under which $x^\dagger$ is distributed as the reference linear interpolant at time $t^\dagger$ (Chen et al., 1 Sep 2025). For discrete-time samplers, the SI scheduler is constructed as an explicit time grid, whose steps are set by closed-form rules derived from balancing endpoint error amplification with global cost constraints (Liu et al., 13 Feb 2025, Liu et al., 10 Aug 2025).
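The structure of the transfer formula can be checked in closed form for Gaussian variables. The sketch below assumes the reference interpolant is $x^\dagger_s = (1-s)\,u + s\,w$, where $u$ is the variable weighted by $\alpha$ and $w$ the variable weighted by $\beta$, and it rescales the state by $\alpha+\beta$ so that $x^\dagger$ follows that reference at $s = t^\dagger$. The reference form, the trigonometric test schedule, and all names are assumptions for illustration rather than the construction of the cited paper.

```python
import numpy as np

def reference_drift(y, s, var_a, var_b):
    """Exact drift E[w - u | y_s = y] of the reference y_s = (1 - s) u + s w,
    for independent zero-mean Gaussians u, w with variances var_a, var_b."""
    return (s * var_b - (1 - s) * var_a) * y / ((1 - s) ** 2 * var_a + s**2 * var_b)

def transferred_drift(x, alpha, beta, dalpha, dbeta, var_a, var_b):
    """Drift of x_t = alpha u + beta w assembled from the reference drift only."""
    s = beta / (alpha + beta)                 # same as 1 / (1 + alpha / beta)
    y = x / (alpha + beta)                    # rescaled state fed to the reference drift
    cond_mean_w = (1 - s) * reference_drift(y, s, var_a, var_b) + y   # E[w | x_t = x]
    return (dalpha / alpha) * x + (dbeta - dalpha * beta / alpha) * cond_mean_w

def exact_drift(x, alpha, beta, dalpha, dbeta, var_a, var_b):
    """Closed-form E[dalpha u + dbeta w | x_t = x] for the Gaussian case."""
    num = alpha * dalpha * var_a + beta * dbeta * var_b
    return num * x / (alpha**2 * var_a + beta**2 * var_b)

# Hypothetical trigonometric schedule, evaluated at one interior time.
t = 0.3
alpha, beta = np.sin(0.5 * np.pi * t), np.cos(0.5 * np.pi * t)
dalpha = 0.5 * np.pi * np.cos(0.5 * np.pi * t)
dbeta = -0.5 * np.pi * np.sin(0.5 * np.pi * t)

x = np.linspace(-3.0, 3.0, 7)
args = (alpha, beta, dalpha, dbeta, 4.0, 1.0)   # alpha weights N(0, 4), beta weights N(0, 1)
gap = np.max(np.abs(transferred_drift(x, *args) - exact_drift(x, *args)))
print(gap)                                       # agreement up to round-off
```

The check is symmetric in which variable carries the noise, so it does not depend on whether $\alpha$ or $\beta$ is attached to the target sample; it only requires $\alpha(t) \neq 0$ and $\alpha(t)+\beta(t) \neq 0$ at the evaluation time.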

7. Empirical Validation and Application Domains

Empirical evaluations demonstrate the practical leverage of SI scheduler design. In high-dimensional PDE-constrained sampling (stochastic Allen–Cahn, 2D stochastic Navier–Stokes), Lipschitz-optimized SI schedules reduce the step count by factors of $10\times$ to $100\times$, enabling accurate spectral recovery across increasing resolutions with minimal additional cost (Chen et al., 1 Sep 2025). In generative modeling, BézierFlow achieves $2$ to $3\times$ speedups in few-step image synthesis tasks and maintains state-of-the-art FID at NFE $\le 10$ (Min et al., 15 Dec 2025). In multitask, operator-valued SI settings, zero-shot conditional and multiscale generation is enabled by reparameterization of the SI scheduler after training (Negrel et al., 6 Aug 2025). Across all experiments, schedule shape, rather than statistical path-level optimality, controls practical performance.

In summary, SI schedulers constitute a mathematically rigorous and practically decisive lever in generative modeling, controlling the interpolation trajectory between probability distributions. Their design—guided by drift Lipschitz constant minimization, nonuniform step allocation, and flexible parameterizations—directly determines numerical tractability, multi-modal expressivity, and sample quality, particularly in high-dimensional and stiff or non-Gaussian regimes (Chen et al., 1 Sep 2025, Min et al., 15 Dec 2025, Liu et al., 13 Feb 2025, Liu et al., 10 Aug 2025, Negrel et al., 6 Aug 2025, Chun et al., 25 Nov 2025, George et al., 1 Feb 2025).