Point-Mass Schedule in Generative Modeling

Updated 10 February 2026
  • Point-mass schedules are interpolation methods that initiate from a point mass, redefining the coupling between noise and data in generative models.
  • They modify traditional SDE/ODE dynamics by enforcing zero initial drift and leveraging statistically optimal diffusion, resulting in enhanced numerical stability.
  • Retrofitting pretrained flow and diffusion models with these schedules reduces integration steps while maintaining high sample quality.

A point-mass schedule is a class of interpolation schedules in generative modeling via stochastic interpolants, where the conventional Gaussian base measure collapses to a point mass at the initial time, fundamentally altering the induced SDE/ODE dynamics and allowing for numerically efficient sampling. This schedule generalizes the way base and target distributions are interpolated, leading to improved sampling complexity and convergence properties, particularly under statistically optimal SDE sampling. Point-mass schedules enable retrofitting of pretrained flow and diffusion models to yield high-quality generative samples in fewer integration steps, often without retraining (Damsholt et al., 3 Feb 2026).

1. Formal Definition and Construction

In the stochastic interpolant framework, the coupling of a standard Gaussian $Z \sim N(0, I)$ and target data $X \sim \rho_X$ is achieved via a spatially linear interpolant $I_t = \alpha_t Z + \beta_t X$, $t \in [0, 1]$, where the schedule $(\alpha, \beta)$ traditionally satisfies $\alpha_0 = 1$, $\beta_0 = 0$, $\alpha_1 = 0$, $\beta_1 = 1$, together with the monotonicity conditions $\dot\alpha_t < 0$, $\dot\beta_t > 0$. The point-mass schedule is characterized by relaxing the initial boundary: both $\alpha_0 = 0$ and $\beta_0 = 0$. More precisely ((Damsholt et al., 3 Feb 2026), Definition 4.2):

  • $\alpha, \beta \in C^1([0, 1])$ with $\alpha_0 = \alpha_1 = \beta_0 = 0$, $\beta_1 = 1$,
  • $\alpha_t > 0$, $\dot\beta_t > 0$ for $t \in (0, 1)$,
  • $\beta_t = o(\alpha_t)$ as $t \to 0^+$, ensuring $I_0 = 0$ almost surely,
  • $\alpha_t^2 = O(\beta_t)$ as $t \to 0^+$ (regularity),
  • $u_0 \equiv \lim_{t \to 0^+} \beta_t / (\alpha_t + \beta_t) = 0$ (finite derivative),
  • $\frac{d}{dt}(\beta_t / \alpha_t) > 0$, so $t \mapsto u_t$ is strictly increasing.

These properties enforce initial sampling from a point mass, with the interpolation governed by $\beta = o(\alpha)$ so that the early evolution is dominated by drift.
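These conditions are easy to check numerically. The sketch below (Python/NumPy, illustrative) uses the equal-logit-time schedule $\alpha_t = t(1-t)/d$, $\beta_t = t^2/d$ with $d = (1-t)^2 + t^2$; the closed form for $\alpha$ is inferred from the $\beta_t = t^2/d$ that appears in the pseudocode of Section 5, so treat it as an assumption rather than a quotation from the paper:

```python
import numpy as np

# Equal-logit-time point-mass schedule (beta_t = t^2/d matches Section 5):
#   d(t) = (1-t)^2 + t^2,  alpha_t = t(1-t)/d,  beta_t = t^2/d
d     = lambda t: (1 - t)**2 + t**2
alpha = lambda t: t * (1 - t) / d(t)
beta  = lambda t: t**2 / d(t)

# Boundary conditions: alpha_0 = alpha_1 = beta_0 = 0, beta_1 = 1.
assert alpha(0.0) == 0.0 and alpha(1.0) == 0.0
assert beta(0.0) == 0.0 and beta(1.0) == 1.0

t = np.linspace(1e-6, 1 - 1e-6, 10_000)

# Interior positivity of alpha and strict monotonicity of beta.
assert np.all(alpha(t) > 0)
assert np.all(np.diff(beta(t)) > 0)

# beta = o(alpha) as t -> 0+: the ratio beta/alpha = t/(1-t) vanishes.
small = np.array([1e-2, 1e-4, 1e-6])
assert np.all(beta(small) / alpha(small) < 2 * small)

# alpha^2 = O(beta): alpha^2/beta = (1-t)^2/d stays bounded near 0.
assert np.all(alpha(small)**2 / beta(small) < 1.01)

# u_t = beta/(alpha+beta) equals t here, so u_0 = 0 and u is increasing.
u = beta(t) / (alpha(t) + beta(t))
assert np.allclose(u, t)
```

Any schedule claimed to be point-mass can be screened the same way before it is used for sampling.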

2. SDE/ODE Induced by Schedule

For any schedule $(\alpha, \beta)$, write $\rho(t, \cdot)$ for the marginal law of $I_t$, i.e., $\rho(t, \cdot) = \mathrm{Law}(I_t)$. The following mean fields are defined:

  • $\eta_Z(t, x) = \mathbb{E}[Z \mid I_t = x]$,
  • $\eta_X(t, x) = \mathbb{E}[X \mid I_t = x]$,
  • $s(t, x) = \nabla_x \log \rho(t, x)$ (score).

For any $C^1$ non-negative diffusion scale $\epsilon_t$, the resulting drift is

$$b^\epsilon(t, x) = \dot\alpha_t\, \eta_Z(t, x) + \dot\beta_t\, \eta_X(t, x) + \epsilon_t\, s(t, x).$$

The unique strong solution to

$$dX_t^\epsilon = b^\epsilon(t, X_t^\epsilon)\, dt + \sqrt{2\epsilon_t}\, dW_t, \qquad X_0^\epsilon \sim N(0, I),$$

satisfies $\mathrm{Law}(X_t^\epsilon) = \mathrm{Law}(I_t)$ and $X_1^\epsilon \sim \rho_X$. The deterministic case $\epsilon \equiv 0$ gives the probability-flow ODE.
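For a one-dimensional Gaussian target the mean fields are available in closed form, which gives a quick sanity check of the drift formula. The sketch below is illustrative and not from the paper: the conditional expectations follow from joint Gaussianity of $(Z, X, I_t)$, and with $\epsilon = 0$ the drift must reduce to the familiar Gaussian probability-flow drift $(\tfrac{d}{dt}\log\sigma_t)\, x$, where $\sigma_t^2 = \alpha_t^2 + \beta_t^2 \sigma^2$:

```python
import numpy as np

# For X ~ N(0, sigma2), Z ~ N(0, 1), I_t = alpha_t Z + beta_t X, the
# marginal is N(0, v_t) with v_t = alpha_t^2 + beta_t^2 * sigma2, and the
# conditional expectations are linear in x (joint Gaussianity):
#   eta_Z(t, x) = alpha_t x / v_t
#   eta_X(t, x) = beta_t sigma2 x / v_t
#   s(t, x)     = -x / v_t
sigma2 = 4.0
alpha, beta   = lambda t: 1 - t, lambda t: t      # linear schedule
dalpha, dbeta = -1.0, 1.0

def drift(t, x, eps=0.0):
    v = alpha(t)**2 + beta(t)**2 * sigma2
    eta_Z = alpha(t) * x / v
    eta_X = beta(t) * sigma2 * x / v
    score = -x / v
    return dalpha * eta_Z + dbeta * eta_X + eps * score

# With eps = 0 this is the probability-flow drift (d/dt log sigma_t) x,
# i.e. (dv/dt) / (2 v) * x with v = sigma_t^2.
t, x = 0.3, 1.7
v  = alpha(t)**2 + beta(t)**2 * sigma2
dv = 2 * alpha(t) * dalpha + 2 * beta(t) * dbeta * sigma2
assert np.isclose(drift(t, x), 0.5 * dv / v * x)
```

The same template works for any schedule by swapping in the schedule functions and their derivatives.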

3. Statistically Optimal Diffusion and Lazy Schedules

Closed-form identities ((Damsholt et al., 3 Feb 2026), Proposition 3.2) relate mean fields and drift:

  • $\eta_Z(t, x) = -\alpha_t\, s(t, x)$,
  • $\eta_X(t, x) = (x + \alpha_t^2\, s(t, x)) / \beta_t$,
  • $b^\epsilon(t, x) = (\epsilon_t^* + \epsilon_t)\, s(t, x) + (\dot\beta_t / \beta_t)\, x$, with the statistically optimal diffusion scale

$$\epsilon_t^* = \alpha_t^2 \frac{\dot\beta_t}{\beta_t} - \alpha_t \dot\alpha_t.$$

Theorem 3.3 shows that $\epsilon^*$ uniquely minimizes the path-space KL divergence between the true SDE and its plug-in approximation.
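These identities can be verified directly for a one-dimensional Gaussian target $X \sim N(0, \sigma^2)$, where the conditional expectations are linear in $x$ by joint Gaussianity. A minimal numerical check (illustrative values, not from the paper):

```python
import numpy as np

# Check the Proposition 3.2 identities for X ~ N(0, sigma2) under the
# linear schedule alpha = 1 - t, beta = t.
sigma2 = 4.0
alpha, beta   = lambda t: 1 - t, lambda t: t
dalpha, dbeta = -1.0, 1.0

t, x, eps = 0.3, 1.7, 0.2
a, b_ = alpha(t), beta(t)
v = a**2 + b_**2 * sigma2                  # Var(I_t)
eta_Z, eta_X = a * x / v, b_ * sigma2 * x / v
score = -x / v

# eta_Z = -alpha * s   and   eta_X = (x + alpha^2 s) / beta
assert np.isclose(eta_Z, -a * score)
assert np.isclose(eta_X, (x + a**2 * score) / b_)

# b^eps = (eps* + eps) s + (dbeta/beta) x,
# with eps* = alpha^2 dbeta/beta - alpha dalpha.
eps_star = a**2 * dbeta / b_ - a * dalpha
lhs = dalpha * eta_Z + dbeta * eta_X + eps * score
rhs = (eps_star + eps) * score + (dbeta / b_) * x
assert np.isclose(lhs, rhs)
```

Because the identities only involve the score, the same check applies to any target for which the score is known.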

For $X \sim N(0, I)$, the lazy schedule is defined by requiring the drift to vanish identically:

  • ODE lazy ($\epsilon \equiv 0$): $b(t, x) = 0$ iff $\alpha_t^2 + \beta_t^2 = 1$ for all $t$ (the variance-preserving schedule).
  • SDE lazy ($\epsilon = \epsilon^*$): $b^*(t, x) = 0$ iff $\alpha_t^2 + \beta_t^2 = \beta_t$, which forces $\alpha_0 = 0$, $\beta_0 = 0$, i.e., a point-mass schedule.

In the SDE-lazy case it follows that $2\epsilon_t^* = \dot\beta_t$, and thus $\int_0^t 2\epsilon_u^*\, du = \beta_t - \beta_0 = \beta_t$.
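Both the SDE-lazy condition and the identity $2\epsilon_t^* = \dot\beta_t$ can be confirmed numerically for the equal-logit-time point-mass schedule $\alpha_t = t(1-t)/d$, $\beta_t = t^2/d$, $d = (1-t)^2 + t^2$ (a sketch using finite-difference derivatives; the closed form for $\alpha$ is inferred from the $\beta_t = t^2/d$ of Section 5):

```python
import numpy as np

# Equal-logit-time point-mass schedule on an interior grid.
t = np.linspace(0.05, 0.95, 19)
d = (1 - t)**2 + t**2
alpha, beta = t * (1 - t) / d, t**2 / d

# SDE-lazy condition: alpha^2 + beta^2 = beta.
assert np.allclose(alpha**2 + beta**2, beta)

# Crude forward-difference derivatives of the schedule.
h = 1e-6
dalpha = ((t + h) * (1 - t - h) / ((1 - t - h)**2 + (t + h)**2) - alpha) / h
dbeta  = ((t + h)**2 / ((1 - t - h)**2 + (t + h)**2) - beta) / h

# eps* = alpha^2 dbeta/beta - alpha dalpha, and 2 eps* = dbeta under laziness.
eps_star = alpha**2 * dbeta / beta - alpha * dalpha
assert np.allclose(2 * eps_star, dbeta, atol=1e-4)
```

The identity also explains the noise variance used in the SDE pseudocode of Section 5, where increments are drawn with variance $\beta_{t+\Delta t} - \beta_t$.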

4. Path-Wise Conversion to Point-Mass Schedule

Theorem 4.6 provides a path transform allowing translation between arbitrary schedules and the point-mass schedule. Define $c_t = \alpha_t + \beta_t$ and $u_t = \beta_t / (\alpha_t + \beta_t)$. Given an SDE under $(\alpha, \beta, \epsilon)$, consider the linear schedule $(\bar\alpha_t = 1 - t,\ \bar\beta_t = t)$ and set

  • $\bar\epsilon_{u_t} = \alpha_t \epsilon_t / (\beta_t \epsilon_t^*)$,
  • rescale the Wiener process: $\bar W_u = \int_0^{t_u} \sqrt{\dot u_s}\, dW_s$.

The solution $\overline{X}_{u_t}$ under the rescaled linear schedule satisfies

$$X_t^\epsilon = c_t\, \overline{X}_{u_t}$$

for the original schedule. Therefore, any pretrained flow or diffusion model under any schedule may be converted to a point-mass schedule via successive schedule transformations.
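As a consistency check of the transform (a sketch assuming the equal-logit schedule $\alpha_t = t(1-t)/d$, $\beta_t = t^2/d$, for which $u_t = t$): running that schedule at its own optimal diffusion $\epsilon^*$ and converting should land exactly on the linear schedule run at *its* optimal diffusion, since $\bar\epsilon = \alpha\epsilon^*/(\beta\epsilon^*) = \alpha/\beta = (1-t)/t = (1-u)/u$:

```python
import numpy as np

# Conversion quantities for the equal-logit point-mass schedule.
t = np.linspace(0.05, 0.95, 19)
d = (1 - t)**2 + t**2
alpha, beta = t * (1 - t) / d, t**2 / d

c = alpha + beta                # c_t = t / d
u = beta / c                    # u_t = t  (equal-logit time)
assert np.allclose(c, t / d)
assert np.allclose(u, t)

# With eps = eps*, eps_bar_{u_t} = alpha eps* / (beta eps*) = alpha / beta.
eps_bar = alpha / beta
# Optimal diffusion of the linear schedule (1-u, u):
#   a^2 b'/b - a a' = (1-u)^2 / u + (1-u) = (1-u)/u.
eps_star_linear = (1 - u)**2 * 1.0 / u - (1 - u) * (-1.0)
assert np.allclose(eps_bar, eps_star_linear)
```

In other words, the optimally diffused point-mass SDE is, up to the deterministic rescalings $c_t$ and $u_t$, the optimally diffused linear SDE, which is what makes retrofitting pretrained linear-flow models possible.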

5. Practical Sampling Algorithms

For the "equal-logit-time" point-mass schedule with $u_t = t$ ((Damsholt et al., 3 Feb 2026), Example 5.5), explicit pseudocode is given for both lazy-ODE and lazy-SDE sampling, using a pretrained linear-flow velocity $v^{\mathrm{flow}}(t, x) = \bar b(t, x)$:

Lazy ODE ($\epsilon \equiv 0$):

    input: pretrained linear-flow velocity v̄(t, ·), stepsize Δt
    initialize t = 0, x ~ N(0, I)
    loop n = 0, …, N−1:
      d = (1−t)^2 + t^2
      b = ((1−2t)/d)·x + (1/√d)·v̄(t, √d·x)
      x ← x + Δt·b
      t ← t + Δt
    return x
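For a standard-normal target the linear-flow velocity has the closed form $\bar v(t, y) = (2t-1)\, y / ((1-t)^2 + t^2)$, which makes the lazy-ODE property directly testable: the transformed drift vanishes identically, so the $N(0, I)$ initialization is already a target sample. A runnable sketch (illustrative; it assumes the variance-preserving rescaling $c_t = 1/\sqrt{d}$, so the velocity is evaluated at $\sqrt{d}\,x$):

```python
import numpy as np

# Closed-form linear-flow velocity for target X ~ N(0, 1).
def v_lin(t, y):
    return (2 * t - 1) * y / ((1 - t)**2 + t**2)

# The lazy-ODE drift b = ((1-2t)/d) x + (1/sqrt(d)) v_lin(t, sqrt(d) x)
# cancels exactly for this target, at every t and x.
rng = np.random.default_rng(0)
for t in np.linspace(0.01, 0.99, 50):
    d = (1 - t)**2 + t**2
    x = rng.standard_normal()
    b = ((1 - 2 * t) / d) * x + (1 / np.sqrt(d)) * v_lin(t, np.sqrt(d) * x)
    assert abs(b) < 1e-12
```

A non-Gaussian target would of course produce a nonzero drift; the point of the check is that the schedule transform itself introduces no spurious motion.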
Lazy SDE ($\epsilon = \epsilon^*$):

    input: pretrained linear-flow velocity v̄(t, ·), stepsize Δt
    initialize t = Δt, x = 0   (point-mass initial condition X_0 = 0)
    loop n = 1, …, N−1:
      d = (1−t)^2 + t^2
      b* = (2/d)·[(1−2t)·x + t·v̄(t, (d/t)·x)]
      ΔW ~ N(0, (β_{t+Δt} − β_t)·I),   where β_t = t^2/d
      x ← x + Δt·b* + ΔW
      t ← t + Δt
    return x
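The SDE pseudocode can likewise be exercised end to end on a one-dimensional standard-normal target, using the closed-form velocity $\bar v(t, y) = (2t-1)\, y / ((1-t)^2 + t^2)$ as a stand-in for a pretrained model (an illustrative sketch, not from the paper). For this target the lazy drift vanishes, and the accumulated noise, with total variance $\beta_1 = 1$, reproduces $N(0, 1)$:

```python
import numpy as np

# Stand-in for a pretrained linear-flow velocity (target X ~ N(0, 1)).
def v_lin(t, y):
    return (2 * t - 1) * y / ((1 - t)**2 + t**2)

def beta_pm(t):                      # point-mass schedule: beta_t = t^2 / d
    return t**2 / ((1 - t)**2 + t**2)

def sample(n, N=64, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    dt = 1.0 / N
    t = dt
    x = np.zeros(n)                  # point-mass initial condition X_0 = 0
    for _ in range(N - 1):
        d = (1 - t)**2 + t**2
        b = (2 / d) * ((1 - 2 * t) * x + t * v_lin(t, (d / t) * x))
        dW = rng.normal(0.0, np.sqrt(beta_pm(t + dt) - beta_pm(t)), size=n)
        x = x + dt * b + dW
        t += dt
    return x

x1 = sample(200_000)
assert abs(x1.var() - 1.0) < 0.02    # empirical variance near the target's
```

In practice $\bar v$ would be a neural network; the loop structure, the $(d/t)\,x$ rescaling of its input, and the $\beta$-increment noise are exactly as in the pseudocode above.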

This construction demonstrates that point-mass schedule sampling can leverage existing pretrained models and simple update rules for efficient sampling.

6. Theoretical Properties and Numerical Considerations

Point-mass schedules under statistically optimal SDE sampling exhibit several key properties:

  • The initial state is a point mass, and dynamics have zero drift under the lazy SDE schedule.
  • For deterministic (ODE) lazy sampling, the variance-preserving condition $\alpha^2 + \beta^2 = 1$ recovers the standard diffusion schedule.
  • The path-space KL divergence is invariant to the schedule (Proposition 4.9), so numerical stability, rather than statistical optimality, drives optimal schedule selection.
  • Bounded time derivatives $\dot\alpha, \dot\beta$ improve numerical integration, especially near $t = 0, 1$, reducing required step counts and stabilization overhead.

A systematic implication is that, for Gaussian data and SDE lazy schedules, optimal sampling starts from a point mass and proceeds with minimal stochastic "stiffness," contributing to practical acceleration.

7. Empirical Performance and Applications

Experiments using a 1.3B-parameter PRX latent flow model for text-to-image generation ((Damsholt et al., 3 Feb 2026), Section 7) demonstrate that point-mass schedules enable meaningful reductions in integration steps:

  • With ODE sampling, converting to the lazy-ODE schedule yields up to 3 fewer solver steps (out of 64–172 total).
  • With statistically optimal SDE sampling, the reduction is more substantial: 128 lazy-SDE steps achieve parity with approximately 171 linear-SDE steps (roughly 25% fewer steps).
  • The predictor–corrector integrator is robust across a variety of prompts, with observed improvements in RMSE and convergence consistency.

This suggests that retrofitting existing generative models to point-mass schedules, especially under the lazy SDE construction, provides a principled avenue for accelerating sample generation with little or no loss in output quality.
