
Variance-Exploding SDEs: Theory & Applications

Updated 4 January 2026
  • Variance-Exploding SDEs are defined as drift-free stochastic processes with a time-dependent diffusion coefficient, forming the basis of generative diffusion models.
  • The ER-SDE framework employs a parameterized noise schedule via φ(σ), allowing interpolation between deterministic ODE solvers and fully stochastic SDE solvers.
  • Efficient VE ER-SDE solvers strike a balance between rapid sampling and high sample diversity, demonstrated by state-of-the-art empirical performance on benchmarks.

Variance-Exploding Stochastic Differential Equations (VE-SDEs) define a class of stochastic processes characterized by increasing variance over time and form a foundational component in generative diffusion models. In VE-SDEs, the forward stochastic differential equation is drift-free with a time-dependent diffusion coefficient, and the backward or reverse-time equation involves a score-based drift. Under the Extended Reverse-Time SDE (ER-SDE) framework, VE-SDEs admit semi-linear solutions with a parameterized noise-schedule, allowing interpolation between fully stochastic and deterministic (ODE) solvers. Approximate closed-form solutions, efficient solvers, and error analyses within this framework provide mathematical and practical insights into the speed, quality, and diversity of diffusion-based sampling algorithms (Cui et al., 2023).

1. Formal Definition and Structure of VE-SDEs

The Variance-Exploding SDE is defined by a forward equation of the form:

$$\mathrm{d}x_t = 0\cdot x_t\,\mathrm{d}t + \sqrt{\frac{\mathrm{d}\,\sigma_t^2}{\mathrm{d}t}}\,\mathrm{d}w_t, \qquad x_0 \sim p_0(x_0)$$

where the drift is $f(t, x) = 0$ and the diffusion coefficient is $g(t) = \sqrt{\tfrac{\mathrm{d}\,\sigma_t^2}{\mathrm{d}t}}$. The process is initialized from the data distribution $p_0(x_0)$, and $w_t$ is a standard Wiener process.
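The variance growth implied by the forward equation can be checked numerically. The sketch below is illustrative only: it assumes a geometric schedule $\sigma_t = \sigma_{\min}(\sigma_{\max}/\sigma_{\min})^t$ on $t \in [0, 1]$ (the text does not fix a schedule), and the names `sigma` and `simulate_ve_forward` are hypothetical. Each step adds Gaussian noise whose variance is the exact integral of $g(t)^2$ over the step, so the accumulated variance should approach $\sigma_1^2 - \sigma_0^2$.

```python
import numpy as np

def sigma(t, sigma_min=0.01, sigma_max=10.0):
    """Geometric noise schedule (an illustrative assumption, not fixed by the text)."""
    return sigma_min * (sigma_max / sigma_min) ** t

def simulate_ve_forward(x0, n_steps=100, seed=0):
    """Simulate dx = sqrt(d sigma^2 / dt) dw on t in [0, 1] with zero drift."""
    rng = np.random.default_rng(seed)
    ts = np.linspace(0.0, 1.0, n_steps + 1)
    x = np.array(x0, dtype=float)
    for t0, t1 in zip(ts[:-1], ts[1:]):
        # Integral of g(t)^2 over [t0, t1] equals sigma(t1)^2 - sigma(t0)^2.
        var_increment = sigma(t1) ** 2 - sigma(t0) ** 2
        x = x + np.sqrt(var_increment) * rng.standard_normal(x.shape)
    return x

x0 = np.zeros(200_000)                      # start every path at the same point
xT = simulate_ve_forward(x0)
expected_var = sigma(1.0) ** 2 - sigma(0.0) ** 2
print(xT.var(), expected_var)               # the two values should be close
```

Because the drift is zero, the mean of the paths stays at the starting point while the variance grows with $\sigma_t^2$, which is the "variance-exploding" behavior.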

For generative modeling tasks, the reverse-time SDE (specialized from Song et al. 2021 to the VE case) is given by:

$$\mathrm{d}x_t = \left[-g(t)^2 \nabla_x \log p_t(x_t)\right] \mathrm{d}t + g(t) \,\mathrm{d}\bar{w}_t$$

with reverse drift $f_{\text{rev}}(t, x) = -\frac{\mathrm{d}\,\sigma_t^2}{\mathrm{d}t} \nabla_x \log p_t(x)$ and $\bar{w}_t$ a standard Wiener process running backward in time.

2. Solution Structure in the Extended Reverse-Time SDE Framework

The ER-SDE framework generalizes both SDE and ODE-based solvers by introducing an independent “reverse” noise scale $h(t)$ and substituting the learned data-prediction network $x_\theta(x_t, t)$ for the score term, using $\nabla_x \log p_t(x_t) \approx -\left(x_t - x_\theta(x_t, t)\right)/\sigma_t^2$.

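Switching to the noise-level parameter $\sigma = \sigma_t$, the ER-SDE reads:

$$\mathrm{d}x_\sigma = \left[\frac{1}{\sigma} + \frac{\xi(\sigma)}{2\sigma^2}\right]\left(x_\sigma - x_\theta(x_\sigma, \sigma)\right)\mathrm{d}\sigma + \sqrt{\xi(\sigma)}\,\mathrm{d}w_\sigma$$

with $h^2(t) = \xi(t)\,\frac{\mathrm{d}\sigma_t}{\mathrm{d}t}$, such that $\mathrm{d}w_\sigma = \sqrt{\frac{\mathrm{d}\sigma_t}{\mathrm{d}t}}\,\mathrm{d}\bar{w}_t$. Define

$$\int \left[\frac{1}{\sigma} + \frac{\xi(\sigma)}{2\sigma^2}\right]\mathrm{d}\sigma = \ln \phi(\sigma)$$

with $\phi$ a differentiable noise scale function.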
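Two worked cases may help make the noise scale function concrete. They assume the definition $\ln\phi(\sigma) = \int \left[1/\sigma + \xi(\sigma)/(2\sigma^2)\right]\mathrm{d}\sigma$ with $h^2(t) = \xi(t)\,\mathrm{d}\sigma_t/\mathrm{d}t$; integration constants are dropped because only ratios of $\phi$ enter the solver.

```latex
% Assumed definition: \ln\phi(\sigma) = \int [1/\sigma + \xi(\sigma)/(2\sigma^2)] d\sigma
\begin{align*}
\xi(\sigma) = 0: &\quad \ln\phi(\sigma) = \int \frac{1}{\sigma}\,\mathrm{d}\sigma = \ln\sigma
  &&\Rightarrow\ \phi(\sigma) = \sigma
  \quad \text{(probability-flow ODE, no injected noise)} \\
\xi(\sigma) = 2\sigma: &\quad \ln\phi(\sigma) = \int \frac{2}{\sigma}\,\mathrm{d}\sigma = 2\ln\sigma
  &&\Rightarrow\ \phi(\sigma) = \sigma^2
  \quad \text{(reverse-time SDE, since } h^2 = 2\sigma\,\tfrac{\mathrm{d}\sigma_t}{\mathrm{d}t} = g^2\text{)}
\end{align*}
```

Choices of $\xi$ between these two cases give intermediate levels of stochasticity.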

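The exact solution for evolving $x$ from $\sigma_s$ to $\sigma_t$, with $\sigma_t < \sigma_s$ (Proposition 1), is:

$$x_t = \frac{\phi(\sigma_t)}{\phi(\sigma_s)}\,x_s + \phi(\sigma_t)\int_{\sigma_t}^{\sigma_s}\frac{\phi^{(1)}(\sigma)}{\phi^2(\sigma)}\,x_\theta(x_\sigma, \sigma)\,\mathrm{d}\sigma + \sqrt{\sigma_t^2 - \sigma_s^2\left[\frac{\phi(\sigma_t)}{\phi(\sigma_s)}\right]^2}\,z_s$$

where $\phi$ is the noise scale function, $\phi^{(1)}$ is its derivative, and $z_s \sim \mathcal{N}(0, I)$. In practice, the integral over the network output is approximated using a Taylor expansion of $x_\theta$.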

3. Efficient VE ER-SDE Solvers: Algorithmic Construction

The first-order VE ER-SDE-Solver has the following update for each step:

$$x_{t_i} = r_i\,x_{t_{i-1}} + (1 - r_i)\,x_\theta(x_{t_{i-1}}, t_{i-1}) + \sqrt{\sigma_{t_i}^2 - r_i^2\,\sigma_{t_{i-1}}^2}\;z$$

where $r_i = \phi(\sigma_{t_i})/\phi(\sigma_{t_{i-1}})$, and $z \sim \mathcal{N}(0, I)$. Each update utilizes a single evaluation of $x_\theta$.
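The weight on the network output in the first-order update can be traced back to the exact solution. As a sketch, assume the exact solution has the form $x_t = \frac{\phi(\sigma_t)}{\phi(\sigma_s)}x_s + \phi(\sigma_t)\int_{\sigma_t}^{\sigma_s}\frac{\phi^{(1)}(\sigma)}{\phi^2(\sigma)}x_\theta\,\mathrm{d}\sigma + \text{noise}$ and freeze $x_\theta$ at its value at $\sigma_s$ (the zeroth-order Taylor term). The remaining integral is elementary:

```latex
\begin{align*}
\phi(\sigma_t)\int_{\sigma_t}^{\sigma_s}\frac{\phi^{(1)}(\sigma)}{\phi^2(\sigma)}\,\mathrm{d}\sigma
  &= \phi(\sigma_t)\left[-\frac{1}{\phi(\sigma)}\right]_{\sigma_t}^{\sigma_s} \\
  &= \phi(\sigma_t)\left(\frac{1}{\phi(\sigma_t)} - \frac{1}{\phi(\sigma_s)}\right)
   = 1 - \frac{\phi(\sigma_t)}{\phi(\sigma_s)}
\end{align*}
```

With $r = \phi(\sigma_t)/\phi(\sigma_s)$, the deterministic part of the step is therefore the convex-style combination $r\,x_s + (1 - r)\,x_\theta(x_s, \sigma_s)$.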

Pseudocode for the algorithmic workflow (as described in Algorithm 1) is:

| Step | Description | Details |
|---|---|---|
| 1 | Initialization | $x_{t_0} \leftarrow x_T$ (initial noise sample) |
| 2 | For $i = 1, \dots, M$ | $r_i \leftarrow \phi(\sigma_{t_i}) / \phi(\sigma_{t_{i-1}})$ |
| | | $\tilde{x}_0 \leftarrow x_\theta(x_{t_{i-1}}, t_{i-1})$ |
| | | $z \sim \mathcal{N}(0, I)$ |
| | | drift $\leftarrow r_i\,x_{t_{i-1}} + (1 - r_i)\,\tilde{x}_0$ |
| | | noise $\leftarrow \sqrt{\sigma_{t_i}^2 - r_i^2\,\sigma_{t_{i-1}}^2}\;z$ |
| | | $x_{t_i} \leftarrow$ drift $+$ noise |
| 3 | Output | $x_{t_M}$ |

The method is tunable via the choice of the noise scale function $\phi$, the number of steps $M$, and the schedule $\{\sigma_{t_i}\}_{i=0}^{M}$. Higher-order versions reuse evaluations and add finite-difference corrections at the cost of additional network calls per step.
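The workflow above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the reference implementation: the update is taken to be $x \leftarrow r\,x + (1 - r)\,x_\theta(x, \sigma) + \sqrt{\sigma_{\text{next}}^2 - r^2\sigma_{\text{prev}}^2}\,z$ with $r = \phi(\sigma_{\text{next}})/\phi(\sigma_{\text{prev}})$; the trained network is replaced by the closed-form posterior mean for one-dimensional Gaussian toy data; and the geometric schedule and the names `gaussian_denoiser` and `ve_er_sde_solver_1` are hypothetical choices.

```python
import numpy as np

MU, S = 1.0, 0.5  # toy data distribution N(MU, S^2); illustrative assumption

def gaussian_denoiser(x, sigma):
    """Exact E[x_0 | x_sigma] for x_sigma = x_0 + sigma * eps with Gaussian data.
    Stands in for a trained data-prediction network x_theta."""
    return (S**2 * x + sigma**2 * MU) / (S**2 + sigma**2)

def ve_er_sde_solver_1(x_T, sigmas, phi, x_theta, rng):
    """First-order update: x <- r*x + (1-r)*x_theta(x, sigma) + noise."""
    x = x_T
    for s_prev, s_next in zip(sigmas[:-1], sigmas[1:]):
        r = phi(s_next) / phi(s_prev)
        x0_hat = x_theta(x, s_prev)              # one denoiser call per step
        drift = r * x + (1.0 - r) * x0_hat
        # Clamp guards against tiny negative values from floating-point rounding.
        noise_var = max(s_next**2 - (r * s_prev) ** 2, 0.0)
        noise = np.sqrt(noise_var) * rng.standard_normal(x.shape)
        x = drift + noise
    return x

sigmas = np.geomspace(80.0, 0.01, 201)           # M = 200 steps, decreasing noise
results = {}
for name, phi in {"ode": lambda s: s, "sde": lambda s: s**2}.items():
    rng = np.random.default_rng(0)
    x_T = sigmas[0] * rng.standard_normal(20_000)  # initial noise sample
    results[name] = ve_er_sde_solver_1(x_T, sigmas, phi, gaussian_denoiser, rng)
    print(name, results[name].mean(), results[name].std())
```

In this toy setting both choices of `phi` should return samples whose mean and standard deviation are close to `MU` and `S`, up to discretization and sampling error; only the `"sde"` run injects fresh noise at every step.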

4. Local Discretization Error and Theoretical Analysis

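Discretization error is governed by the First-Order Euler Integral (FEI) coefficient:

$$\mathrm{FEI} = \phi(\sigma_t)\int_{\sigma_t}^{\sigma_s}\frac{1}{\phi(\sigma)}\,\mathrm{d}\sigma$$

which quantifies the dominant local one-step error:

$$\left[(\sigma_t - \sigma_s) + \mathrm{FEI}\right]x_\theta^{(1)}(x_{\sigma_s}, \sigma_s)$$

Here $\phi$ is the noise scale function, defined by $\ln\phi(\sigma) = \int\left[1/\sigma + \xi(\sigma)/(2\sigma^2)\right]\mathrm{d}\sigma$, and $x_\theta^{(1)}$ is the derivative of the data-prediction network with respect to $\sigma$. Because $\xi \geq 0$ implies $\phi(\sigma_t)/\phi(\sigma) \leq \sigma_t/\sigma$ on $[\sigma_t, \sigma_s]$, the FEI never exceeds $\sigma_t\ln(\sigma_s/\sigma_t) < \sigma_s - \sigma_t$, so the error coefficient is smallest in magnitude when the FEI is largest. That maximum is attained by $\xi(\sigma) = 0$ (equivalently $\phi(\sigma) = \sigma$), corresponding to the deterministic probability-flow ODE, which therefore has the lowest discretization error among the ER-SDE family. Any larger $\xi$ lowers the FEI and increases the local error, and with it the global error at equal step size.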
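A quick numerical comparison can make the error ranking tangible. The sketch below assumes the FEI takes the form $\phi(\sigma_t)\int_{\sigma_t}^{\sigma_s}\mathrm{d}\sigma/\phi(\sigma)$ and that the leading one-step error coefficient is $(\sigma_t - \sigma_s) + \mathrm{FEI}$; both are stated here as assumptions, and the helper `fei` is hypothetical. It evaluates the integral with a trapezoid rule for $\phi(\sigma) = \sigma$ (ODE) and $\phi(\sigma) = \sigma^2$ (reverse-time SDE) on a single step from $\sigma_s = 1$ to $\sigma_t = 0.5$.

```python
import numpy as np

def fei(phi, sigma_t, sigma_s, n=20001):
    """phi(sigma_t) * integral of 1/phi over [sigma_t, sigma_s], trapezoid rule."""
    grid = np.linspace(sigma_t, sigma_s, n)
    y = 1.0 / phi(grid)
    integral = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(grid))
    return phi(sigma_t) * integral

sigma_s, sigma_t = 1.0, 0.5
fei_ode = fei(lambda s: s, sigma_t, sigma_s)      # closed form: sigma_t * ln(sigma_s / sigma_t)
fei_sde = fei(lambda s: s**2, sigma_t, sigma_s)   # closed form: sigma_t - sigma_t**2 / sigma_s

# Magnitude of the assumed leading error coefficient (sigma_t - sigma_s) + FEI.
err_ode = abs(sigma_t - sigma_s + fei_ode)
err_sde = abs(sigma_t - sigma_s + fei_sde)
print(fei_ode, fei_sde, err_ode, err_sde)
```

Under these assumptions the ODE choice yields the smaller error coefficient for this step, in line with the claim that the deterministic sampler has the lowest discretization error.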

A change of variable demonstrates that VE and VP ER-SDEs share the same FEI coefficient, establishing parity between these formulations for a given pretrained model and fixed number of function evaluations (NFE).

5. Stochasticity, Sample Quality, and Diversity

ODE-based samplers with $\xi(\sigma) = 0$ have minimal local error but no injected noise, leading to less sample diversity. Choosing $\xi(\sigma)$ close to $0$ but sufficiently large to inject controlled noise allows ER-SDE-based VE solvers to achieve near-ODE fidelity without sacrificing diversity. This stochasticity-efficiency tradeoff is central: VE-SDE solvers interpolate between pure SDE and ODE processes, balancing rapid low-NFE sampling with high sample quality and output variability.
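To illustrate how the amount of injected noise can be dialed between the two extremes, the sketch below uses a hypothetical one-parameter family $\phi(\sigma) = \sigma^{1+\lambda}$ with $\lambda \in [0, 1]$, which corresponds to $\xi(\sigma) = 2\lambda\sigma$ under the definition $\ln\phi(\sigma) = \int\left[1/\sigma + \xi(\sigma)/(2\sigma^2)\right]\mathrm{d}\sigma$. This family is chosen only for illustration and is not claimed to be one of the schedules studied by Cui et al. The per-step noise standard deviation is assumed to be $\sqrt{\sigma_t^2 - r^2\sigma_s^2}$ with $r = \phi(\sigma_t)/\phi(\sigma_s)$.

```python
import numpy as np

def step_noise_std(lam, sigma_s, sigma_t):
    """Injected noise std for one step with phi(sigma) = sigma**(1 + lam).
    lam = 0 gives the ODE (no noise); lam = 1 gives phi = sigma**2 (SDE)."""
    r = (sigma_t / sigma_s) ** (1.0 + lam)
    return float(np.sqrt(max(sigma_t**2 - (r * sigma_s) ** 2, 0.0)))

lams = [0.0, 0.1, 0.5, 1.0]
stds = [step_noise_std(lam, sigma_s=1.0, sigma_t=0.5) for lam in lams]
for lam, std in zip(lams, stds):
    print(f"lambda={lam:.1f}  noise std={std:.4f}")
```

The noise level grows monotonically with $\lambda$, so a small positive $\lambda$ stays close to the ODE step while still injecting some randomness.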

6. Practical Considerations and Empirical Findings

VE ER-SDE-Solvers are parameterized by the noise-scale function $\phi(\sigma)$, number of steps $M$, and schedule mapping $t \mapsto \sigma_t$. First-order solvers require just one network evaluation per step. Advanced higher-order variants increase per-step cost for potentially improved empirical accuracy.

Empirical evaluation on the ImageNet $64\times 64$ benchmark demonstrates that ER-SDE-Solvers attain state-of-the-art performance among stochastic samplers while maintaining the efficiency typical of deterministic samplers (e.g., $3.45$ FID in $20$ function evaluations) (Cui et al., 2023). This suggests that appropriate tuning of $\phi(\sigma)$ enables simultaneous optimization of sample quality and computational efficiency.

7. Significance and Theoretical Summary

The ER-SDE framework unifies ODE and SDE sampling methodologies for VE-SDEs, providing a family of semi-linear solutions whose error and stochasticity are parametrically controlled by $\phi(\sigma)$. The key result asserts that, among all extended reverse-time SDEs with a given score model, the ODE ($\xi(\sigma) = 0$) uniquely minimizes the local discretization error. A plausible implication is that a careful choice of $\phi$ allows constructing samplers that closely approach ODE performance while preserving the stochastic effects essential for output variability and model robustness (Cui et al., 2023).
