Variance Exploding Diffusion Samplers

Updated 4 July 2026

Variance-exploding (VE) diffusion samplers are reverse-time generative procedures for Gaussian forward processes that leave the signal unscaled while the noise variance grows with time.
They can be run as stochastic reverse-time SDEs or as deterministic probability-flow ODEs, and their performance depends on score estimation, covariance adjustments, and schedule design.
Recent advancements focus on covariance-aware sampling, learned initialization, and discrete-time reformulations that optimize performance in compute-constrained, few-step regimes.

Variance-exploding diffusion samplers are reverse-time generative procedures associated with Gaussian forward processes whose variance increases with time. In a standard VE formulation, the forward dynamics may be written as $d\mathbf{x}_t=\sqrt{2t}\,d\mathbf{w}_t$, with discrete transitions $p(\mathbf{x}_n\mid \mathbf{x}_{n-1})=\mathcal{N}(\mathbf{x}_n;\mathbf{x}_{n-1},(t_n^2-t_{n-1}^2)\mathbf{I})$; the corresponding reverse process can be sampled either as a stochastic reverse-time SDE or via the deterministic probability-flow ODE (Zhang et al., 27 May 2025). Recent work treats VE sampling as more than a fixed numerical integrator: stochasticity control, covariance structure, schedule design, initialization, and discrete-time policy learning all materially affect few-step and compute-constrained performance (Sheng et al., 12 Oct 2025, Kahouli et al., 12 Feb 2025, Fassina et al., 28 Feb 2026).

1. Formalism of VE sampling

The defining feature of the VE family is that the forward noising process leaves the signal mean unchanged while increasing the noise scale. In one canonical specification, the forward SDE is

$d\mathbf{x}_t=\sqrt{2t}\,d\mathbf{w}_t,$

so the forward transition kernel on a discrete grid is

$p(\mathbf{x}_n\mid \mathbf{x}_{n-1})=\mathcal{N}\!\bigl(\mathbf{x}_n;\mathbf{x}_{n-1},(t_n^2-t_{n-1}^2)\mathbf{I}\bigr).$

Its reverse-time SDE is

$d\mathbf{x}_t=-2t\,\nabla_{\mathbf{x}}\log p_t(\mathbf{x}_t)\,dt+\sqrt{2t}\,d\bar{\mathbf{w}}_t,$

and the associated probability-flow ODE is

$d\mathbf{x}_t=-t\,\nabla_{\mathbf{x}}\log p_t(\mathbf{x}_t)\,dt.$

These equations are the basic stochastic and deterministic VE samplers, respectively (Zhang et al., 27 May 2025).
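As a minimal sketch of the forward kernel and the deterministic sampler above, the following toy example assumes one-dimensional data $x_0\sim\mathcal{N}(0,S_0^2)$, for which the score of $p_t=\mathcal{N}(0,S_0^2+t^2)$ is known in closed form. The names `score`, `geometric_grid`, `forward_noise`, `pf_ode_euler`, and `pf_ode_exact` are illustrative and do not come from the cited papers; a learned score network would replace `score` in practice.

```python
import math
import random

S0 = 1.0  # assumed std of the toy data distribution N(0, S0^2)

def score(x, t):
    """Exact score of p_t = N(0, S0^2 + t^2) for the toy Gaussian data."""
    return -x / (S0**2 + t**2)

def geometric_grid(t_start, t_end, n_steps):
    """n_steps + 1 time points spaced geometrically from t_start to t_end."""
    ratio = (t_end / t_start) ** (1.0 / n_steps)
    return [t_start * ratio**k for k in range(n_steps + 1)]

def forward_noise(x0, grid, rng):
    """Discrete VE transitions x_n ~ N(x_{n-1}, t_n^2 - t_{n-1}^2) on an increasing grid."""
    x = x0
    for t_prev, t_next in zip(grid[:-1], grid[1:]):
        x += math.sqrt(t_next**2 - t_prev**2) * rng.gauss(0.0, 1.0)
    return x

def pf_ode_euler(x, grid):
    """Euler steps for dx = -t * score(x, t) dt on a decreasing time grid."""
    for t, t_next in zip(grid[:-1], grid[1:]):
        x += (t_next - t) * (-t * score(x, t))
    return x

def pf_ode_exact(x_start, t_start, t_end):
    """Closed-form PF-ODE solution, valid only for this toy Gaussian case."""
    return x_start * math.sqrt((S0**2 + t_end**2) / (S0**2 + t_start**2))

if __name__ == "__main__":
    grid = geometric_grid(10.0, 0.01, 1000)
    print(pf_ode_euler(5.0, grid), pf_ode_exact(5.0, 10.0, 0.01))
```

The forward transitions compose so that the accumulated noise variance between the first and last grid points is the difference of their squared times, which is why only the endpoints matter for the marginal.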

A broader sampler family is obtained by introducing an explicit stochasticity parameter. In the generalized reverse process

$dY_t=\Big(-f(T-t,Y_t)+\tfrac{1+\eta^2}{2}g^2(T-t)s_\theta(T-t,Y_t)\Big)\,dt+\eta\,g(T-t)\,dB_t,$

$\eta=1$ recovers the standard reverse SDE, $\eta=0$ yields the probability-flow ODE, and $\eta>1$ produces trajectories with even higher stochasticity while preserving marginals in the continuous-time exact-score limit. The corresponding gDDIM discretization gives a discrete sampler family interpolating continuously between ODE and SDE sampling (Sheng et al., 12 Oct 2025).
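The role of $\eta$ can be illustrated with a small Euler-Maruyama sketch, specialized (as an assumption) to the VE case $f=0$, $g^2(t)=2t$ and to toy Gaussian data with an exact score. The function names `eta_sampler` and `final_variance` are hypothetical. For each $\eta$ the final sample variance should stay close to the data variance, up to discretization and Monte Carlo error, which reflects the marginal-preservation property stated above.

```python
import math
import random

S0 = 1.0  # assumed std of the toy data distribution N(0, S0^2)

def score(x, t):
    """Exact score of p_t = N(0, S0^2 + t^2); stands in for s_theta."""
    return -x / (S0**2 + t**2)

def eta_sampler(x, grid, eta, rng):
    """Euler-Maruyama for the eta-family with f = 0 and g(t)^2 = 2t.

    `grid` holds decreasing forward times; h is the positive reverse-time step.
    """
    for t, t_next in zip(grid[:-1], grid[1:]):
        h = t - t_next
        drift = 0.5 * (1.0 + eta**2) * (2.0 * t) * score(x, t)
        x = x + h * drift + eta * math.sqrt(2.0 * t * h) * rng.gauss(0.0, 1.0)
    return x

def final_variance(eta, n_samples=1500, n_steps=100, t_max=10.0, t_min=0.05, seed=0):
    """Sample variance at t_min when starting from the exact marginal at t_max."""
    rng = random.Random(seed)
    ratio = (t_min / t_max) ** (1.0 / n_steps)
    grid = [t_max * ratio**k for k in range(n_steps + 1)]
    start_std = math.sqrt(S0**2 + t_max**2)
    finals = [eta_sampler(rng.gauss(0.0, start_std), grid, eta, rng)
              for _ in range(n_samples)]
    return sum(v * v for v in finals) / n_samples

if __name__ == "__main__":
    for eta in (0.0, 1.0, 2.0):
        print(eta, final_variance(eta))
```

Setting `eta=0.0` removes the noise term entirely, so two runs from the same starting point agree regardless of the random stream.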

VE formulations are often contrasted with VP processes, but several recent works emphasize that the distinction is partly parameterization-dependent. Under the total-variance/signal-to-noise-ratio decomposition, a Gaussian perturbation kernel $p(\mathbf{x}_t\mid\mathbf{x}_0)=\mathcal{N}(\mathbf{x}_t;a(t)\,\mathbf{x}_0,b(t)^2\mathbf{I})$ can be written in terms of its total variance and signal-to-noise ratio,

$\tau^2(t)=a(t)^2+b(t)^2,\qquad \gamma(t)=\frac{a(t)^2}{b(t)^2}.$

Classical VE corresponds to a schedule with fixed $a(t)=1$, decreasing $\gamma(t)$, and typically strongly increasing $\tau(t)$; classical VP corresponds to $\tau(t)=1$ with a different $\gamma(t)$ schedule (Kahouli et al., 12 Feb 2025). This reframing has made schedule design central to modern VE sampler analysis.

2. Geometric and score-theoretic structure

For the EDM-style VE SDE

$d\mathbf{x}_t=\sqrt{2t}\,d\mathbf{w}_t,$

the probability-flow ODE can be written directly in terms of the denoiser $r_\theta(\mathbf{x}_t;t)\approx\mathbb{E}[\mathbf{x}_0\mid\mathbf{x}_t]$ as

$\frac{d\mathbf{x}_t}{dt}=\frac{\mathbf{x}_t-r_\theta(\mathbf{x}_t;t)}{t}.$

A geometric analysis shows that the ODE sampling trajectory is quasi-linear in data space: the path from the initial noise sample to the final generated sample deviates only weakly from the chord connecting the endpoints. The same analysis defines the denoising trajectory $\{r_\theta(\mathbf{x}_t;t)\}$, which converges perceptually faster than the sampling trajectory and whose derivative satisfies

$\frac{d\,r_\theta(\mathbf{x}_t;t)}{dt}=-t\,\frac{d^2\mathbf{x}_t}{dt^2}.$

This establishes that the denoising trajectory governs the curvature of the sampling trajectory, and finite-difference approximations to this derivative recover practical second-order samplers such as Heun, DPM-Solver-2, S-PNDM, and DEIS (Chen et al., 2023).
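The link between the denoiser and trajectory curvature can be checked numerically on a toy problem. The sketch below assumes the probability-flow ODE is written in denoiser form, $d\mathbf{x}_t/dt=(\mathbf{x}_t-r(\mathbf{x}_t;t))/t$, which follows from Tweedie's formula under the $\sqrt{2t}$ diffusion coefficient, and it assumes one-dimensional Gaussian data so that the denoiser and the exact path are available in closed form. All function names are illustrative. It compares a first-order Euler step with a Heun step and evaluates the gap in the identity $dr/dt=-t\,d^2x/dt^2$ by finite differences.

```python
import math

S0 = 1.0  # assumed std of the toy data distribution N(0, S0^2)

def denoiser(x, t):
    """Exact E[x_0 | x_t = x] when x_t = x_0 + t * eps and x_0 ~ N(0, S0^2)."""
    return x * S0**2 / (S0**2 + t**2)

def velocity(x, t):
    """PF-ODE right-hand side in denoiser form: (x - r(x, t)) / t."""
    return (x - denoiser(x, t)) / t

def exact_path(x_start, t_start, t):
    """Closed-form ODE trajectory for the toy Gaussian case."""
    return x_start * math.sqrt((S0**2 + t**2) / (S0**2 + t_start**2))

def euler(x, grid):
    """First-order sampler on a decreasing time grid."""
    for t, t_next in zip(grid[:-1], grid[1:]):
        x = x + (t_next - t) * velocity(x, t)
    return x

def heun(x, grid):
    """Second-order sampler: average the slope at both ends of each step."""
    for t, t_next in zip(grid[:-1], grid[1:]):
        d1 = velocity(x, t)
        x_pred = x + (t_next - t) * d1
        d2 = velocity(x_pred, t_next)
        x = x + (t_next - t) * 0.5 * (d1 + d2)
    return x

def curvature_identity_gap(t, x_start=5.0, t_start=10.0, h=1e-3):
    """|dr/dt + t * d^2x/dt^2| along the exact path, via central differences."""
    x = lambda s: exact_path(x_start, t_start, s)
    r = lambda s: denoiser(x(s), s)
    dr_dt = (r(t + h) - r(t - h)) / (2.0 * h)
    d2x_dt2 = (x(t + h) - 2.0 * x(t) + x(t - h)) / h**2
    return abs(dr_dt + t * d2x_dt2)

if __name__ == "__main__":
    grid = [10.0 * (0.01 ** (k / 10.0)) for k in range(11)]
    truth = exact_path(5.0, 10.0, grid[-1])
    print(abs(euler(5.0, grid) - truth), abs(heun(5.0, grid) - truth))
    print(curvature_identity_gap(1.0))
```

On this example the Heun update needs two denoiser evaluations per step but tracks the curved middle section of the path more closely than Euler on the same ten-step grid.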

The same geometric study also connects optimal ODE sampling to Gaussian mean-shift. Under the empirical Bayes-optimal denoiser, one Euler step of the VE probability-flow ODE becomes a convex combination of the current point and the annealed mean-shift mean. In that limit, the sampler is mode-seeking; empirically observed deviations of learned scores from the optimal score are interpreted as the mechanism that prevents trivial nearest-neighbor replay and preserves generative novelty (Chen et al., 2023).
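The convex-combination reading of an Euler step can be made concrete with a short sketch. It assumes a tiny one-dimensional dataset, the denoiser-form ODE $dx/dt=(x-r)/t$, and the empirical Bayes-optimal denoiser written as a softmax-weighted mean of the data points; the names `optimal_denoiser`, `euler_step`, and `run_sampler` are hypothetical.

```python
import math

def optimal_denoiser(x, t, data):
    """Empirical Bayes-optimal denoiser: Gaussian-kernel weighted mean of the data."""
    logits = [-(x - y) ** 2 / (2.0 * t * t) for y in data]
    m = max(logits)  # subtract the max for numerical stability
    w = [math.exp(l - m) for l in logits]
    return sum(wi * y for wi, y in zip(w, data)) / sum(w)

def euler_step(x, t, t_next, data):
    """One Euler step of dx/dt = (x - r(x, t)) / t from t down to t_next."""
    r = optimal_denoiser(x, t, data)
    return x + (t_next - t) * (x - r) / t

def run_sampler(x, t_max, t_min, n_steps, data):
    """Iterate Euler steps on a geometric grid from t_max down to t_min."""
    ratio = (t_min / t_max) ** (1.0 / n_steps)
    t = t_max
    for _ in range(n_steps):
        t_next = t * ratio
        x = euler_step(x, t, t_next, data)
        t = t_next
    return x

if __name__ == "__main__":
    data = [-2.0, 0.5, 3.0]
    x, t, t_next = 0.8, 1.0, 0.7
    lam = t_next / t
    r = optimal_denoiser(x, t, data)
    # The Euler update equals lam * x + (1 - lam) * r, a convex combination.
    print(euler_step(x, t, t_next, data), lam * x + (1.0 - lam) * r)
    print(run_sampler(0.8, 1.0, 0.01, 50, data))
```

Because the weight on the mean-shift target is $1-t_{n-1}/t_n$, the iterate drifts toward a data point as $t\to 0$, which is the mode-seeking behavior described above for the optimal-score limit.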

A complementary score-theoretic development is the Conditional Score Expectation identity. For affine diffusions, including VE SDEs, the marginal score at time $t$ can be written as a conditional expectation of the initial score field under the forward posterior. In the VE case,

$\nabla_{\mathbf{x}_t}\log p_t(\mathbf{x}_t)=\mathbb{E}\bigl[\nabla_{\mathbf{x}_0}\log p_0(\mathbf{x}_0)\,\big|\,\mathbf{x}_t\bigr].$

This yields a nonparametric score estimator based on self-normalized importance sampling over reference samples $\{\mathbf{x}_0^{(i)}\}$, using weights proportional to the forward Gaussian kernel $\mathcal{N}(\mathbf{x}_t;\mathbf{x}_0^{(i)},t^2\mathbf{I})$. The resulting CSE estimator is most stable at small time, where standard Tweedie-type estimators become ill-conditioned (Duston et al., 4 Jan 2026).
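A minimal sketch of such a self-normalized estimator is given below, under the assumptions that the data density is a one-dimensional Gaussian with known initial score, that reference samples are drawn from it directly, and that the forward kernel at time $t$ has variance $t^2$. The function names are illustrative and not taken from the cited paper. A Tweedie-type estimator built from the same weights is included for comparison; its $1/t^2$ factor amplifies Monte Carlo error as $t$ shrinks.

```python
import math
import random

S0 = 1.0  # assumed std of the toy data density p_0 = N(0, S0^2)

def initial_score(y):
    """Score of the assumed data density p_0 evaluated at a reference sample."""
    return -y / S0**2

def snis_weights(x, t, refs):
    """Self-normalized weights proportional to the kernel N(x; y_i, t^2)."""
    logits = [-(x - y) ** 2 / (2.0 * t * t) for y in refs]
    m = max(logits)
    w = [math.exp(l - m) for l in logits]
    total = sum(w)
    return [wi / total for wi in w]

def cse_score(x, t, refs):
    """Weighted average of initial scores under the forward posterior."""
    w = snis_weights(x, t, refs)
    return sum(wi * initial_score(y) for wi, y in zip(w, refs))

def tweedie_score(x, t, refs):
    """(E[x_0 | x_t] - x_t) / t^2 using the same posterior weights."""
    w = snis_weights(x, t, refs)
    posterior_mean = sum(wi * y for wi, y in zip(w, refs))
    return (posterior_mean - x) / (t * t)

def true_score(x, t):
    """Closed-form score of p_t = N(0, S0^2 + t^2) for reference."""
    return -x / (S0**2 + t**2)

if __name__ == "__main__":
    rng = random.Random(0)
    refs = [rng.gauss(0.0, S0) for _ in range(20000)]
    x, t = 0.3, 0.5
    print(cse_score(x, t, refs), tweedie_score(x, t, refs), true_score(x, t))
```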

3. Variance, covariance, and schedule design

VE samplers have traditionally injected isotropic noise according to the forward schedule, but recent work argues that this neglects the structure of the reverse conditional. A covariance-aware sampler shows that, for Gaussian forward kernels, Tweedie-style identities imply that the reverse-process covariance is encoded in the Jacobian of the predicted reverse mean. Although the method is written in DDPM/DDIM VP language and evaluated on pixel-space VP models, the paper states that its derivations require only a Gaussian forward kernel and therefore apply equally to VE schedules. The practical proposal is to estimate structured reverse covariance with one extra Jacobian–Vector Product per step and inject noise in a Fourier-domain basis; for pixel-space models this consistently outperforms Heun, DPM-Solver++, and aDDIM at equal NFE in the few-step regime (Schioppa et al., 13 May 2026).
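To illustrate how a single Jacobian-vector product can expose covariance structure, the sketch below uses the closely related second-order Tweedie identity $\mathrm{Cov}[\mathbf{x}_0\mid\mathbf{x}_t]=t^2\,\partial_{\mathbf{x}}\mathbb{E}[\mathbf{x}_0\mid\mathbf{x}_t]$ for the kernel $\mathbf{x}_t=\mathbf{x}_0+t\,\boldsymbol{\epsilon}$. This is an assumption-laden toy and not the cited paper's algorithm: the data are a two-dimensional diagonal Gaussian with a closed-form denoiser, the JVP is approximated by central finite differences rather than automatic differentiation, and the Fourier-domain noise injection is not reproduced. All names are hypothetical.

```python
S = [2.0, 0.5]  # assumed per-coordinate data stds for a toy diagonal Gaussian

def denoiser(x, t):
    """Exact E[x_0 | x_t = x] for x_0 ~ N(0, diag(S^2)) and x_t = x_0 + t * eps."""
    return [xi * s * s / (s * s + t * t) for xi, s in zip(x, S)]

def jvp_fd(fn, x, v, eps=1e-4):
    """Central finite-difference Jacobian-vector product J_fn(x) v."""
    xp = [xi + eps * vi for xi, vi in zip(x, v)]
    xm = [xi - eps * vi for xi, vi in zip(x, v)]
    return [(a - b) / (2.0 * eps) for a, b in zip(fn(xp), fn(xm))]

def posterior_cov_times(x, t, v):
    """Cov[x_0 | x_t = x] v, obtained as t^2 times one JVP of the denoiser."""
    return [t * t * j for j in jvp_fd(lambda z: denoiser(z, t), x, v)]

def analytic_posterior_var(t):
    """Closed-form posterior variances, available only for this toy case."""
    return [s * s * t * t / (s * s + t * t) for s in S]

if __name__ == "__main__":
    x, t = [0.7, -0.2], 1.5
    print(posterior_cov_times(x, t, [1.0, 0.0]))
    print(posterior_cov_times(x, t, [0.0, 1.0]))
    print(analytic_posterior_var(t))
```

The two coordinates receive different posterior variances even though the forward noise is isotropic, which is the kind of structure an isotropic reverse-noise schedule ignores.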

Variance tuning in the strict VE setting is developed explicitly in Variance-Tuned Diffusion Importance Sampling (VT-DIS). Starting from a pretrained VE score-based diffusion model whose reverse kernel has isotropic variance, the method replaces the isotropic covariance by a learned covariance, which may be isotropic, diagonal, or low-rank plus isotropic. The tuning criterion is the $\alpha$-divergence with $\alpha=2$ between forward VE trajectories and reverse denoising trajectories; minimizing this objective directly minimizes importance-weight variance and maximizes effective sample size. The abstract reports effective sample sizes of approximately 80%, 35%, and 3.5% on DW-4, LJ-13, and alanine-dipeptide, respectively, using only a fraction of the computational budget required by vanilla diffusion + IS or PF-ODE-based IS (Zhang et al., 27 May 2025).
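The connection between importance-weight variance and effective sample size can be sketched in a few lines. The code below assumes the standard Kish definition $\mathrm{ESS}=(\sum_i w_i)^2/\sum_i w_i^2$ and works from log-weights for numerical stability; the function names are illustrative and the weights in the usage example are synthetic, not results from the cited work.

```python
import math

def ess_from_log_weights(log_w):
    """Kish effective sample size (sum w)^2 / sum w^2, computed stably."""
    m = max(log_w)
    w = [math.exp(l - m) for l in log_w]
    return sum(w) ** 2 / sum(wi * wi for wi in w)

def second_moment_ratio(log_w):
    """mean(w^2) / mean(w)^2, which equals 1 + (coefficient of variation)^2."""
    m = max(log_w)
    w = [math.exp(l - m) for l in log_w]
    n = len(w)
    return (sum(wi * wi for wi in w) / n) / (sum(w) / n) ** 2

if __name__ == "__main__":
    uniform = [0.0] * 100           # equal weights: ESS equals the sample count
    skewed = [0.0] + [-50.0] * 99   # one dominant weight: ESS close to 1
    print(ess_from_log_weights(uniform), ess_from_log_weights(skewed))
    print(second_moment_ratio(uniform), second_moment_ratio(skewed))
```

Since the ESS equals the sample count divided by the second-moment ratio, any objective that lowers the second moment of the normalized weights raises the ESS by the same factor.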

A different line of work argues that the main pathology of many VE schedules is not the SNR profile but the explosion of total variance. Under the TV/SNR decomposition, schedules such as SMLD and EDM-UT can be converted into constant-TV variants that preserve the original SNR. The paper reports that schedules where the TV explodes exponentially can often be improved by adopting a constant TV schedule while preserving the same SNR schedule, and further proposes the VP-ISSNR family, which combines constant total variance with an inverse-sigmoid-style SNR curve related to optimal transport flow matching. The reported findings hold across various reverse diffusion solvers and across molecular structure and image generation (Kahouli et al., 12 Feb 2025).
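The conversion to a constant-TV variant can be sketched as a simple rescaling. The code below assumes a perturbation of the form $\mathbf{x}_t=a(t)\,\mathbf{x}_0+b(t)\,\boldsymbol{\epsilon}$ with total variance $a^2+b^2$ and signal-to-noise ratio $a^2/b^2$; these symbols and the helper names are assumptions made for illustration. Dividing both coefficients by $\sqrt{a^2+b^2}$ fixes the total variance at one and leaves the SNR unchanged.

```python
import math

def tv_snr(a, b):
    """Total variance a^2 + b^2 and signal-to-noise ratio a^2 / b^2."""
    return a * a + b * b, (a * a) / (b * b)

def ve_schedule(t):
    """Classical VE coefficients under sigma(t) = t: unit signal scale, noise scale t."""
    return 1.0, t

def to_constant_tv(a, b):
    """Rescale (a, b) so that a^2 + b^2 = 1 while keeping a^2 / b^2 fixed."""
    tau = math.sqrt(a * a + b * b)
    return a / tau, b / tau

if __name__ == "__main__":
    for t in (0.1, 1.0, 10.0, 80.0):
        a, b = ve_schedule(t)
        print(t, tv_snr(a, b), tv_snr(*to_constant_tv(a, b)))
```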

4. Fast samplers, discrete-time reformulations, and learned policies

Fast VE sampling is increasingly framed as a discrete-time design problem rather than a faithful approximation of a fixed continuous-time SDE. The DDSS framework optimizes sampler parameters directly by differentiating through sample-quality scores. Its Generalized Gaussian Diffusion Models define flexible non-Markovian Gaussian samplers whose means depend linearly on multiple states and whose variances are learnable. Although developed for DDPM-style models rather than explicit VE SDEs, the paper argues that the same machinery applies to any Gaussian diffusion sampler whose variance grows along the trajectory. On LSUN church $128\times128$, DDSS reports FID 11.6 with only 10 inference steps and 4.82 with 20 steps, compared to 51.1 and 14.9 with the strongest DDPM/DDIM baselines (Watson et al., 2022).

A more radical discrete-time view appears in adaptive destruction-process methods. Here, diffusion samplers are treated as finite-horizon Markov decision processes, and the generation and destruction kernels are learned as unconstrained Gaussian densities with decoupled variances. The paper explicitly argues that continuous-time VE/VP constraints, especially the tying of diffusion coefficients, are counterproductive when the number of steps is small. In the few-step regime, jointly learning both generation and destruction processes yields faster convergence and improved sampling quality, and a robust ablation study identifies design choices needed for stable training (Gritsaev et al., 2 Jun 2025).

Initialization has also become a first-class design variable. A KL decomposition for VE samplers shows that the final error contains an additive initialization term, making the backward prior a direct contributor to sample quality. This motivates learning a flow-based approximation to the forward marginal at a moderate noise level and then running a short-horizon reverse VE sampler from that learned prior. On FFHQ-64, the paper reports FID for a baseline EDM-style configuration with Gaussian initialization and for a short-horizon configuration initialized from the empirical forward marginal at a moderate noise level (Fassina et al., 28 Feb 2026).
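A worked toy case shows why the starting prior matters. The sketch below assumes one-dimensional Gaussian data, so the forward marginal at the starting time is $\mathcal{N}(0,S_0^2+t^2)$, and measures its KL divergence from the conventional prior $\mathcal{N}(0,t^2)$. This is only an illustration of an initialization mismatch; it is not the decomposition from the cited paper, and the function names are hypothetical.

```python
import math

def kl_gauss_1d(var_p, var_q):
    """KL(N(0, var_p) || N(0, var_q)) for zero-mean one-dimensional Gaussians."""
    ratio = var_p / var_q
    return 0.5 * (ratio - 1.0 - math.log(ratio))

def init_mismatch(s0, t_start):
    """KL between the forward marginal N(0, s0^2 + t^2) and the prior N(0, t^2)."""
    return kl_gauss_1d(s0**2 + t_start**2, t_start**2)

if __name__ == "__main__":
    for t_start in (1.0, 5.0, 80.0):
        print(t_start, init_mismatch(1.0, t_start))
    # A prior that matches the marginal exactly removes the mismatch at any t_start.
    print(kl_gauss_1d(1.0 + 1.0**2, 1.0 + 1.0**2))
```

With the plain Gaussian prior the mismatch is only small at large starting noise, whereas a prior that matches the forward marginal removes it even at a moderate starting level, which is what permits a shorter reverse horizon.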

Loss functions become contentious once one leaves the fixed-forward regime. For diffusion bridges, including VE-like systems with learned diffusion coefficients or learned forward processes, the on-policy log-variance (LV) loss no longer matches reverse-KL gradients. The paper argues that LV then ceases to represent an objective justified by the data processing inequality, whereas reverse KL with the log-derivative trick (rKL-LD) remains well motivated and empirically more stable. It also reports that LV training often diverges in this learned setting, while rKL-LD consistently improves performance (Sanokowski et al., 12 Jun 2025).

5. Representative VE systems and empirical behavior

Several recent systems illustrate how VE sampler design changes across application domains (Wang et al., 2024, Zhang et al., 11 Nov 2025, Zhang et al., 27 May 2025, Sheng et al., 12 Oct 2025).

System	VE-related mechanism	Reported outcome
uDDDM	Unified directly denoising for VE and VP	CIFAR-10 one-step FID 2.63 for VE; 1000-step FID 1.71 for VE
VT-DIS	Post-training reverse covariance tuning in a VE SBDM	ESS approximately 80%, 35%, and 3.5% on DW-4, LJ-13, and alanine-dipeptide
VEDA	VE diffusion with annealing, preconditioning, and arcsin scheduler	100 sampling steps; median relaxation energy 1.72 kcal/mol vs 32.3 kcal/mol for SemlaFlow
RLHF gDDIM	Training with stochastic samplers, inference with ODE samplers	Reward gaps consistently narrow over training

The unified directly denoising framework extends one-step and multistep direct-denoising models to both VP and VE settings. It learns a global PF-ODE flow map rather than integrating the score field explicitly, proves existence and uniqueness of the learned solution paths and a non-intersecting property of sampling paths, and reports one-step CIFAR-10 generation with FID 2.63 for VE and 2.53 for VP, improving to 1.71 and 1.65 at 1000 steps (Wang et al., 2024). This places VE in the same one-step/fixed-point iteration regime as consistency-style samplers.

In molecular generation, VEDA integrates a VE coordinate diffusion with annealing, a preconditioning scheme adapted to SE(3)-equivariant coordinate networks, and an arcsin scheduler that concentrates sampling near critical log-SNR intervals. On QM9 and GEOM-DRUGS, it reports state-of-the-art valency stability and validity with only 100 sampling steps, and on GEOM-DRUGS the median relaxation energy is 1.72 kcal/mol, compared with 32.3 kcal/mol for the architectural baseline SemlaFlow (Zhang et al., 11 Nov 2025). In this case, VE is explicitly interpreted as functionally analogous to simulated annealing.

VE sampler stochasticity also plays a central role in RLHF. Theoretical analysis of Gaussian VE models yields a bound on the reward gap, that is, the discrepancy between training with a stochastic VE SDE sampler and inference with a deterministic ODE sampler. Large-scale text-to-image experiments further report that reward gaps consistently narrow over training and that ODE sampling quality improves when models are updated using higher-stochasticity SDE training (Sheng et al., 12 Oct 2025). This makes VE-style stochasticity a useful exploration device even when final deployment is deterministic.

6. Limitations, misconceptions, and open directions

A recurring misconception is that “variance exploding” should be interpreted as a prescription to let every measure of variance grow. The TV/SNR analysis argues against that reading: VE’s useful ingredient may be its SNR schedule rather than its exploding total variance, and schedules with constant TV while preserving the same SNR can outperform classical VE baselines in low-NFE regimes (Kahouli et al., 12 Feb 2025). This suggests that VE should be understood as a family of Gaussian-forward parameterizations rather than a single optimal schedule template.

Another common assumption is that adding more stochasticity or more reverse noise is uniformly beneficial. Multiple papers qualify this. Covariance-aware reverse noise helps pixel-space few-step samplers, but the same paper reports a negative result for latent diffusion, where adding latent-space noise, even covariance-aware noise, hurts performance and deterministic DDIM is best (Schioppa et al., 13 May 2026). A related latent-diffusion study attributes such fragility to overly compact latent manifolds produced by $\beta$-VAE tokenizers and proposes a Variance Expansion loss to make the latent space robust to sampling perturbations; that work explicitly states that it does not propose a new variance-exploding sampler or schedule at the diffusion level (Li et al., 22 Mar 2026).

Score estimation itself remains a bottleneck for VE-like samplers on unnormalized targets. Diffusion path samplers via sequential Monte Carlo develop control-variate schedules that minimize the variance of score estimates along a diffusion path and provide convergence guarantees. Although this framework is not written as a classical VE score model, it offers a principled alternative to neural score fitting and shows how path design, Monte Carlo variance, and score accuracy can be analyzed jointly (Young et al., 29 Jan 2026). A plausible implication is that future VE samplers may combine learned score fields with pathwise control variates, covariance estimators, and adaptive proposals rather than relying on a single monolithic network.

Across the literature, three open directions recur. The first is reverse-covariance modeling: current VE samplers usually inject isotropic noise, whereas reverse-conditionals are broad and structured in few-step regimes (Schioppa et al., 13 May 2026). The second is initialization learning: replacing the large-noise Gaussian prior by a learned approximation to the forward marginal at a moderate noise level shortens the reverse horizon and alters the trade-off between discretization error and model error (Fassina et al., 28 Feb 2026). The third is objective design in flexible samplers: once forward processes, destruction processes, or diffusion coefficients are learned, reverse-KL-style objectives appear more stable and better motivated than variance-only surrogates (Sanokowski et al., 12 Jun 2025). Taken together, these developments indicate that VE diffusion samplers are evolving from fixed reverse integrators into a broader class of stochastic transport algorithms whose performance depends on the coordinated design of schedules, covariance, initialization, and estimation.