Generative Modeling via Drifting (GMD)

Updated 9 May 2026

The paper introduces a generative modeling framework that regresses a generator’s output toward a fixed point defined by a kernel drift operator.
It builds on the theory of Wasserstein gradient flows to unify score matching with optimal transport strategies, and it admits strong fixed-point identifiability results.
Empirically, one-step generation reaches state-of-the-art results in high-resolution imaging, physics, molecular, and medical applications.

Generative Modeling via Drifting (GMD) is a framework for training generative models that matches distributions by directly regressing a generator’s outputs toward a fixed-point determined by a kernel drift operator. GMD forgoes the need for iterative inference (e.g., MCMC, diffusion), yielding a generator capable of producing approximate samples from the target distribution in a single forward pass. The method is grounded in the theory of Wasserstein gradient flows, can express both score-matching and optimal transport-based strategies, admits strong identifiability results, and has been instantiated in high-resolution image, physics, molecular, and conditional medical image domains.

1. Mathematical Formulation and Drift Operator

At its core, GMD trains a generator $f_\theta: \mathbb{R}^k\to\mathbb{R}^d$ (e.g., a neural network) so that the pushforward $q_\theta = (f_\theta)_\#p_0$ of a latent noise distribution $p_0$ approximates a target distribution $p$, given either as empirical data or as a Boltzmann density $p(x) \propto \exp(-E(x))$ (Deng et al., 4 Feb 2026, Cao et al., 18 Mar 2026).

GMD introduces a vector field (the "drift") $V_{p,q}(x) = V_p^+(x) - V_q^-(x)$, where, for a positive kernel $k(x,y)$,

$V_p^+(x) = \frac{\mathbb{E}_{y\sim p}\left[k(x, y) (y - x)\right]}{\mathbb{E}_{y\sim p}[k(x, y)]}$

and $V_q^-$ is defined analogously with $y\sim q$. This operator can be interpreted as a transport direction: attraction toward the kernel-weighted barycenter of nearby data points combined with repulsion from nearby model samples (Deng et al., 4 Feb 2026, Lai et al., 8 Mar 2026).
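A minimal sketch of the empirical drift follows, assuming the expectations over $p$ and $q$ are replaced by averages over finite sample sets and using the kernel $k(x,y) = \exp(-\|x-y\|/\tau)$ as an illustrative default. The helper names (`laplace_kernel`, `mean_shift`, `drift`) and the bandwidth `tau` are hypothetical and not taken from any cited implementation.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Kernel = Callable[[Sequence[float], Sequence[float]], float]


def laplace_kernel(x: Sequence[float], y: Sequence[float], tau: float = 1.0) -> float:
    """Illustrative positive kernel k(x, y) = exp(-||x - y|| / tau)."""
    return math.exp(-math.dist(x, y) / tau)


def mean_shift(x: Sequence[float], samples: Sequence[Sequence[float]],
               kernel: Kernel) -> Vector:
    """Kernel-weighted barycenter of `samples` minus x: E[k(x,y)(y - x)] / E[k(x,y)]."""
    weights = [kernel(x, y) for y in samples]
    total = sum(weights)
    return [
        sum(w * (y[j] - x[j]) for w, y in zip(weights, samples)) / total
        for j in range(len(x))
    ]


def drift(x: Sequence[float], data: Sequence[Sequence[float]],
          generated: Sequence[Sequence[float]], kernel: Kernel = laplace_kernel) -> Vector:
    """V_{p,q}(x) = V_p^+(x) - V_q^-(x): pull toward data minus pull toward model samples."""
    v_plus = mean_shift(x, data, kernel)
    v_minus = mean_shift(x, generated, kernel)
    return [a - b for a, b in zip(v_plus, v_minus)]


# Usage: data clustered near x = 2.5, model samples centred at the origin.
data = [[2.0, 0.0], [2.5, 0.5], [3.0, -0.5]]
generated = [[0.0, 0.0], [0.5, 0.5], [-0.5, -0.5]]
print(drift([0.0, 0.0], data, generated))  # first component > 0: pulled toward the data
print(drift([0.0, 0.0], data, data))       # [0.0, 0.0]: V vanishes when q and p coincide
```

Because both terms share the same kernel and normalization, the drift cancels exactly when the model samples coincide with the data, which is the fixed point the generator is trained toward.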

The generator is trained to predict a single explicit Euler step along this drift, $x_{i+1} = x_i + V_{p,q}(x_i)$. Training uses a stop-gradient regression loss, $\mathcal{L}(\theta) = \mathbb{E}_{\epsilon\sim p_0}\big\|f_\theta(\epsilon) - \mathrm{sg}\big(f_\theta(\epsilon) + V_{p,q_\theta}(f_\theta(\epsilon))\big)\big\|^2$, where $\mathrm{sg}(\cdot)$ denotes the stop-gradient operator (Deng et al., 4 Feb 2026).
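Because the whole regression target sits inside the stop-gradient, the loss value and its gradient take a simple form. The derivation below assumes $f_\theta$ is differentiable and introduces the Jacobian $J_\theta(\epsilon)$ as notation:

```latex
\begin{align*}
\mathcal{L}(\theta) &= \mathbb{E}_{\epsilon\sim p_0}\big\|f_\theta(\epsilon) - \mathrm{sg}\big(f_\theta(\epsilon) + V_{p,q_\theta}(f_\theta(\epsilon))\big)\big\|^2
  = \mathbb{E}_{\epsilon\sim p_0}\big\|V_{p,q_\theta}(f_\theta(\epsilon))\big\|^2, \\
\nabla_\theta \mathcal{L}(\theta) &= -2\,\mathbb{E}_{\epsilon\sim p_0}\big[J_\theta(\epsilon)^\top V_{p,q_\theta}(f_\theta(\epsilon))\big],
  \qquad J_\theta(\epsilon) = \frac{\partial f_\theta(\epsilon)}{\partial \theta}.
\end{align*}
```

Numerically the loss equals the mean squared drift norm, while its gradient pushes each output along the frozen drift: for a single sample, the first-order change in $f_\theta(\epsilon)$ under a gradient step is proportional to $J_\theta J_\theta^\top V$, whose inner product with $V$ is $\|J_\theta^\top V\|^2 \ge 0$.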

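For the Gaussian kernel $k(x,y) = \exp(-\|x-y\|^2/(2\sigma^2))$, Tweedie’s identity reduces the drift to a difference of smoothed scores, $V_{p,q}(x) = \sigma^2\left(\nabla\log p_\sigma(x) - \nabla\log q_\sigma(x)\right)$, where $p_\sigma = p * \mathcal{N}(0,\sigma^2 I)$ is the Gaussian-smoothed density and $q_\sigma$ is defined likewise (Turan et al., 10 Mar 2026, Lai et al., 8 Mar 2026).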
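The identity can be checked numerically. For $k(x,y) = \exp(-(x-y)^2/(2\sigma^2))$ in 1D, $\nabla_x k(x,y) = k(x,y)(y-x)/\sigma^2$, so the empirical mean shift equals $\sigma^2$ times the derivative of the log of the smoothed empirical density. The sketch below compares the two, approximating the score by a central finite difference; the sample distribution, $\sigma$, and evaluation points are arbitrary illustrative choices.

```python
import math
import random
from typing import List


def gaussian_mean_shift(x: float, samples: List[float], sigma: float) -> float:
    """Empirical V^+(x) with k(x, y) = exp(-(x - y)^2 / (2 sigma^2))."""
    w = [math.exp(-((x - y) ** 2) / (2 * sigma**2)) for y in samples]
    return sum(wi * (y - x) for wi, y in zip(w, samples)) / sum(w)


def log_smoothed_density(x: float, samples: List[float], sigma: float) -> float:
    """log of the Gaussian-smoothed empirical density, up to an additive constant."""
    return math.log(sum(math.exp(-((x - y) ** 2) / (2 * sigma**2)) for y in samples))


def smoothed_score_fd(x: float, samples: List[float], sigma: float,
                      eps: float = 1e-5) -> float:
    """Central finite difference of log p_sigma, i.e. an estimate of grad log p_sigma(x)."""
    return (log_smoothed_density(x + eps, samples, sigma)
            - log_smoothed_density(x - eps, samples, sigma)) / (2 * eps)


random.seed(0)
data = [random.gauss(1.0, 0.7) for _ in range(200)]
sigma = 0.5
for x in (-1.0, 0.3, 2.0):
    lhs = gaussian_mean_shift(x, data, sigma)
    rhs = sigma**2 * smoothed_score_fd(x, data, sigma)
    print(f"x={x:+.1f}  mean shift={lhs:.6f}  sigma^2 * score={rhs:.6f}")
```

The two columns agree up to finite-difference error, since the identity is exact for the smoothed empirical density and not only in expectation.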

In energy-based settings, the target-side smoothed score can be estimated via:

Monte Carlo estimation (importance sampling in a local Gaussian neighborhood of $x$),
Second-order (curvature-corrected) approximation involving the Hessian of the energy $E$ (Cao et al., 18 Mar 2026).
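One way to realize the Monte Carlo option is self-normalized importance sampling over Gaussian perturbations. Writing $p_\sigma$ for $p$ smoothed by $\mathcal{N}(0,\sigma^2 I)$ and using $p(y) \propto e^{-E(y)}$, one has $\nabla\log p_\sigma(x) = \mathbb{E}_\varepsilon[e^{-E(x+\varepsilon)}\varepsilon] / (\sigma^2\,\mathbb{E}_\varepsilon[e^{-E(x+\varepsilon)}])$ with $\varepsilon \sim \mathcal{N}(0,\sigma^2 I)$, so the unknown normalizing constant cancels. The 1D sketch below implements this estimator; it is an assumption about the general approach, not the specific estimator of Cao et al., and `smoothed_score_mc` is a hypothetical name.

```python
import math
import random
from typing import Callable, Optional


def smoothed_score_mc(x: float, energy: Callable[[float], float], sigma: float,
                      n: int = 20000, rng: Optional[random.Random] = None) -> float:
    """Self-normalized IS estimate of grad log p_sigma(x), p(y) proportional to exp(-E(y)).

    Uses grad log p_sigma(x) = E[w(eps) * eps] / (sigma^2 * E[w(eps)]),
    with eps ~ N(0, sigma^2) and unnormalized weights w(eps) = exp(-E(x + eps)).
    """
    if rng is None:
        rng = random.Random(0)
    eps = [rng.gauss(0.0, sigma) for _ in range(n)]
    log_w = [-energy(x + e) for e in eps]
    shift = max(log_w)  # subtract the max log-weight for numerical stability
    w = [math.exp(lw - shift) for lw in log_w]
    return sum(wi * e for wi, e in zip(w, eps)) / (sigma**2 * sum(w))


# Check on E(y) = y^2 / 2, i.e. p = N(0, 1), where p_sigma = N(0, 1 + sigma^2)
# and the exact smoothed score is -x / (1 + sigma^2).
sigma = 0.5
for x in (0.0, 1.5):
    est = smoothed_score_mc(x, lambda y: 0.5 * y * y, sigma)
    print(f"x={x}: MC={est:.4f}  exact={-x / (1 + sigma**2):.4f}")
```

The quadratic energy is chosen only because its smoothed score is known in closed form; for a general $E$ the estimator's variance grows when $e^{-E}$ varies sharply within the perturbation scale $\sigma$.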

2. Theoretical Foundations and Gradient Flow Structure

GMD is mathematically equivalent to seeking a fixed point of the Wasserstein gradient flow (WGF) of an energy functional $\mathcal{F}[q]$: $\partial_t q_t = \nabla\cdot\left(q_t\,\nabla\frac{\delta\mathcal{F}}{\delta q}[q_t]\right)$. Choosing $\mathcal{F}$ as the (possibly smoothed) Kullback–Leibler divergence recovers score-based models; using the Sinkhorn divergence or MMD yields other classes of drift fields (Gretton et al., 6 May 2026, Cao et al., 11 Mar 2026).
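As a worked example of how the choice of functional determines the drift, take $\mathcal{F}[q] = \mathrm{KL}(q\,\|\,p)$ without smoothing. Particles of a Wasserstein gradient flow move with velocity $v_t = -\nabla\,\delta\mathcal{F}/\delta q$, which for KL is a score difference; the last line restates, for comparison, the drift obtained with the Gaussian kernel $\exp(-\|x-y\|^2/(2\sigma^2))$, where $p_\sigma = p * \mathcal{N}(0,\sigma^2 I)$. The notation $v_t$ is introduced here.

```latex
\begin{align*}
v_t(x) &= -\nabla \frac{\delta \mathcal{F}}{\delta q}[q_t](x)
  && \text{particle velocity of the WGF} \\
\frac{\delta}{\delta q}\,\mathrm{KL}(q\,\|\,p) &= \log\frac{q}{p} + 1
  && \text{first variation of } \mathcal{F} = \mathrm{KL}(q\,\|\,p) \\
v_t(x) &= \nabla \log p(x) - \nabla \log q_t(x)
  && \text{score difference} \\
V_{p,q}(x) &= \sigma^2\big(\nabla \log p_\sigma(x) - \nabla \log q_\sigma(x)\big)
  && \text{Gaussian-kernel drift, } p_\sigma = p * \mathcal{N}(0, \sigma^2 I)
\end{align*}
```

Up to the factor $\sigma^2$ and the smoothing of both densities, the Gaussian-kernel drift is therefore the KL velocity field, and $v_t \equiv 0$ exactly at stationarity.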

The stop-gradient is essential: with it, each GMD update matches an explicit Euler step of the JKO proximal scheme for the corresponding gradient flow, which ensures descent in $\mathcal{F}$. Without the stop-gradient, the drift field can collapse without $q_\theta$ approaching $p$ (Turan et al., 10 Mar 2026).
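For reference, the JKO proximal step and its explicit (forward-Euler) particle discretization can be written as follows, with step size $\tau > 0$ and iterate index $k$ introduced here as notation and $\mathcal{F}$ the functional driving the flow. The last line reads the GMD regression target in the same form, with the kernel drift standing in for $-\tau\nabla(\delta\mathcal{F}/\delta q)$.

```latex
\begin{align*}
q_{k+1} &\in \operatorname*{arg\,min}_{q}\; \mathcal{F}[q] + \frac{1}{2\tau}\, W_2^2(q, q_k)
  && \text{JKO proximal step} \\
x_{k+1} &= x_k - \tau\,\nabla \frac{\delta \mathcal{F}}{\delta q}[q_k](x_k)
  && \text{explicit Euler particle step} \\
x_{k+1} &= x_k + V_{p,q_k}(x_k)
  && \text{GMD target, with } V \text{ in place of } -\tau\nabla\tfrac{\delta\mathcal{F}}{\delta q}
\end{align*}
```

With the stop-gradient, the target is one such step computed at the current $q_\theta$ and held fixed while the generator is fit to it. If gradients instead flow through the target, the objective becomes the squared drift norm itself, which the optimizer can shrink without moving $q_\theta$ toward $p$.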

The fixed-point characterization establishes that $V_{p,q}(x) = 0$ for all $x$ implies $q = p$ under suitable identifiability conditions, which hold for translation-invariant positive-definite kernels such as the Gaussian and Matérn kernels (Lee, 27 Apr 2026). Counterexamples show, however, that a vanishing field norm alone does not guarantee weak convergence of $q_\theta$ to $p$; enforcing a lower bound on an overlap scalar suffices.
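The easy direction of this characterization ($q = p \Rightarrow V_{p,q} \equiv 0$), together with the non-degeneracy that identifiability relies on, can be illustrated numerically: in the sketch below, with a Gaussian kernel and 1D samples, the average drift magnitude is exactly zero when the model samples equal the data and grows as $q$ is translated away. This only illustrates the behavior and is not a proof of the identifiability result; `mean_drift_norm` and the shift experiment are hypothetical.

```python
import math
import random
from typing import List


def mean_shift(x: float, samples: List[float], sigma: float) -> float:
    """Gaussian-kernel mean shift E[k(x,y)(y - x)] / E[k(x,y)] over 1D samples."""
    w = [math.exp(-((x - y) ** 2) / (2 * sigma**2)) for y in samples]
    return sum(wi * (y - x) for wi, y in zip(w, samples)) / sum(w)


def mean_drift_norm(data: List[float], generated: List[float],
                    grid: List[float], sigma: float = 1.0) -> float:
    """Average |V_{p,q}(x)| over the evaluation points in `grid`."""
    return sum(abs(mean_shift(x, data, sigma) - mean_shift(x, generated, sigma))
               for x in grid) / len(grid)


random.seed(1)
data = [random.gauss(0.0, 1.0) for _ in range(200)]
grid = [-3.0 + 0.1 * i for i in range(61)]  # evaluation points in [-3, 3]
for delta in (0.0, 0.25, 1.0):
    shifted = [y + delta for y in data]  # model samples: the data translated by delta
    print(f"shift={delta:4.2f}  mean |V|={mean_drift_norm(data, shifted, grid):.4f}")
```

A small average drift on a finite grid is weaker than $V \equiv 0$, which is one way to see why the counterexamples above need an extra condition beyond a vanishing field norm.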

3. Extensions, Kernel Choices, and Feature Spaces

GMD can be instantiated with a variety of kernels (Deng et al., 4 Feb 2026, Turan et al., 10 Mar 2026, Lai et al., 8 Mar 2026):

Gaussian kernel: exactly equivalent to kernel-smoothed score matching, but its exponentially decaying spectrum makes high-frequency components converge slowly.
Laplace/Matérn: spectra decay only polynomially; empirically superior in high-resolution and high-dimensional regimes.
Feature-space kernels: by embedding data in a learned or pretrained feature space (e.g., ResNet/MAE latents), the barycenter and transport field can exploit semantic locality (Deng et al., 4 Feb 2026).
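The spectral contrast between the first two kernel families can be made concrete with their standard 1D Fourier transforms, using bandwidth $h$ and the convention $\hat k(\omega) = \int k(r)\,e^{-i\omega r}\,dr$: the Gaussian $e^{-r^2/(2h^2)}$ has $\sqrt{2\pi}\,h\,e^{-h^2\omega^2/2}$, while the Laplace kernel $e^{-|r|/h}$ has $2h/(1+h^2\omega^2)$. The sketch only tabulates these closed forms; the link between spectral decay and how fast high-frequency discrepancies are corrected follows the cited analyses rather than the code.

```python
import math


def gaussian_spectrum(omega: float, h: float = 1.0) -> float:
    """Fourier transform of exp(-r^2 / (2 h^2)): decays like exp(-h^2 omega^2 / 2)."""
    return math.sqrt(2 * math.pi) * h * math.exp(-((h * omega) ** 2) / 2)


def laplace_spectrum(omega: float, h: float = 1.0) -> float:
    """Fourier transform of exp(-|r| / h): decays only like 1 / omega^2."""
    return 2 * h / (1 + (h * omega) ** 2)


for omega in (0.0, 1.0, 3.0, 10.0):
    print(f"omega={omega:5.1f}  gaussian={gaussian_spectrum(omega):.3e}  "
          f"laplace={laplace_spectrum(omega):.3e}")
```

Both kernels weight low frequencies similarly, but by $\omega = 10$ the Gaussian weight is many orders of magnitude smaller than the Laplace weight.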

GMD extends to mixture or adaptive drifts (e.g., convex combinations of KL, reverse-KL, and related divergences) that trade off precision against mode coverage and avoid mode collapse or blurring (Cao et al., 11 Mar 2026).
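To see how a mixture changes the drift, recall that a Wasserstein gradient flow moves particles with velocity $-\nabla\,\delta\mathcal{F}/\delta q$. The derivation below works this out for a convex combination of $\mathrm{KL}(q\,\|\,p)$ and $\mathrm{KL}(p\,\|\,q)$ with weight $\lambda$ (notation introduced here); it is a generic calculation and not necessarily the exact mixture used by Cao et al.

```latex
\begin{align*}
\mathcal{F}_\lambda[q] &= \lambda\,\mathrm{KL}(q\,\|\,p) + (1-\lambda)\,\mathrm{KL}(p\,\|\,q),
  \qquad \lambda \in [0, 1] \\
\frac{\delta \mathcal{F}_\lambda}{\delta q} &= \lambda\Big(\log\frac{q}{p} + 1\Big) - (1-\lambda)\,\frac{p}{q} \\
v_\lambda &= -\nabla \frac{\delta \mathcal{F}_\lambda}{\delta q}
  = \lambda\big(\nabla \log p - \nabla \log q\big) + (1-\lambda)\,\nabla\frac{p}{q}
\end{align*}
```

The $\mathrm{KL}(q\,\|\,p)$ part contributes the score difference, while the $\mathrm{KL}(p\,\|\,q)$ part contributes $\nabla(p/q)$, which is large where $q$ places too little mass relative to $p$ and so pushes samples toward under-covered modes.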

For conditional or structured data, GMD supports multi-level feature banks and strategies for coordinating multiple loss objectives, as in high-dimensional medical imaging (Li et al., 21 Apr 2026). Symmetry-aware extensions modify the drift itself, rather than relying only on an equivariant generator, to ensure sampling from group-invariant distributions (Darouich et al., 7 May 2026).

4. Algorithmic Procedure and Practical Implementation

In practice, the GMD training loop consists of:

Sampling minibatches of latent noise and (if applicable) conditional/context inputs.
Generating model samples and collecting data (or reference) samples.
Computing drift fields via kernel mean shift, optionally in feature space.
Updating generator parameters via stop-gradient regression toward the drifted target.

Inference requires a single evaluation of the generator, $x = f_\theta(\epsilon)$ with $\epsilon\sim p_0$, i.e., one-step (amortized) generation (Deng et al., 4 Feb 2026, Cao et al., 18 Mar 2026).

The cited works follow a common pseudocode template for this loop (Lai et al., 8 Mar 2026, Deng et al., 4 Feb 2026, Cao et al., 18 Mar 2026).
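The following is a minimal sketch of the four training steps above, not the cited pseudocode: a hypothetical 1D affine generator $f_\theta(\epsilon) = a\epsilon + b$ is fit to samples from $\mathcal{N}(3, 0.5^2)$ using a Gaussian-kernel drift and the stop-gradient regression loss, with the gradient written out by hand (with the target frozen, $\nabla_\theta\mathcal{L} = -2\,\mathbb{E}[(\partial f_\theta/\partial\theta)^\top V]$). Bandwidth, batch size, learning rate, and step count are arbitrary illustrative choices, and conditional inputs and feature-space kernels are omitted.

```python
import math
import random
from typing import List, Tuple


def mean_shift(x: float, samples: List[float], sigma: float) -> float:
    """Gaussian-kernel mean shift E[k(x,y)(y - x)] / E[k(x,y)] over 1D samples."""
    w = [math.exp(-((x - y) ** 2) / (2 * sigma**2)) for y in samples]
    return sum(wi * (y - x) for wi, y in zip(w, samples)) / sum(w)


def train(steps: int = 200, batch: int = 64, lr: float = 0.1,
          sigma: float = 1.0, seed: int = 0) -> Tuple[float, float]:
    """Fit the toy generator f(eps) = a * eps + b to data drawn from N(3, 0.5^2)."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator parameters theta = (a, b)
    for _ in range(steps):
        # 1. Sample latent noise (this toy setup has no conditioning inputs).
        eps = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        # 2. Generate model samples and draw a fresh data minibatch.
        xs = [a * e + b for e in eps]
        data = [rng.gauss(3.0, 0.5) for _ in range(batch)]
        # 3. Drift V = V_p^+ - V_q^- at each generated sample (input space, no features).
        vs = [mean_shift(x, data, sigma) - mean_shift(x, xs, sigma) for x in xs]
        # 4. Stop-gradient regression: the loss is mean (f - sg(f + V))^2, and with the
        #    target frozen, dL/da = -2 * mean(V * eps) and dL/db = -2 * mean(V).
        grad_a = -2.0 * sum(v * e for v, e in zip(vs, eps)) / batch
        grad_b = -2.0 * sum(vs) / batch
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


def sample(a: float, b: float, n: int, seed: int = 1) -> List[float]:
    """One-step inference: one generator evaluation per latent draw."""
    rng = random.Random(seed)
    return [a * rng.gauss(0.0, 1.0) + b for _ in range(n)]


a, b = train()
print(f"a = {a:.3f}, b = {b:.3f}")
print(sample(a, b, 5))
```

With these settings the parameters should settle near $a \approx 0.5$ and $b \approx 3$, the target's scale and mean. A small finite-batch bias in $a$ is expected because each model sample also appears in its own kernel sum for $V_q^-$; excluding that self-term is a design choice the sketch does not make.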

5. Empirical Results and Applications

GMD achieves state-of-the-art or competitive one-step generation quality on a range of domains:

ImageNet-256: FID 1.54 (latent space), 1.61 (pixel space) with DiT-L/2 models (Deng et al., 4 Feb 2026).
Physics/financial time series/turbulence: training-free GMD with scattering transforms or pretrained flows accurately reproduces target statistics and structures (Coeurdoux et al., 23 Feb 2026).
Boltzmann distributions: mean error 0.075, covariance error 0.042, and MMD 0.0020 on a four-mode target (Cao et al., 18 Mar 2026).
Symmetry-driven tasks: SymDrift reaches 95.7% recall coverage in molecular conformer generation with a 40× inference speed-up over multi-step flows (Darouich et al., 7 May 2026).
3D conditional medical image generation: outperforms SDE, flow-matching, GAN, and regression baselines in Dice coefficient and MS-SSIM, with rapid inference (Li et al., 21 Apr 2026).

Drifting achieves full mode coverage on multimodal benchmarks and remains stable in settings where one-sided or MMD flows suffer mode collapse or blurring (He et al., 12 Mar 2026, Cao et al., 11 Mar 2026).

6. Connections, Limits, and Future Directions

GMD unifies and bridges several prior paradigms:

Smoothed score-matching for diffusion models (Turan et al., 10 Mar 2026, Lai et al., 8 Mar 2026).
Wasserstein and MMD gradient flows (Gretton et al., 6 May 2026, Cao et al., 11 Mar 2026).
Sinkhorn generator flows, introducing OT-based drift fields with enhanced identifiability and mass-transport properties (He et al., 12 Mar 2026, Gretton et al., 6 May 2026).
Long-short flow-map decompositions linking GMD terminal steps to factorizations of transport maps (Li et al., 24 Feb 2026).
Direct fixed-point solvers for divergence-induced flows, extendable via functional-derivative templates (Gretton et al., 6 May 2026).

Active research focuses on:

Theory-informed kernel and feature-space design, including adaptive schedules (Turan et al., 10 Mar 2026, Li et al., 24 Feb 2026).
Scalable and stable extensions for high-dimensional, structured, or manifold-valued data (Cao et al., 11 Mar 2026).
Combining GMD with classifier-free guidance and improved sample diversity/coverage controls.
Practical tuning: the kernel choice (Gaussian, Laplace), the normalization of the attraction and repulsion terms, and the trade-off between computational cost and field-estimation accuracy are all crucial in practice.
Caveats: ensuring tightness (no mass escaping to infinity) and preserving the true gradient-flow structure, especially in approximate or hybrid settings (Lee, 27 Apr 2026, Gretton et al., 6 May 2026).

7. Summary Table: Major GMD Instantiations

Variant / Application	Key Drift Operator	Main Claims/Results	Reference
ImageNet-256 one-step generation	Laplace/L2/Feature kernel	FID 1.54, state-of-the-art 1-step FID	(Deng et al., 4 Feb 2026)
Boltzmann distribution sampling	Gaussian score-difference	Mean err 0.0754, MMD 0.0020	(Cao et al., 18 Mar 2026)
Physics/Finance: Training-free GMD	Feature kernel SDE drift	Accurate tails, structure, O(Kp^2N)	(Coeurdoux et al., 23 Feb 2026)
Sinkhorn-Drifting	Two-sided OT kernel	Identifiability, full mode-coverage	(He et al., 12 Mar 2026)
SymDrift (symmetry-aware)	Aligned/invariant drift	95.7% coverage, 40× speed	(Darouich et al., 7 May 2026)
Medical image GMD	Feature bank drift field	Outperforms SDE, GAN, regression	(Li et al., 21 Apr 2026)