Generative Modeling via Drifting (GMD)

Updated 9 May 2026
  • The paper introduces a generative modeling framework that regresses a generator’s output toward a fixed point defined by a kernel drift operator.
  • It uses the theory of Wasserstein gradient flows to unify score-matching and optimal-transport strategies, and it establishes strong fixed-point identifiability.
  • Empirical results show one-step generation reaching state-of-the-art results in high-resolution imaging, physics, molecular, and medical applications.

Generative Modeling via Drifting (GMD) is a framework for training generative models that matches distributions by directly regressing a generator’s outputs toward a fixed point determined by a kernel drift operator. GMD avoids iterative inference (e.g., MCMC, diffusion), yielding a generator that produces approximate samples from the target distribution in a single forward pass. The method is grounded in the theory of Wasserstein gradient flows, can express both score-matching and optimal-transport-based strategies, admits strong identifiability results, and has been instantiated in high-resolution image, physics, molecular, and conditional medical image domains.

1. Mathematical Formulation and Drift Operator

At its core, GMD seeks to train a generator $f_\theta: \mathbb{R}^k\to\mathbb{R}^d$ (e.g., a neural network) such that the pushforward $q_\theta = (f_\theta)_\#p_0$ approximates a target distribution $p$ (either empirical data or a Boltzmann distribution, $p(x) \propto \exp(-E(x))$) (Deng et al., 4 Feb 2026, Cao et al., 18 Mar 2026).

GMD introduces a vector field (the "drift"), $V_{p,q}(x) = V_p^+(x) - V_q^-(x)$, where, for a positive kernel $k(x,y)$,

$$V_p^+(x) = \frac{\mathbb{E}_{y\sim p}\left[k(x, y) (y - x)\right]}{\mathbb{E}_{y\sim p}[k(x, y)]}$$

and $V_q^-$ is defined analogously for $q$. This operator can be interpreted as a transport direction: an attraction toward the barycenter of nearby data points and a repulsion from the model samples, both weighted by the kernel (Deng et al., 4 Feb 2026, Lai et al., 8 Mar 2026).
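As a minimal sketch of the drift defined above, the expectations can be replaced by averages over finite sample sets. The function names, the Gaussian kernel choice, and the toy 2-D points below are illustrative assumptions, not taken from the cited papers.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Positive kernel k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_shift(x, samples, kernel):
    """Kernel-weighted average of (y - x): an empirical V^+ or V^-."""
    weights = [kernel(x, y) for y in samples]
    total = sum(weights)
    return [
        sum(w * (y[i] - x[i]) for w, y in zip(weights, samples)) / total
        for i in range(len(x))
    ]


def drift(x, data_samples, model_samples, kernel=gaussian_kernel):
    """Empirical V_{p,q}(x) = V_p^+(x) - V_q^-(x)."""
    attract = mean_shift(x, data_samples, kernel)   # pull toward data
    repel = mean_shift(x, model_samples, kernel)    # push away from model
    return [a - r for a, r in zip(attract, repel)]


# Toy example (assumed values): data on the right, model samples on the left.
data = [[2.0, 0.0], [2.0, 1.0]]
model = [[0.0, 0.0], [0.0, 1.0]]
v = drift([1.0, 0.5], data, model)       # points toward the data: [2.0, 0.0]
v_matched = drift([1.0, 0.5], data, data)  # identical sample sets: zero drift
```

When the two sample sets coincide, the attraction and repulsion terms cancel exactly, which is the empirical counterpart of the fixed-point condition.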

The generator is optimized to predict a single-step Euler move along this drift, $x_{i+1} = x_i + V_{p,q}(x_i)$. Training is performed via a stop-gradient regression loss, $\mathcal{L}(\theta) = \mathbb{E}_{\epsilon\sim p_0}\left[\left\|f_\theta(\epsilon) - \operatorname{sg}\left(f_\theta(\epsilon) + V_{p,q_\theta}(f_\theta(\epsilon))\right)\right\|^2\right]$, where $\operatorname{sg}(\cdot)$ denotes stop-gradient (Deng et al., 4 Feb 2026).
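To see what the stop-gradient regression does, it may help to write out its gradient. The following is a short derivation under the assumption that the loss is the squared distance between the generator output $x = f_\theta(\epsilon)$ and the frozen target $\operatorname{sg}(x + V_{p,q_\theta}(x))$; the notation $\operatorname{sg}$ and $J_\theta$ (the Jacobian of $f_\theta$ with respect to $\theta$) is introduced here for illustration.

```latex
\[
\begin{aligned}
\mathcal{L}(\theta)
  &= \mathbb{E}_{\epsilon \sim p_0}
     \left\| f_\theta(\epsilon)
       - \operatorname{sg}\!\left( f_\theta(\epsilon)
       + V_{p,q_\theta}(f_\theta(\epsilon)) \right) \right\|^2, \\
\nabla_\theta \mathcal{L}(\theta)
  &= 2\, \mathbb{E}_{\epsilon \sim p_0}
     \left[ J_\theta(\epsilon)^\top
       \left( f_\theta(\epsilon) - \operatorname{sg}(\cdot) \right) \right]
   = -2\, \mathbb{E}_{\epsilon \sim p_0}
     \left[ J_\theta(\epsilon)^\top V_{p,q_\theta}(f_\theta(\epsilon)) \right].
\end{aligned}
\]
```

Under this reading, a gradient step moves each generated sample along the drift, and the loss value itself equals $\mathbb{E}\|V_{p,q_\theta}\|^2$, so it vanishes exactly when the drift does.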

For Gaussian kernels with bandwidth $\sigma$, the drift simplifies via Tweedie’s identity to a difference of smoothed scores, $V_{p,q}(x) = \sigma^2\left(\nabla\log p_\sigma(x) - \nabla\log q_\sigma(x)\right)$, where $p_\sigma = p * \mathcal{N}(0, \sigma^2 I)$ is the Gaussian-smoothed density and $q_\sigma$ is defined likewise (Turan et al., 10 Mar 2026, Lai et al., 8 Mar 2026).
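The Gaussian-kernel identity can be checked numerically in a case where everything is known in closed form. The sketch below assumes a 1-D target $p = \mathcal{N}(0, s^2)$ and a Gaussian kernel of bandwidth $\sigma$, for which the smoothed density is $\mathcal{N}(0, s^2 + \sigma^2)$; the function names and the quadrature grid are illustrative choices.

```python
import math


def kernel_mean_shift(x, target_std, bandwidth, half_width=12.0, steps=24001):
    """Quadrature estimate of E_p[k(x,y)(y-x)] / E_p[k(x,y)], p = N(0, target_std^2)."""
    lo = -half_width
    dy = 2.0 * half_width / (steps - 1)
    num = 0.0
    den = 0.0
    for i in range(steps):
        y = lo + i * dy
        density = math.exp(-0.5 * (y / target_std) ** 2)  # unnormalized; constant cancels
        k = math.exp(-0.5 * ((x - y) / bandwidth) ** 2)   # Gaussian kernel
        num += density * k * (y - x)
        den += density * k
    return num / den


def sigma2_times_smoothed_score(x, target_std, bandwidth):
    """sigma^2 * d/dx log p_sigma(x) for p_sigma = N(0, target_std^2 + bandwidth^2)."""
    return -bandwidth ** 2 * x / (target_std ** 2 + bandwidth ** 2)


x, s, sigma = 1.5, 1.0, 0.5
lhs = kernel_mean_shift(x, s, sigma)            # attraction term V_p^+(x)
rhs = sigma2_times_smoothed_score(x, s, sigma)  # sigma^2 * smoothed score
```

Both quantities agree up to quadrature error, which illustrates that each half of the drift is a scaled smoothed score in the Gaussian case.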

In energy-based settings, the target-side smoothed score can be estimated via:

  • Monte Carlo (importance sampling in a local Gaussian ball),
  • Second-order (curvature-corrected) approximation involving the Hessian of $E$ (Cao et al., 18 Mar 2026).
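One possible reading of the Monte Carlo option is a self-normalized importance-sampling estimator with a Gaussian proposal centered at the query point. The sketch below is an assumption-laden 1-D illustration (the function names, sample count, and quadratic test energy are hypothetical), using the identity $\nabla\log p_\sigma(x) = \mathbb{E}_{y\sim\mathcal{N}(x,\sigma^2)}[e^{-E(y)}(y-x)] \,/\, (\sigma^2\, \mathbb{E}_{y\sim\mathcal{N}(x,\sigma^2)}[e^{-E(y)}])$.

```python
import math
import random


def smoothed_score_mc(x, energy, sigma, num_samples=20000, seed=0):
    """Self-normalized importance-sampling estimate of d/dx log p_sigma(x) in 1-D,
    where p(y) is proportional to exp(-energy(y)) and p_sigma = p * N(0, sigma^2)."""
    rng = random.Random(seed)
    num = 0.0
    den = 0.0
    for _ in range(num_samples):
        y = rng.gauss(x, sigma)    # proposal: local Gaussian around x
        w = math.exp(-energy(y))   # unnormalized Boltzmann weight
        num += w * (y - x)
        den += w
    return num / (sigma ** 2 * den)


def quadratic_energy(y):
    """Test energy E(y) = y^2 / 2, i.e. a standard normal target."""
    return 0.5 * y * y


estimate = smoothed_score_mc(1.0, quadratic_energy, sigma=0.5)
exact = -1.0 / (1.0 + 0.5 ** 2)  # score of N(0, 1 + sigma^2) at x = 1
```

Only energy evaluations are needed, so the normalizing constant of the Boltzmann density never has to be computed.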

2. Theoretical Foundations and Gradient Flow Structure

GMD is mathematically equivalent to seeking a fixed point of the Wasserstein gradient flow (WGF) of an energy functional $\mathcal{F}$: $\partial_t q_t = \nabla\cdot\left(q_t\,\nabla\frac{\delta\mathcal{F}}{\delta q}(q_t)\right)$. Choosing $\mathcal{F}$ as the (possibly smoothed) Kullback–Leibler divergence recovers score-based models; using Sinkhorn divergence or MMD yields other classes of drift fields (Gretton et al., 6 May 2026, Cao et al., 11 Mar 2026).
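As a worked instance of the gradient-flow view, consider the unsmoothed KL divergence as the energy functional. The symbols $\mathcal{F}$ and $v$ below are notation assumed for this illustration; the computation itself is the standard first-variation calculation.

```latex
\[
\begin{aligned}
\mathcal{F}(q) &= \mathrm{KL}(q \,\|\, p) = \int q(x) \log\frac{q(x)}{p(x)}\, dx, \\
\frac{\delta \mathcal{F}}{\delta q}(x) &= \log q(x) - \log p(x) + 1, \\
v(x) &= -\nabla \frac{\delta \mathcal{F}}{\delta q}(x)
      = \nabla \log p(x) - \nabla \log q(x), \\
\partial_t q_t + \nabla \cdot (q_t\, v) &= 0 .
\end{aligned}
\]
```

The velocity field is a difference of scores, which vanishes on the support of $q$ when $q = p$. Replacing $p$ and $q$ by their Gaussian-smoothed versions gives the score-difference form associated with Gaussian kernels.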

The stop-gradient is essential: with it, GMD matches the Euler step of the JKO proximal scheme for the respective gradient flow, ensuring descent in the underlying energy functional. Without the stop-gradient, the drift field can collapse without $q_\theta$ approaching the target $p$ (Turan et al., 10 Mar 2026).
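For reference, one standard way to write the JKO proximal step and its explicit first-order counterpart is given below. The step size $\tau$, the functional name $\mathcal{F}$, and the iterate index $k$ are notation assumed here, and the particle update is the usual forward-Euler approximation rather than a formula quoted from the cited works.

```latex
\[
\begin{aligned}
q_{k+1} &\in \arg\min_{q}
  \left\{ \mathcal{F}(q) + \frac{1}{2\tau}\, W_2^2(q, q_k) \right\}, \\
x_{k+1} &= x_k - \tau\, \nabla \frac{\delta \mathcal{F}}{\delta q}(q_k)(x_k).
\end{aligned}
\]
```

In the explicit update the velocity is evaluated at the frozen iterate $q_k$. This is consistent with treating the drifted target as a constant during regression, which is the role the stop-gradient plays.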

The fixed-point characterization establishes that a drift that vanishes identically, $V_{p,q} \equiv 0$, implies $q = p$ under suitable identifiability conditions, which hold for translation-invariant positive-definite kernels (e.g., Gaussian, Matérn) (Lee, 27 Apr 2026). Counterexamples show that a vanishing field norm alone does not guarantee weak convergence, but enforcing a lower bound on an overlap scalar suffices.

3. Extensions, Kernel Choices, and Feature Spaces

GMD can be instantiated with a variety of kernels (Deng et al., 4 Feb 2026, Turan et al., 10 Mar 2026, Lai et al., 8 Mar 2026):

  • Gaussian kernel: exactly equivalent to kernel-smoothed score matching, but its exponentially decaying spectrum creates a convergence bottleneck at high frequencies.
  • Laplace/Matérn: spectral decay is only polynomial, and these kernels are empirically superior in high-resolution and high-dimensional regimes.
  • Feature-space kernels: by embedding data in a learned or pretrained feature space (e.g., ResNet/MAE latents), the barycenter and transport field can exploit semantic locality (Deng et al., 4 Feb 2026).
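The spectral contrast between the Gaussian and Laplace kernels can be made concrete in one dimension, where both Fourier transforms are known in closed form. The sketch below uses those closed forms up to constant factors; the function names and the probe frequency are illustrative assumptions.

```python
import math


def gaussian_kernel(r, bandwidth=1.0):
    """k(r) = exp(-r^2 / (2 h^2))."""
    return math.exp(-0.5 * (r / bandwidth) ** 2)


def laplace_kernel(r, bandwidth=1.0):
    """k(r) = exp(-|r| / h)."""
    return math.exp(-abs(r) / bandwidth)


def gaussian_spectrum(omega, bandwidth=1.0):
    """1-D Fourier transform of the Gaussian kernel, up to a constant factor."""
    return math.exp(-0.5 * (bandwidth * omega) ** 2)


def laplace_spectrum(omega, bandwidth=1.0):
    """1-D Fourier transform of the Laplace kernel, up to a constant factor."""
    return 1.0 / (1.0 + (bandwidth * omega) ** 2)


# At a moderately high frequency the Gaussian spectrum is many orders of
# magnitude smaller than the Laplace spectrum.
omega = 10.0
ratio = gaussian_spectrum(omega) / laplace_spectrum(omega)
```

A kernel whose spectrum is nearly zero at high frequencies carries almost no signal about fine-scale mismatch between the two distributions, which is one way to read the bottleneck mentioned for the Gaussian case.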

GMD extends to mixture or adaptive drifts (e.g., convex combinations of drifts derived from divergences such as KL and reverse-KL) to balance precision against mode coverage and to avoid collapse or blurring (Cao et al., 11 Mar 2026).

For conditional or structured data, GMD admits multi-level feature banks and multi-objective loss coordination strategies, as in high-dimensional medical imaging (Li et al., 21 Apr 2026). Symmetry-aware extensions adjust the drift (not just generator equivariance) to ensure sampling from group-invariant distributions (Darouich et al., 7 May 2026).

4. Algorithmic Procedure and Practical Implementation

In practice, the GMD training loop consists of:

  1. Sampling minibatches of latent noise and (if applicable) conditional/context inputs.
  2. Generating model samples and collecting data (or reference) samples.
  3. Computing drift fields via kernel mean shift, optionally in feature space.
  4. Updating generator parameters via stop-gradient regression toward the drifted target.

Inference simply requires a single evaluation of $f_\theta$, i.e., one-step (amortized) generation (Deng et al., 4 Feb 2026, Cao et al., 18 Mar 2026).

The pseudocode is essentially the same across these works (Lai et al., 8 Mar 2026, Deng et al., 4 Feb 2026, Cao et al., 18 Mar 2026).
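The four training steps can be sketched on a toy problem small enough to differentiate by hand. The example below is not the pseudocode from the cited papers: it assumes a hypothetical 1-D affine generator $f(z) = a z + b$, a Gaussian kernel, a target $\mathcal{N}(3, 1)$, and plain gradient descent, with all hyperparameters chosen only for illustration.

```python
import math
import random


def mean_shift(x, samples, bandwidth):
    """Kernel-weighted average of (y - x) with a Gaussian kernel."""
    weights = [math.exp(-0.5 * ((x - y) / bandwidth) ** 2) for y in samples]
    total = sum(weights)
    return sum(w * (y - x) for w, y in zip(weights, samples)) / total


def train(steps=200, batch=64, lr=0.1, bandwidth=1.0, seed=0):
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # toy generator f(z) = a * z + b (hypothetical)
    for _ in range(steps):
        # 1. Sample latent noise.
        z = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        # 2. Generate model samples and draw data samples from N(3, 1).
        x = [a * zi + b for zi in z]
        y = [rng.gauss(3.0, 1.0) for _ in range(batch)]
        # 3. Drift: attraction to data minus the same statistic on model samples.
        v = [mean_shift(xi, y, bandwidth) - mean_shift(xi, x, bandwidth) for xi in x]
        # 4. Stop-gradient regression: targets are treated as constants.
        target = [xi + vi for xi, vi in zip(x, v)]
        grad_a = sum(2.0 * (xi - ti) * zi for xi, ti, zi in zip(x, target, z)) / batch
        grad_b = sum(2.0 * (xi - ti) for xi, ti in zip(x, target)) / batch
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


a, b = train()
sample = a * random.Random(1).gauss(0.0, 1.0) + b  # one-step generation
```

In this toy setting the offset `b` moves toward the data mean and the scale `a` stays near the data standard deviation. In a realistic setting the hand-written gradients would be replaced by automatic differentiation with an explicit stop-gradient on the target.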

5. Empirical Results and Applications

GMD achieves state-of-the-art or competitive one-step generation quality on a range of domains; representative results are listed in the summary table in Section 7.

Drifting achieves full mode coverage on multimodal benchmarks and remains stable where one-sided or MMD flows suffer from mode collapse or blurring (He et al., 12 Mar 2026, Cao et al., 11 Mar 2026).

6. Connections, Limits, and Future Directions

GMD unifies and bridges several prior paradigms, including score matching, optimal transport, kernel mean shift, and Wasserstein gradient flows.

Active research focuses on:

  • Theory-informed kernel and feature-space design, including adaptive schedules (Turan et al., 10 Mar 2026, Li et al., 24 Feb 2026).
  • Scalable and stable extensions for high-dimensional, structured, or manifold-valued data (Cao et al., 11 Mar 2026).
  • Combining GMD with classifier-free guidance and improved sample diversity/coverage controls.

Empirically, the choice of kernel (Gaussian, Laplace), the normalization of the attraction and repulsion terms, and the trade-off between computational cost and field-estimation accuracy are all crucial.

Cautions include ensuring tightness (avoiding mass escape) and preserving the true gradient flow structure, especially in approximate or hybrid settings (Lee, 27 Apr 2026, Gretton et al., 6 May 2026).

7. Summary Table: Major GMD Instantiations

| Variant / Application | Key Drift Operator | Main Claims/Results | Reference |
| --- | --- | --- | --- |
| ImageNet-256 one-step generation | Laplace/L2/Feature kernel | FID 1.54, state-of-the-art 1-step FID | (Deng et al., 4 Feb 2026) |
| Boltzmann distribution sampling | Gaussian score-difference | Error 0.0754, MMD 0.0020 | (Cao et al., 18 Mar 2026) |
| Physics/Finance: Training-free GMD | Feature kernel SDE drift | Accurate tails, structure, $O(Kp^2N)$ | (Coeurdoux et al., 23 Feb 2026) |
| Sinkhorn-Drifting | Two-sided OT kernel | Identifiability, full mode coverage | (He et al., 12 Mar 2026) |
| SymDrift (symmetry-aware) | Aligned/invariant drift | 95.7% coverage, 40$\times$ speed | (Darouich et al., 7 May 2026) |
| Medical image GMD | Feature bank drift field | Outperforms SDE, GAN, regression | (Li et al., 21 Apr 2026) |

GMD constitutes a unified, regression-based, one-step generative modeling paradigm grounded in gradient flow theory, affording rigorously motivated extensions across kernels, divergences, and structured data domains (Deng et al., 4 Feb 2026, Cao et al., 18 Mar 2026, He et al., 12 Mar 2026, Darouich et al., 7 May 2026, Li et al., 21 Apr 2026, Turan et al., 10 Mar 2026, Cao et al., 11 Mar 2026, Lee, 27 Apr 2026, Coeurdoux et al., 23 Feb 2026, Li et al., 24 Feb 2026, Gretton et al., 6 May 2026, Lai et al., 8 Mar 2026).