Distilling Langevin Mixing via Diffusion Models
- The paper demonstrates that training diffusion models to replicate Langevin dynamics can compress slow-mixing trajectories into rapid global moves, significantly cutting autocorrelation times.
- It employs score-based reverse SDEs to reconstruct the target distribution accurately, offering a methodological advance over traditional MCMC samplers such as Metropolis–Hastings and HMC.
- The approach enhances sampling efficiency across diverse fields like lattice field theory and molecular dynamics, addressing challenges such as critical slowing down.
Distilling Langevin mixing with diffusion models refers to the process of compressing—via neural generative modeling—the slow-mixing dynamics of Langevin-type stochastic processes into an accelerated, global sampler. In this framework, score-based or energy-based diffusion models are trained to replicate the stationary distribution and the full mixing characteristics of a stochastic process (typically Langevin or complex Langevin dynamics), enabling sampling of independent configurations at a cost orders of magnitude below conventional Markov chain Monte Carlo (MCMC) algorithms. The approach not only reduces autocorrelation times but can also synthesize samples from distributions previously accessible only through computationally intensive or ill-understood dynamics.
1. Theoretical Foundations: Langevin Dynamics, Stochastic Quantization, and Mixing
Langevin dynamics describes stochastic evolution under Brownian noise, commonly formulated as

$$\dot{x}(t) = -\frac{\partial S[x]}{\partial x(t)} + \eta(t), \qquad \langle \eta(t)\,\eta(t') \rangle = 2\,\delta(t - t'),$$

where $\eta(t)$ is Gaussian white noise. The associated Fokker–Planck equation governs the evolution of the probability density $P(x, t)$, with the stationary solution $P(x) \propto e^{-S[x]}$, ensuring ergodic sampling from the desired equilibrium distribution. In practice, the mixing time—the number of steps required to decorrelate samples—is controlled by the spectral gap and can become prohibitively large near criticality due to critical slowing down (Wang et al., 2023).
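As a concrete illustration (our own sketch, not code from the cited works), an Euler–Maruyama discretization of overdamped Langevin dynamics for the quadratic action $S(x) = x^2/2$, whose stationary law is the standard normal, looks as follows:

```python
import numpy as np

def langevin_sample(grad_S, x0, n_steps, dt, rng):
    """Euler-Maruyama discretization of overdamped Langevin dynamics:
    x <- x - grad_S(x) * dt + sqrt(2 * dt) * noise."""
    x = np.asarray(x0, dtype=float)
    traj = np.empty((n_steps,) + x.shape)
    for i in range(n_steps):
        x = x - grad_S(x) * dt + np.sqrt(2.0 * dt) * rng.standard_normal(x.shape)
        traj[i] = x
    return traj

rng = np.random.default_rng(0)
# Quadratic action S(x) = x^2 / 2  =>  stationary density P(x) proportional to exp(-x^2/2).
traj = langevin_sample(lambda x: x, x0=np.zeros(1000), n_steps=2000, dt=0.01, rng=rng)
samples = traj[1000:]  # discard burn-in
print(samples.mean(), samples.var())
```

Successive samples of such a chain remain correlated over roughly the relaxation time of the dynamics, which is the slow mixing that distillation aims to eliminate.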
Complex Langevin (CL) dynamics generalizes this to theories with complex actions, sampling on a complexified configuration space and targeting a real probability density that recovers original observables through analytically continued averages (Aarts et al., 1 Oct 2025, Habibi et al., 2024).
2. Diffusion Models: Forward and Reverse Processes
Score-based diffusion models construct a forward process—iterative noising or smoothing—followed by a learned reverse process that reconstructs the target distribution. The forward SDE is typically defined as

$$dx_t = f(x_t, t)\,dt + g(t)\,dW_t,$$

pushing the data to noise. The reverse SDE is given by

$$dx_t = \left[ f(x_t, t) - g(t)^2\, \nabla_x \log p_t(x_t) \right] dt + g(t)\,d\bar{W}_t,$$

where $\nabla_x \log p_t(x)$ (the “score”) is approximated by a neural network $s_\theta(x, t)$, trained via denoising score matching. The reverse-time integration of this learned SDE (or its deterministic ODE analogue) generates samples distributed according to the target distribution (Wang et al., 2023, Aarts et al., 1 Oct 2025).
For empirical applications, the forward and reverse mapping parameters, noise schedules, and discretization schemes are set to match the dynamics of the target Langevin process. The distinction between score-based and energy-based parameterization allows both direct drift learning and the construction of explicit surrogate energies for MCMC sampling (Aarts et al., 1 Oct 2025).
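For intuition, the following toy sketch (our own construction, not from the cited papers) integrates the reverse SDE for a variance-preserving (Ornstein–Uhlenbeck) forward process whose score is known in closed form for Gaussian data, so no network training is needed:

```python
import numpy as np

rng = np.random.default_rng(1)
m, s = 1.5, 0.5                   # target distribution: N(m, s^2)
T, n_steps, n = 10.0, 1000, 20000
dt = T / n_steps

def score(x, t):
    """Exact score of the OU-forward marginal p_t for Gaussian data.
    Forward SDE: dx = -x/2 dt + dW  =>  p_t = N(m*a, s^2*a^2 + 1 - a^2), a = exp(-t/2)."""
    a = np.exp(-0.5 * t)
    var = s**2 * a**2 + 1.0 - a**2
    return -(x - m * a) / var

x = rng.standard_normal(n)        # start from the Gaussian prior ~ p_T for large T
for i in range(n_steps):          # integrate the reverse SDE from t = T down to t = 0
    t = T - i * dt
    drift = -0.5 * x - score(x, t)    # f(x,t) - g^2 * score, with f = -x/2, g = 1
    x = x - drift * dt + np.sqrt(dt) * rng.standard_normal(n)
print(x.mean(), x.var())          # should recover the target mean and variance
```

Replacing the analytic score with a trained $s_\theta(x, t)$ turns this loop into the generic sampler described above.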
3. Frameworks for Distillation: Algorithms and Mixing Acceleration
The essence of "distillation" is the compression of the multi-step Langevin mixing trajectory into a few—or even a single—network-guided global move by training the diffusion model to learn the exact non-equilibrium score at all times. The resulting algorithm has the following elements:
- Forward path: Samples are noised through a variance- or diffusion-scheduled forward SDE.
- Score network: Trained to regress the score $\nabla_x \log p_t(x)$ by minimizing the denoising score-matching loss
$$\mathcal{L}(\theta) = \mathbb{E}_{t,\,x_0,\,x_t}\Big[\lambda(t)\, \big\| s_\theta(x_t, t) - \nabla_{x_t} \log p_t(x_t \mid x_0) \big\|^2 \Big].$$
- Sampling: The learned reverse SDE is discretized (e.g., by Euler–Maruyama or Heun’s method), performing updates
$$x_{t-\Delta t} = x_t - \big[ f(x_t, t) - g(t)^2\, s_\theta(x_t, t) \big]\,\Delta t + g(t)\,\sqrt{\Delta t}\; z_t, \qquad z_t \sim \mathcal{N}(0, I),$$
starting from a Gaussian prior.
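The score-matching regression can be illustrated with a closed-form toy (our own construction): for Gaussian data and a single noise level, the denoising target is $-z/\sqrt{1-a^2}$, and an affine "score model" fitted by ordinary least squares recovers the analytic score of the noised marginal.

```python
import numpy as np

rng = np.random.default_rng(4)
m, s, a = 2.0, 0.8, 0.6          # data ~ N(m, s^2); fixed noise level a = exp(-t/2)
n = 100000
x0 = m + s * rng.standard_normal(n)
z = rng.standard_normal(n)
xt = a * x0 + np.sqrt(1.0 - a**2) * z            # forward (variance-preserving) perturbation
target = -z / np.sqrt(1.0 - a**2)                # = grad_xt log p(xt | x0), the DSM target

# Affine "network" s_theta(x) = w*x + b fitted by ordinary least squares:
w, b = np.polyfit(xt, target, 1)

# Analytic score of p_t = N(m*a, s^2*a^2 + 1 - a^2) for comparison:
var_t = s**2 * a**2 + 1.0 - a**2
print(w, -1.0 / var_t)           # fitted slope approaches -1/var_t
print(b, m * a / var_t)          # fitted intercept approaches m*a/var_t
```

The regression on noisy pairs $(x_t, -z/\sqrt{1-a^2})$ converges to the true marginal score, which is exactly the property denoising score matching exploits.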
Global proposals generated by the diffusion model can be used as independent samplers or as proposals in Metropolis-type MCMC, drastically accelerating mixing—in the benchmarks tabulated below, integrated autocorrelation times drop from tens of updates (Metropolis–Hastings, HMC) to order unity (Wang et al., 2023). Tabular comparisons (see Table below) quantify the reduction in autocorrelation times.
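The Metropolis-type use of global proposals corresponds to a standard independence sampler, sketched below for a one-dimensional target with a tractable proposal density (a generic illustration, not code from Wang et al.; a learned model plays the role of `q` in practice):

```python
import numpy as np

def independence_metropolis(log_p, log_q, draw_q, n_samples, rng):
    """Independence Metropolis: propose x' ~ q independently of the current state,
    accept with prob min(1, p(x') q(x) / (p(x) q(x'))). A proposal q close to the
    target p yields near-unit acceptance and near-zero autocorrelation."""
    x = draw_q(rng)
    out, n_acc = [], 0
    for _ in range(n_samples):
        xp = draw_q(rng)
        log_alpha = (log_p(xp) - log_q(xp)) - (log_p(x) - log_q(x))
        if np.log(rng.uniform()) < log_alpha:
            x, n_acc = xp, n_acc + 1
        out.append(x)
    return np.array(out), n_acc / n_samples

rng = np.random.default_rng(2)
log_p = lambda x: -0.5 * x**2                  # target N(0, 1), up to a constant
log_q = lambda x: -0.5 * (x / 1.1)**2          # slightly-too-wide model N(0, 1.1^2)
draw_q = lambda rng: 1.1 * rng.standard_normal()
chain, acc = independence_metropolis(log_p, log_q, draw_q, 20000, rng)
print(acc, chain.mean())
```

Because each proposal is drawn independently, accepted moves are global, and the chain's autocorrelation is governed solely by the rejection rate—the mechanism behind the reductions in the table below.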
| Sampler | Autocorrelation time $\tau_{\rm int}$ |
|---|---|
| Plain Metropolis–Hastings | ~80 |
| Hybrid Monte Carlo (HMC) | ~41 |
| DM-based global (Metropolis) | ~2.4 |
4. Applications: Lattice Field Theory, Molecular/Macromolecular Systems, and Image Restoration
Distilled Langevin mixing via diffusion models has demonstrable impact in several research areas:
- Lattice field theory: Sampling field configurations in lattice $\phi^4$ theory with a diffusion model reduces autocorrelation by an order of magnitude over HMC, overcoming critical slowing down and enabling rapid generation of independent ensembles (Wang et al., 2023).
- Complex action systems: Trained on CL data, score-based and energy-based diffusion models replicate not only the marginal distributions but all measured moments and cumulants (to accuracy controlled by network and data) and provide high-acceptance explicit energy functions for alternative MCMC schemes (Aarts et al., 1 Oct 2025, Habibi et al., 2024).
- Molecular dynamics: Denoising diffusion models with sequential bias realize an Euler–Maruyama integrator for overdamped Langevin dynamics, recovering correct equilibrium statistics and MD-like temporal correlations with only a small number of denoising steps, thus "distilling" simulated MD or Langevin trajectories into efficient learned samplers (Diamond et al., 21 Nov 2025).
- Image restoration: Empirical Bayesian image restoration employs pretrained DDPM denoisers as priors within latent-split Langevin samplers, achieving state-of-the-art PSNR/SSIM with an order of magnitude fewer steps than DDPM/DDIM and short mixing times for realistic image sampling (Mbakam et al., 2024).
5. Evaluation: Distributional Accuracy, Mixing Diagnostics, and Error Bounds
Validation of the distillation approach involves both distributional metrics (e.g., moments, cumulants, cross-sections of learned vs. true probabilities) and mixing diagnostics (integrated autocorrelation time $\tau_{\rm int}$, effective sample size). In benchmarks involving complex Langevin processes, trained diffusion models match analytical or reference results for moments up to high order within a few percent and yield effective sample sizes nearly equal to the number of generated samples due to negligible autocorrelation (Habibi et al., 2024).
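A minimal estimator of the integrated autocorrelation time and effective sample size, using the standard windowed (Sokal-style) truncation, can be sketched as follows (our own sketch, not from the cited papers; demonstrated on an AR(1) chain whose exact $\tau_{\rm int} = (1+\rho)/(1-\rho)$ is known):

```python
import numpy as np

def integrated_autocorr_time(x, c=5.0, max_lag=2000):
    """Windowed estimator tau_int = 1 + 2 * sum_{k<=W} rho_k, truncated at the
    first window W satisfying W >= c * tau(W) (automatic windowing)."""
    x = np.asarray(x, float) - np.mean(x)
    n, var = len(x), np.var(x)
    tau = 1.0
    for k in range(1, max_lag):
        rho = np.dot(x[:-k], x[k:]) / ((n - k) * var)   # lag-k autocorrelation
        tau += 2.0 * rho
        if k >= c * tau:
            break
    return tau

rng = np.random.default_rng(3)
# AR(1) chain with coefficient rho: exact tau_int = (1 + rho) / (1 - rho) = 19 for rho = 0.9.
rho, n = 0.9, 200000
eps = rng.standard_normal(n)
chain = np.empty(n)
chain[0] = eps[0]
for i in range(1, n):
    chain[i] = rho * chain[i - 1] + eps[i]
tau = integrated_autocorr_time(chain)
ess = n / tau                     # effective sample size
print(tau, ess)
```

Applying the same estimator to diffusion-model output with near-zero autocorrelation yields $\tau_{\rm int} \approx 1$ and an effective sample size close to the raw sample count.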
Information-theoretic error bounds for distilled diffusion samplers have the schematic form
$$\mathrm{KL}\big(p_{\rm target} \,\|\, p_\theta\big) \;\lesssim\; \underbrace{\varepsilon_{\rm score}^2}_{\text{model error}} \;+\; \underbrace{C\,\Delta t}_{\text{discretization error}},$$
separating model (score) error and discretization error (Diamond et al., 21 Nov 2025).
6. Limitations, Assumptions, and Open Questions
The acceleration afforded by distilling Langevin mixing with diffusion models depends on several factors:
- Score network accuracy: The learned score $s_\theta(x, t)$ must closely approximate the true score $\nabla_x \log p_t(x)$ at every noise level; performance and mixing gains degrade otherwise.
- Discretization error: Step size in the reverse SDE introduces bias, mitigated by smaller steps, advanced integrators, or Metropolis corrections.
- Network capacity and scaling: Accurate modeling in high dimensions, especially near criticality or with complex-valued actions, requires network capacity to scale accordingly and may display sensitivity to training pathologies or out-of-distribution failure modes.
- Diagnostic limitations: The approach does not correct for CL convergence failures in sign-problem contexts; rather, it replicates the stationary law implicit in the training data.
- Rigorous theory: While empirical reductions in critical slowing down and autocorrelation are dramatic, general bounds on dynamical critical exponents or guarantees of mixing acceleration remain open (Wang et al., 2023).
7. Extensions and Future Directions
A plausible implication is that distillation of Langevin mixing by diffusion models could enable efficient simulation in previously inaccessible regimes of lattice gauge theory, quantum many-body systems, molecular dynamics, and ill-posed inverse problems. Key open directions include generalization to high-dimensional and gauge systems, combination with complex-action or multimodal target distributions, hybridization with energy-based MCMC, and the development of analytic diagnostics for mixing and convergence in the learned model (Aarts et al., 1 Oct 2025, Habibi et al., 2024, Diamond et al., 21 Nov 2025).
References:
- (Wang et al., 2023) Diffusion Models as Stochastic Quantization in Lattice Field Theory
- (Diamond et al., 21 Nov 2025) Diffusion Models are Molecular Dynamics Simulators
- (Aarts et al., 1 Oct 2025) Combining complex Langevin dynamics with score-based and energy-based diffusion models
- (Habibi et al., 2024) Diffusion models learn distributions generated by complex Langevin dynamics
- (Mbakam et al., 2024) Empirical Bayesian image restoration by Langevin sampling with a denoising diffusion implicit prior