
Critically-Damped Langevin Diffusions

Updated 9 November 2025
  • CLDs are defined as a class of stochastic differential equations incorporating critical damping to maximize convergence speed without oscillation.
  • They enhance score-based generative models by injecting noise into velocity, simplifying score estimation and yielding smoother conditional distributions.
  • Rigorous convergence analysis and discretization methods, including Euler–Maruyama and splitting schemes, provide non-asymptotic error bounds and practical implementation strategies.

Critically-damped Langevin Diffusions (CLDs) are a class of stochastic differential equations that introduce critical damping—a parameter regime maximizing the rate of approach to equilibrium without oscillatory behavior—into Langevin-type diffusions. Originally developed to bridge statistical mechanics and generative modeling, CLDs have found broad application in score-based generative models (SGMs), where they improve the efficiency and sample quality compared to classical overdamped or merely underdamped schemes. The framework extends naturally to higher-order versions and admits principled discretizations and tuning strategies, supported by rigorous non-asymptotic convergence theory in the Wasserstein metric.

1. Mathematical Formulation of CLD

Let $x_t \in \mathbb{R}^d$ denote the position and $v_t \in \mathbb{R}^d$ the velocity (or momentum). The critically-damped Langevin process is defined by

$$\begin{cases} dx_t = v_t\,dt, \\ dv_t = -\dfrac{1}{M}\nabla U(x_t)\,dt - \gamma v_t\,dt + \sqrt{2\gamma/M}\,dW_t, \end{cases}$$

where $U(x)$ is a potential (minus the log-density of the target), $M>0$ the mass, $\gamma>0$ the friction (damping) coefficient, and $W_t$ a standard Brownian motion. In block form,

$$u_t := (x_t, v_t)^\top,\qquad du_t = A u_t\,dt + \Sigma\,dW_t,$$

with drift matrix $A = \begin{pmatrix} 0 & I_d \\ -\frac{1}{M}\nabla^2 U & -\gamma I_d \end{pmatrix}$ and diffusion matrix $\Sigma = \sqrt{\frac{2\gamma}{M}} \begin{pmatrix} 0 \\ I_d \end{pmatrix}$.

The critical-damping criterion requires that, for the associated noiseless ODE $\ddot x + \gamma\,\dot x + \frac{1}{M}\nabla U(x) = 0$, the characteristic polynomial has a double real root; for a quadratic potential with Hessian $K$, this is achieved by $\gamma^2 = 4\det(K)/M$, which maximizes the rate of convergence without oscillation (Strasman et al., 4 Nov 2025).
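
To make the criterion concrete, the following minimal sketch (not taken from the cited papers; parameter values are illustrative) integrates the noiseless dynamics for a one-dimensional quadratic potential $U(x) = Kx^2/2$ and counts zero crossings below, at, and above the critical friction $\gamma = 2\sqrt{K/M}$:

```python
import numpy as np

# Illustrative check (not from the cited papers): noiseless kinetic Langevin dynamics
# x'' = -(K/M) x - gamma x' for U(x) = K x^2 / 2. Critical damping gamma^2 = 4K/M
# gives the fastest return to equilibrium without oscillation.
K, M = 1.0, 1.0
gamma_crit = 2.0 * np.sqrt(K / M)

def simulate(gamma, x0=1.0, v0=0.0, dt=1e-3, T=20.0):
    """Explicit Euler integration of the damped oscillator; returns the x-trajectory."""
    n = int(T / dt)
    x, v = x0, v0
    traj = np.empty(n)
    for i in range(n):
        x, v = x + dt * v, v + dt * (-(K / M) * x - gamma * v)
        traj[i] = x
    return traj

for label, g in [("underdamped", 0.3 * gamma_crit),
                 ("critical", gamma_crit),
                 ("overdamped", 3.0 * gamma_crit)]:
    traj = simulate(g)
    crossings = np.sum(np.diff(np.sign(traj)) != 0)  # sign changes indicate oscillation
    print(f"{label:12s} gamma={g:.2f}  zero crossings={crossings}")
```

Only the underdamped run crosses zero; critical damping gives the fastest non-oscillatory decay, which is the property CLD exploits.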

2. Generalized CLD Dynamics: Position-Noise Regularization

Recent work extends CLD by introducing an additional position-noise parameter $\varepsilon \geq 0$, yielding

$$\Sigma_\varepsilon = \begin{pmatrix}\varepsilon I_d & 0 \\ 0 & \sigma I_d \end{pmatrix},\qquad A = \begin{pmatrix} 0 & a^2 I_d \\ -I_d & -2a I_d \end{pmatrix},$$

with $a = 1/\sqrt{M}$ and $\sigma = 2/\sqrt{a}$. The resulting forward SDE is

$$du_t = A u_t\,dt + \Sigma_\varepsilon\,dB_t,$$

and the time-reversed SDE for generative modeling becomes

$$du_t = \left( -A u_t + \Sigma_\varepsilon^2\, \nabla \log p_{T-t}(u_t) \right)dt + \Sigma_\varepsilon\,dB_t,$$

where $p_s$ denotes the law of $u_s$. Choosing $\varepsilon > 0$ renders the diffusion fully nondegenerate (elliptic), which smooths sample paths and regularizes the dynamics (Strasman et al., 4 Nov 2025). The infinitesimal generator is

$$\mathcal{L}f(u) = \langle A u, \nabla f(u) \rangle + \tfrac{1}{2}\operatorname{Tr}\!\left(\Sigma_\varepsilon\Sigma_\varepsilon^\top \nabla^2 f(u)\right).$$
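
As a small sanity check of ergodicity (a sketch under illustrative parameter choices, not taken from the source), one can assemble $A$ and $\Sigma_\varepsilon$, solve the continuous Lyapunov equation $A S_\infty + S_\infty A^\top + \Sigma_\varepsilon\Sigma_\varepsilon^\top = 0$ for the stationary covariance, and verify that Euler–Maruyama samples of the forward SDE approach it:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Illustrative sketch: generalized CLD forward SDE du = A u dt + Sigma_eps dB
# with the block matrices of Section 2; d, a, eps values are arbitrary choices.
d, a, eps = 2, 1.0, 0.3
sigma = 2.0 / np.sqrt(a)
I = np.eye(d)
A = np.block([[np.zeros((d, d)), a**2 * I],
              [-I,               -2.0 * a * I]])
Sigma_eps = np.block([[eps * I,          np.zeros((d, d))],
                      [np.zeros((d, d)), sigma * I]])

# Stationary covariance of the linear SDE solves A S + S A^T + Sigma Sigma^T = 0.
S_inf = solve_continuous_lyapunov(A, -Sigma_eps @ Sigma_eps.T)

# Euler–Maruyama simulation of the forward process from a point mass at the origin.
rng = np.random.default_rng(0)
n, h, steps = 20_000, 1e-2, 2_000
u = np.zeros((n, 2 * d))
for _ in range(steps):
    noise = rng.standard_normal((n, 2 * d)) @ Sigma_eps.T
    u = u + h * (u @ A.T) + np.sqrt(h) * noise

print("max |empirical - stationary| covariance entry:",
      np.max(np.abs(np.cov(u.T) - S_inf)))
```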

3. Score-Based Generative Modeling with CLDs

In the context of score-based generative models, CLD is used both for the forward process that perturbs the data towards noise, and for the learned backward process that maps noise samples to the data distribution by "denoising." A principal advantage of CLD is that the noise is injected into the velocity rather than the position, resulting in

  • a simpler conditional denoising problem: the network need only learn the score of the conditional velocity distribution given the position, i.e., $\nabla_v \log p_t(v \mid x)$, instead of the full joint score,
  • target distributions whose conditional scores are smoother (closer to Gaussian) than those encountered in classic overdamped diffusions (Dockhorn et al., 2021).

A score-matching loss tailored to CLD is given by
$$\mathcal{L}(\theta) = \mathbb{E}_{t, x_0, v_0} \left\Vert s_\theta(x_t, v_t, t) - \nabla_v \log p_t(v_t \mid x_t) \right\Vert^2,$$
where the expectation is taken over the time $t$, the data $x_0$, and the noise initialization $v_0$ (Dockhorn et al., 2021).
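
In practice this loss is estimated by denoising score matching against the Gaussian forward transition kernel. The sketch below assumes hypothetical helpers `mean_fn` and `chol_fn` returning the mean and a Cholesky factor of $p_t(u_t \mid u_0)$, and a network `score_net` predicting the conditional velocity score; none of these names come from the cited papers:

```python
import torch

def cld_dsm_loss(score_net, x0, v0, t, mean_fn, chol_fn):
    """Denoising score-matching estimate of the CLD loss (sketch).

    Assumptions (not from the source): mean_fn(u0, t) -> (batch, 2d) and
    chol_fn(t) -> (batch, 2d, 2d) give the mean and a lower-triangular Cholesky
    factor of the Gaussian forward transition kernel p_t(u_t | u_0);
    score_net(x, v, t) predicts the conditional velocity score.
    """
    d = x0.shape[-1]
    u0 = torch.cat([x0, v0], dim=-1)                       # (batch, 2d)
    eps = torch.randn_like(u0)
    L = chol_fn(t)
    u_t = mean_fn(u0, t) + (L @ eps.unsqueeze(-1)).squeeze(-1)
    # For a Gaussian kernel, grad_u log p_t(u_t | u_0) = -L^{-T} eps; its velocity
    # block is the regression target for the conditional velocity score.
    target = -torch.linalg.solve_triangular(
        L.transpose(-1, -2), eps.unsqueeze(-1), upper=True).squeeze(-1)
    x_t, v_t = u_t[..., :d], u_t[..., d:]
    pred = score_net(x_t, v_t, t)
    return ((pred - target[..., d:]) ** 2).sum(dim=-1).mean()
```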

Sampling is performed by discretizing the reverse SDE, either via Euler–Maruyama or via more accurate splitting schemes such as the Symmetric Splitting CLD Sampler (SSCS), which exploits the structure of the linear and score terms for improved stability and accuracy (Dockhorn et al., 2021).

4. Convergence Analysis and Error Bounds

CLDs, especially with position-noise regularization, admit explicit non-asymptotic convergence guarantees in the Wasserstein-2 distance. Under mild regularity of the data distribution (finite Fisher information, a strongly convex potential component plus a Lipschitz residual) and a bounded score-approximation error, (Strasman et al., 4 Nov 2025) proves
$$W_2\!\left(\operatorname{Law}(\bar{X}^\theta_T),\, \pi_{\rm data}\right) \leq c_1 e^{-c_2 T}\, W_2(\pi_{\rm data} \otimes \pi_v,\, \pi_\infty) + c_1 \sigma^2 M + c_1 \sqrt{h},$$
where $h$ is the discretization step and $M$ quantifies the uniform score-network error.

Discretization error analysis (using Euler–Maruyama) shows that the $W_2$ bias is $O(\sqrt{h})$, i.e., of order $p = 1/2$, matching theoretical lower bounds for non-Lipschitz SDEs. The total bias therefore combines the exponentially decaying ergodicity term, the score error, and the discretization error (Strasman et al., 4 Nov 2025).

5. Discretization and Practical Algorithmic Implementation

Discretizing the backward CLD SDE yields the scheme
$$u_{k+1} = u_k + h\left(-A u_k + \Sigma_\varepsilon^2\, s_\theta(t_k, u_k)\right) + \sqrt{h}\, \Sigma_\varepsilon Z_k, \qquad Z_k \sim \mathcal{N}(0, I_{2d}),$$
where the score network $s_\theta(t, u)$ is evaluated at each timestep. The output is $x_N$, the data (position) variable after $N$ steps. This scheme is amenable to batched, hardware-accelerated computation in contemporary deep learning frameworks.
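
A minimal realization of this loop (a sketch, assuming a trained score network `score_fn` and, for illustration, a standard-Gaussian prior in place of the exact stationary law) is:

```python
import numpy as np

def em_reverse_sampler(score_fn, A, Sigma_eps, T, N, d, n_samples, rng=None):
    """Euler–Maruyama discretization of the reverse CLD SDE (sketch).

    score_fn(t, u) is assumed to return an approximation of grad_u log p_t(u) for a
    batch u of shape (n_samples, 2d); the standard-Gaussian prior below is an
    illustrative stand-in for the stationary law of the forward process.
    """
    rng = np.random.default_rng() if rng is None else rng
    h = T / N
    Sigma2 = Sigma_eps @ Sigma_eps.T
    u = rng.standard_normal((n_samples, 2 * d))            # initialize from the prior
    for k in range(N):
        t = T - k * h                                       # reverse-time clock
        drift = -u @ A.T + score_fn(t, u) @ Sigma2.T
        noise = rng.standard_normal((n_samples, 2 * d)) @ Sigma_eps.T
        u = u + h * drift + np.sqrt(h) * noise
    return u[:, :d]                                         # position block x_N
```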

Structure-exploiting integrators such as SSCS are also supported and show superior empirical performance in terms of step-size stability and sample-quality benchmarks (Dockhorn et al., 2021).

6. Hyperparameter Tuning, Empirical Performance, and Generalizations

Empirically, the best performance is obtained with the following settings (an illustrative configuration is sketched after this list):

  • Damping parameter $a$ in $[0.5, 2]$,
  • Position-noise level $\varepsilon$ in $[0.1, 0.5]$, to stabilize and accelerate convergence without excess noise,
  • Step size $h$ between $O(10^{-2})$ and $O(1)$, balancing bias and computational cost.
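
Collected as an illustrative configuration (names and exact values are hypothetical, chosen from the ranges above), to be combined with the block matrices of Section 2 and the sampler sketch of Section 5:

```python
# Hypothetical defaults drawn from the ranges above; not an established API.
cld_config = {
    "a": 1.0,      # damping parameter, typically in [0.5, 2]
    "eps": 0.25,   # position-noise level, typically in [0.1, 0.5]
    "h": 1e-2,     # step size, typically between O(1e-2) and O(1)
    "T": 10.0,     # diffusion horizon (illustrative)
    "N": 1000,     # number of reverse steps, T / h
}
```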

In comparative studies (Funnel, MG25, Diamond datasets), the $\varepsilon$-regularized CLD outperforms both the vanilla CLD ($\varepsilon = 0$) and standard overdamped Langevin schemes in sliced-Wasserstein error (Strasman et al., 4 Nov 2025). On CIFAR-10, CLD-based models show gains over standard SDE-based SGMs at fixed compute budgets (Dockhorn et al., 2021).

The CLD framework generalizes to higher-order variants (e.g., TOLD++, HOLD++), where the critical-damping principle is extended and spectral gap maximization leads to provably optimal mixing under order and trace constraints (Sterling et al., 12 Sep 2024, Sterling et al., 26 Jun 2025).

7. Broader Impact and Extensions

CLDs provide a bridge between statistical mechanics and non-equilibrium sampling for generative modeling. By leveraging the extended phase space (data plus auxiliary variables) and optimal damping from systems theory, CLDs

  • accelerate convergence and reduce discretization error,
  • simplify score learning,
  • enable more diverse and theoretically grounded sampling schemes.

The analysis and algorithms transfer naturally to symplectic and higher-order integrators and to advanced SGM frameworks requiring conditional or guidance-based sampling (Dockhorn et al., 2021, Strasman et al., 4 Nov 2025). The critical-damping principle extends to arbitrary order, supporting monotone improvements in spectral gap and sample efficiency (Sterling et al., 26 Jun 2025).

A plausible implication is that CLDs and their higher-order analogues will become standard for constructing efficient and robust diffusion-based generative models, subject to future work that balances spectral-gap gains with practical memory and computational constraints as the order increases.
