
Critically-Damped Langevin Diffusions

Updated 9 November 2025
  • CLDs are defined as a class of stochastic differential equations incorporating critical damping to maximize convergence speed without oscillation.
  • They enhance score-based generative models by injecting noise into velocity, simplifying score estimation and yielding smoother conditional distributions.
  • Rigorous convergence analysis and discretization methods, including Euler–Maruyama and splitting schemes, provide non-asymptotic error bounds and practical implementation strategies.

Critically-damped Langevin Diffusions (CLDs) are a class of stochastic differential equations that introduce critical damping—a parameter regime maximizing the rate of approach to equilibrium without oscillatory behavior—into Langevin-type diffusions. Originally developed to bridge statistical mechanics and generative modeling, CLDs have found broad application in score-based generative models (SGMs), where they improve the efficiency and sample quality compared to classical overdamped or merely underdamped schemes. The framework extends naturally to higher-order versions and admits principled discretizations and tuning strategies, supported by rigorous non-asymptotic convergence theory in the Wasserstein metric.

1. Mathematical Formulation of CLD

Let $x_t \in \mathbb{R}^d$ denote the position and $v_t \in \mathbb{R}^d$ the velocity (or momentum). The critically-damped Langevin process is defined by

$$\begin{cases} dx_t = v_t\,dt, \\ dv_t = -\dfrac{1}{M}\nabla U(x_t)\,dt - \gamma v_t\,dt + \sqrt{2\gamma/M}\,dW_t, \end{cases}$$

where $U(x)$ is a potential (minus the log-density of the target), $M>0$ the mass, $\gamma>0$ the friction (damping) coefficient, and $W_t$ a standard Brownian motion. In block form,

$$u_t := (x_t, v_t)^\top,\qquad du_t = A u_t\,dt + \Sigma\,dW_t,$$

with drift matrix $A = \begin{pmatrix} 0 & I_d \\ -\frac{1}{M}\nabla^2 U & -\gamma I_d \end{pmatrix}$ and diffusion matrix $\Sigma = \sqrt{\frac{2\gamma}{M}} \begin{pmatrix} 0 \\ I_d \end{pmatrix}$.

The critical-damping criterion requires that, for the associated noiseless ODE $\ddot x + \gamma\,\dot x + \frac{1}{M}\nabla U(x) = 0$, the characteristic polynomial has a double real root; for a quadratic potential with Hessian $K$, this is achieved by $\gamma^2 = 4\det(K)/M$, which maximizes the rate of convergence without oscillation (Strasman et al., 4 Nov 2025).
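
To make the criterion concrete, the following minimal sketch (not taken from the cited papers; parameter values are illustrative) integrates the noiseless dynamics for a one-dimensional quadratic potential $U(x) = Kx^2/2$ and counts zero crossings below, at, and above the critical friction $\gamma = 2\sqrt{K/M}$:

```python
import numpy as np

# Illustrative check (not from the cited papers): noiseless kinetic Langevin dynamics
# x'' = -(K/M) x - gamma x' for U(x) = K x^2 / 2. Critical damping gamma^2 = 4K/M
# gives the fastest return to equilibrium without oscillation.
K, M = 1.0, 1.0
gamma_crit = 2.0 * np.sqrt(K / M)

def simulate(gamma, x0=1.0, v0=0.0, dt=1e-3, T=20.0):
    """Explicit Euler integration of the damped oscillator; returns the x-trajectory."""
    n = int(T / dt)
    x, v = x0, v0
    traj = np.empty(n)
    for i in range(n):
        x, v = x + dt * v, v + dt * (-(K / M) * x - gamma * v)
        traj[i] = x
    return traj

for label, g in [("underdamped", 0.3 * gamma_crit),
                 ("critical", gamma_crit),
                 ("overdamped", 3.0 * gamma_crit)]:
    traj = simulate(g)
    crossings = np.sum(np.diff(np.sign(traj)) != 0)  # sign changes indicate oscillation
    print(f"{label:12s} gamma={g:.2f}  zero crossings={crossings}")
```

Only the underdamped run crosses zero; critical damping gives the fastest non-oscillatory decay, which is the property CLD exploits.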

2. Generalized CLD Dynamics: Position-Noise Regularization

Recent work extends CLD by introducing an additional position-noise parameter $\varepsilon \geq 0$, yielding

$$\Sigma_\varepsilon = \begin{pmatrix}\varepsilon I_d & 0 \\ 0 & \sigma I_d \end{pmatrix},\qquad A = \begin{pmatrix} 0 & a^2 I_d \\ -I_d & -2a I_d \end{pmatrix},$$

with $a = 1/\sqrt{M}$ and $\sigma = 2/\sqrt{a}$. The resulting forward SDE is

$$du_t = A u_t\,dt + \Sigma_\varepsilon\,dB_t,$$

and the time-reversed SDE for generative modeling becomes

$$du_t = \left( -A u_t + \Sigma_\varepsilon^2\, \nabla \log p_{T-t}(u_t) \right)dt + \Sigma_\varepsilon\,dB_t,$$

where $p_s$ denotes the law of $u_s$. Choosing $\varepsilon > 0$ renders the diffusion fully nondegenerate (elliptic), which smooths sample paths and regularizes the dynamics (Strasman et al., 4 Nov 2025). The infinitesimal generator is

$$\mathcal{L}f(u) = \langle A u, \nabla f(u) \rangle + \tfrac{1}{2}\operatorname{Tr}\!\left(\Sigma_\varepsilon\Sigma_\varepsilon^\top \nabla^2 f(u)\right).$$
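
As a small sanity check of ergodicity (a sketch under illustrative parameter choices, not taken from the source), one can assemble $A$ and $\Sigma_\varepsilon$, solve the continuous Lyapunov equation $A S_\infty + S_\infty A^\top + \Sigma_\varepsilon\Sigma_\varepsilon^\top = 0$ for the stationary covariance, and verify that Euler–Maruyama samples of the forward SDE approach it:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Illustrative sketch: generalized CLD forward SDE du = A u dt + Sigma_eps dB
# with the block matrices of Section 2; d, a, eps values are arbitrary choices.
d, a, eps = 2, 1.0, 0.3
sigma = 2.0 / np.sqrt(a)
I = np.eye(d)
A = np.block([[np.zeros((d, d)), a**2 * I],
              [-I,               -2.0 * a * I]])
Sigma_eps = np.block([[eps * I,          np.zeros((d, d))],
                      [np.zeros((d, d)), sigma * I]])

# Stationary covariance of the linear SDE solves A S + S A^T + Sigma Sigma^T = 0.
S_inf = solve_continuous_lyapunov(A, -Sigma_eps @ Sigma_eps.T)

# Euler–Maruyama simulation of the forward process from a point mass at the origin.
rng = np.random.default_rng(0)
n, h, steps = 20_000, 1e-2, 2_000
u = np.zeros((n, 2 * d))
for _ in range(steps):
    noise = rng.standard_normal((n, 2 * d)) @ Sigma_eps.T
    u = u + h * (u @ A.T) + np.sqrt(h) * noise

print("max |empirical - stationary| covariance entry:",
      np.max(np.abs(np.cov(u.T) - S_inf)))
```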

3. Score-Based Generative Modeling with CLDs

In the context of score-based generative models, CLD is used both for the forward process that perturbs the data towards noise, and for the learned backward process that maps noise samples to the data distribution by "denoising." A principal advantage of CLD is that the noise is injected into the velocity rather than the position, resulting in

  • a simpler conditional denoising problem: the network need only learn the score of the conditional velocity distribution given the position, i.e., $\nabla_v \log p_t(v \mid x)$, instead of the full joint score,
  • target distributions whose conditional scores are smoother (closer to Gaussian) than those encountered in classic overdamped diffusions (Dockhorn et al., 2021).

A score-matching loss tailored to CLD is given by
$$\mathcal{L}(\theta) = \mathbb{E}_{t, x_0, v_0} \left\Vert s_\theta(x_t, v_t, t) - \nabla_v \log p_t(v_t \mid x_t) \right\Vert^2,$$
where the expectation is taken over the time $t$, the data $x_0$, and the noise initialization $v_0$ (Dockhorn et al., 2021).
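
In practice this loss is estimated by denoising score matching against the Gaussian forward transition kernel. The sketch below assumes hypothetical helpers `mean_fn` and `chol_fn` returning the mean and a Cholesky factor of $p_t(u_t \mid u_0)$, and a network `score_net` predicting the conditional velocity score; none of these names come from the cited papers:

```python
import torch

def cld_dsm_loss(score_net, x0, v0, t, mean_fn, chol_fn):
    """Denoising score-matching estimate of the CLD loss (sketch).

    Assumptions (not from the source): mean_fn(u0, t) -> (batch, 2d) and
    chol_fn(t) -> (batch, 2d, 2d) give the mean and a lower-triangular Cholesky
    factor of the Gaussian forward transition kernel p_t(u_t | u_0);
    score_net(x, v, t) predicts the conditional velocity score.
    """
    d = x0.shape[-1]
    u0 = torch.cat([x0, v0], dim=-1)                       # (batch, 2d)
    eps = torch.randn_like(u0)
    L = chol_fn(t)
    u_t = mean_fn(u0, t) + (L @ eps.unsqueeze(-1)).squeeze(-1)
    # For a Gaussian kernel, grad_u log p_t(u_t | u_0) = -L^{-T} eps; its velocity
    # block is the regression target for the conditional velocity score.
    target = -torch.linalg.solve_triangular(
        L.transpose(-1, -2), eps.unsqueeze(-1), upper=True).squeeze(-1)
    x_t, v_t = u_t[..., :d], u_t[..., d:]
    pred = score_net(x_t, v_t, t)
    return ((pred - target[..., d:]) ** 2).sum(dim=-1).mean()
```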

Sampling is performed by discretizing the reverse SDE, either via Euler–Maruyama or via more accurate splitting schemes such as the Symmetric Splitting CLD Sampler (SSCS), which exploits the structure of the linear and score terms for improved stability and accuracy (Dockhorn et al., 2021).

4. Convergence Analysis and Error Bounds

CLDs, especially with position-noise regularization, admit explicit non-asymptotic convergence guarantees in the Wasserstein-2 distance. Under mild regularity of the data distribution (finite Fisher information, a strongly convex potential component plus a Lipschitz residual) and a bounded score-approximation error, (Strasman et al., 4 Nov 2025) proves
$$W_2\!\left(\operatorname{Law}(\bar{X}^\theta_T),\, \pi_{\rm data}\right) \leq c_1 e^{-c_2 T}\, W_2(\pi_{\rm data} \otimes \pi_v,\, \pi_\infty) + c_1 \sigma^2 M + c_1 \sqrt{h},$$
where $h$ is the discretization step and $M$ quantifies the uniform score-network error.

Discretization error analysis (using Euler–Maruyama) shows that the $W_2$ bias is $O(\sqrt{h})$, i.e., of order $p = 1/2$, matching theoretical lower bounds for non-Lipschitz SDEs. The total bias therefore combines the exponentially decaying ergodicity term, the score error, and the discretization error (Strasman et al., 4 Nov 2025).

5. Discretization and Practical Algorithmic Implementation

Discretizing the backward CLD SDE yields the scheme
$$u_{k+1} = u_k + h\left(-A u_k + \Sigma_\varepsilon^2\, s_\theta(t_k, u_k)\right) + \sqrt{h}\, \Sigma_\varepsilon Z_k, \qquad Z_k \sim \mathcal{N}(0, I_{2d}),$$
where the score network $s_\theta(t, u)$ is evaluated at each timestep. The output is $x_N$, the data (position) variable after $N$ steps. This scheme is amenable to batched, hardware-accelerated computation in contemporary deep learning frameworks.
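
A minimal realization of this loop (a sketch, assuming a trained score network `score_fn` and, for illustration, a standard-Gaussian prior in place of the exact stationary law) is:

```python
import numpy as np

def em_reverse_sampler(score_fn, A, Sigma_eps, T, N, d, n_samples, rng=None):
    """Euler–Maruyama discretization of the reverse CLD SDE (sketch).

    score_fn(t, u) is assumed to return an approximation of grad_u log p_t(u) for a
    batch u of shape (n_samples, 2d); the standard-Gaussian prior below is an
    illustrative stand-in for the stationary law of the forward process.
    """
    rng = np.random.default_rng() if rng is None else rng
    h = T / N
    Sigma2 = Sigma_eps @ Sigma_eps.T
    u = rng.standard_normal((n_samples, 2 * d))            # initialize from the prior
    for k in range(N):
        t = T - k * h                                       # reverse-time clock
        drift = -u @ A.T + score_fn(t, u) @ Sigma2.T
        noise = rng.standard_normal((n_samples, 2 * d)) @ Sigma_eps.T
        u = u + h * drift + np.sqrt(h) * noise
    return u[:, :d]                                         # position block x_N
```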

Structure-exploiting integrators such as SSCS are also supported and show superior empirical performance in terms of step-size stability and sample-quality benchmarks (Dockhorn et al., 2021).

6. Hyperparameter Tuning, Empirical Performance, and Generalizations

Empirically, the best performance is obtained with the following settings (an illustrative configuration is sketched after this list):

  • Damping parameter $a$ in $[0.5, 2]$,
  • Position-noise level $\varepsilon$ in $[0.1, 0.5]$, to stabilize and accelerate convergence without excess noise,
  • Step size $h$ between $O(10^{-2})$ and $O(1)$, balancing bias and computational cost.
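
Collected as an illustrative configuration (names and exact values are hypothetical, chosen from the ranges above), to be combined with the block matrices of Section 2 and the sampler sketch of Section 5:

```python
# Hypothetical defaults drawn from the ranges above; not an established API.
cld_config = {
    "a": 1.0,      # damping parameter, typically in [0.5, 2]
    "eps": 0.25,   # position-noise level, typically in [0.1, 0.5]
    "h": 1e-2,     # step size, typically between O(1e-2) and O(1)
    "T": 10.0,     # diffusion horizon (illustrative)
    "N": 1000,     # number of reverse steps, T / h
}
```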

In comparative studies (Funnel, MG25, Diamond datasets), the $\varepsilon$-regularized CLD outperforms both the vanilla CLD ($\varepsilon = 0$) and standard overdamped Langevin schemes in sliced-Wasserstein error (Strasman et al., 4 Nov 2025). On CIFAR-10, CLD-based models show gains over standard SDE-based SGMs at fixed compute budgets (Dockhorn et al., 2021).

The CLD framework generalizes to higher-order variants (e.g., TOLD++, HOLD++), where the critical-damping principle is extended and spectral gap maximization leads to provably optimal mixing under order and trace constraints (Sterling et al., 12 Sep 2024, Sterling et al., 26 Jun 2025).

7. Broader Impact and Extensions

CLDs provide a bridge between statistical mechanics and non-equilibrium sampling for generative modeling. By leveraging the extended phase space (data plus auxiliary variables) and optimal damping from systems theory, CLDs

  • accelerate convergence and reduce discretization error,
  • simplify score learning,
  • enable more diverse and theoretically grounded sampling schemes.

The analysis and algorithms transfer naturally to symplectic and higher-order integrators and to advanced SGM frameworks requiring conditional or guidance-based sampling (Dockhorn et al., 2021, Strasman et al., 4 Nov 2025). The critical-damping principle extends to arbitrary order, supporting monotone improvements in spectral gap and sample efficiency (Sterling et al., 26 Jun 2025).

A plausible implication is that CLDs and their higher-order analogues will become standard for constructing efficient and robust diffusion-based generative models, subject to future work that balances spectral-gap gains with practical memory and computational constraints as the order increases.
