Unadjusted Langevin Algorithm (ULA)

Updated 13 January 2026
  • Unadjusted Langevin Algorithm (ULA) is a discretization method for overdamped Langevin dynamics that approximates high-dimensional target distributions using gradient updates and Gaussian noise.
  • It employs a fixed or variable step size to balance bias and variance, with convergence guarantees under strong convexity and smoothness conditions.
  • Enhanced variants like TULA and proximal ULA improve stability and performance in non-smooth, non-convex, and high-dimensional settings, broadening its applications in Bayesian inference.

The Unadjusted Langevin Algorithm (ULA) is a widely used discretization of overdamped Langevin dynamics for sampling from complex, high-dimensional probability distributions with densities known up to a normalization constant. ULA has become central in scalable Bayesian inference, machine learning, and computational statistics, with a rigorous research literature spanning smooth and non-smooth analysis, convergence rates, high-dimensional scaling, algorithmic enhancements, and extensions to non-convex and non-log-concave regimes.

1. Mathematical Formulation and Core Mechanism

ULA is derived by discretizing the continuous-time Langevin SDE targeting a density $\pi(x) \propto \exp(-U(x))$ on $\mathbb{R}^d$ with a potential $U:\mathbb{R}^d\to\mathbb{R}$:
$$dX_t = -\nabla U(X_t)\,dt + \sqrt{2}\,dB_t,$$
where $B_t$ is standard $d$-dimensional Brownian motion. The ULA iteration, using a constant or variable step size $h > 0$, updates as
$$X_{k+1} = X_k - h\,\nabla U(X_k) + \sqrt{2h}\,\xi_{k+1},\qquad \xi_{k+1} \sim N(0, I_d).$$
The algorithm forms a non-reversible Markov chain whose stationary law $\pi_h$ approximates the target $\pi$ as $h\to 0$. Empirical averages converge to expectations under $\pi$ with a step size–dependent bias (Durmus et al., 2015).
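The update rule translates directly into code. The sketch below is a minimal illustration, assuming a toy quadratic potential $U(x) = \tfrac{1}{2}x^\top A x$ (so $\nabla U(x) = Ax$) chosen here only for demonstration; nothing about the target or the constants is taken from the cited works.

```python
import numpy as np

def ula(grad_U, x0, h, n_steps, rng=None):
    """Unadjusted Langevin Algorithm: X_{k+1} = X_k - h grad U(X_k) + sqrt(2h) xi."""
    rng = np.random.default_rng() if rng is None else rng
    d = x0.shape[0]
    xs = np.empty((n_steps + 1, d))
    xs[0] = x0
    for k in range(n_steps):
        xs[k + 1] = (xs[k]
                     - h * grad_U(xs[k])                           # drift step
                     + np.sqrt(2.0 * h) * rng.standard_normal(d))  # Gaussian noise
    return xs

# Toy strongly log-concave target (assumed for illustration): U(x) = 0.5 x^T A x.
A = np.diag([1.0, 4.0])
samples = ula(grad_U=lambda x: A @ x, x0=np.zeros(2), h=0.05, n_steps=10_000)
```

After discarding a burn-in prefix, empirical averages over `samples` approximate expectations under $\pi$, up to the step size–dependent bias discussed below.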

2. Theoretical Convergence Analysis

Strongly Log-Concave and Smooth Potentials

When $U$ is strongly convex ($\nabla^2 U \succeq m I$) and $\nabla U$ is $L$-Lipschitz, non-asymptotic convergence in $W_2$ and total variation distances is exponential; the bias satisfies $W_2(\pi_h,\pi) = O(\sqrt{d}\,h)$, and the mixing rate is geometric in $k$ with rate $1 - O(mh)$, provided $h < 1/(m + L)$ (Durmus et al., 2016, Durmus et al., 2018, Durmus et al., 2015). The bias–variance trade-off necessitates $h = O(\varepsilon^2/d)$ for $W_2$ error $\le \varepsilon$ (Chen et al., 2024).
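Schematically (constants suppressed, and stated here only as a paraphrase of the cited bounds rather than a precise theorem), the contraction and the bias combine as
$$W_2(\nu_k, \pi) \;\lesssim\; (1 - mh)^{k}\, W_2(\nu_0, \pi) \;+\; \sqrt{d}\,h,$$
where $\nu_k$ is the law of $X_k$: the first term decays geometrically in $k$, while the second is the floor set by the step size, which is what forces $h$ to shrink with the accuracy target and the dimension.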

Weakly Smooth, Non-Convex, and Superlinear Potentials

For non-convex or merely weakly smooth (e.g., Hölder-continuous gradient or "mixture $a$-weakly smooth") potentials, ULA convergence can still be established by smoothing the potential or using convexification strategies. For mixture $a$-weak smoothness, balancing smoothing and discretization leads to polynomial iteration complexity in $d$ and $1/\varepsilon$ (Nguyen et al., 2021). Without global Lipschitzness but with appropriate dissipativity and functional inequalities (e.g., LSI, Poincaré, Talagrand), ULA achieves $W_2$ and KL convergence with step size $h = O(\varepsilon^{1/(1 + a)})$ (Nguyen et al., 2021).

Superlinear drifts may cause classical ULA to diverge. The Tamed ULA (TULA) replaces the drift $\nabla U(x)$ by a bounded approximation, ensuring stability and preserving $O(\sqrt{h})$ bias and geometric convergence under weak conditions (Brosse et al., 2017).
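A minimal sketch of the tamed update, assuming the taming $\nabla U(x)/(1 + h\|\nabla U(x)\|)$ (one of the variants discussed by Brosse et al., 2017; coordinatewise taming is analogous) and a toy quartic potential chosen only for illustration:

```python
import numpy as np

def tula_step(x, grad_U, h, rng):
    """One Tamed ULA step: the tamed drift is bounded by 1/h, preventing blow-up."""
    g = grad_U(x)
    tamed = g / (1.0 + h * np.linalg.norm(g))   # taming: ||tamed|| <= 1/h
    return x - h * tamed + np.sqrt(2.0 * h) * rng.standard_normal(x.shape[0])

# Toy superlinear potential U(x) = ||x||^4 / 4, grad U(x) = ||x||^2 x (assumed).
grad_U = lambda x: np.dot(x, x) * x
rng = np.random.default_rng(0)
x = np.full(3, 5.0)            # start far out, where plain ULA would overshoot
for _ in range(5_000):
    x = tula_step(x, grad_U, h=0.01, rng=rng)
```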

Non-Smooth and Discontinuous Gradients

For targets with non-smooth or pointwise discontinuous gradients, ULA and its subgradient variant (SG-ULA) converge with reduced rates. If the drift is piecewise Lipschitz or obeys only linear growth, the step-size bias degrades to $O(h^{1/2})$ or $O(h^{1/4})$ in $W_p$, but convergence in Wasserstein distance still holds under suitable dissipativity and moment bounds (Johnston et al., 2023, Johnston et al., 5 Feb 2025). For convex but non-differentiable $U$, stochastic subgradient or proximal extensions of ULA (SSGLD, SPGLD) provide convergence guarantees by leveraging convex optimization tools (Durmus et al., 2018, Bernton, 2018).
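For intuition, SG-ULA simply substitutes any element of the subdifferential for the gradient. A minimal sketch, assuming a hypothetical composite potential $U(x) = \tfrac{1}{2}\|x\|^2 + \lambda\|x\|_1$ that is not taken from the cited papers:

```python
import numpy as np

def sg_ula_step(x, subgrad_U, h, rng):
    """Subgradient ULA: identical to ULA, with a subgradient in place of the gradient."""
    return x - h * subgrad_U(x) + np.sqrt(2.0 * h) * rng.standard_normal(x.shape[0])

# U(x) = 0.5||x||^2 + lam*||x||_1; x + lam*sign(x) is a valid subgradient everywhere.
lam = 1.0
subgrad_U = lambda x: x + lam * np.sign(x)
rng = np.random.default_rng(1)
x = np.ones(4)
for _ in range(10_000):
    x = sg_ula_step(x, subgrad_U, h=1e-3, rng=rng)
```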

3. Algorithmic Enhancements and Extensions

Transport Map and Geometry-Informed ULA

Transport map–based ULA (TMULA) leverages an invertible map $T$ that pushes $\pi$ towards a tractable reference measure. Discretizing Langevin dynamics in the mapped space yields preconditioned or Riemannian manifold dynamics, and learning $T$ (e.g., by normalizing flows) systematically accelerates sampling, enhancing strong convexity and conditioning (Zhang et al., 2023, Cai et al., 2023). Geometry-informed irreversible perturbations further accelerate mixing by introducing skew-symmetric drift (Zhang et al., 2023).
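A minimal sketch of the idea, assuming a fixed diagonal affine map $T(y) = a \odot y + b$ as a stand-in for a learned transport map; for affine $T$ the log-determinant term in the pulled-back potential $V(y) = U(T(y)) - \log|\det \nabla T(y)|$ is constant and can be dropped. All names and constants here are illustrative assumptions:

```python
import numpy as np

# Fixed diagonal affine transport map T(y) = a * y + b (stand-in for a learned flow).
a = np.array([1.0, 0.5])
b = np.array([0.0, 2.0])
T = lambda y: a * y + b

def tmula(grad_U, y0, h, n_steps, rng):
    """Run ULA on the pulled-back potential V(y) = U(T(y)) + const, return mapped samples."""
    grad_V = lambda y: a * grad_U(T(y))     # chain rule for the diagonal affine map
    y, xs = y0.copy(), []
    for _ in range(n_steps):
        y = y - h * grad_V(y) + np.sqrt(2.0 * h) * rng.standard_normal(y.shape[0])
        xs.append(T(y))                     # push each iterate back to the target space
    return np.array(xs)

# Ill-conditioned Gaussian target (assumed): U(x) = 0.5 x^T diag(1, 4) x.
A = np.diag([1.0, 4.0])
samples = tmula(lambda x: A @ x, y0=np.zeros(2), h=0.1,
                n_steps=5_000, rng=np.random.default_rng(2))
```

With this choice of $a$, the mapped potential has identity Hessian, illustrating how a good map improves conditioning and permits larger step sizes.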

Proximal and Double-Loop ULA

Proximal ULA schemes split the update into a proximal step for non-smooth terms and a Gaussian perturbation, aligning with the JKO splitting of Wasserstein gradient flows. This broadens applicability to composite and non-smooth posteriors (Bernton, 2018). Double-loop step-size schedules (DL-ULA) improve convergence under light-tailed (non-strongly-convex) conditions by alternating fast-mixing batches and step-size reductions, providing the first non-asymptotic $W_2$ guarantees in high dimension for log-concave targets (Rolland et al., 2020).
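As an illustration, the following is a minimal sketch of a proximal-gradient Langevin step for a composite potential $U = f + \lambda\|\cdot\|_1$: a gradient step on the smooth part $f$, a proximal (soft-thresholding) step on the non-smooth part, then the usual Gaussian perturbation. The exact ordering and form of the splitting varies across the cited schemes; this is an assumed variant, not the specification of any particular one:

```python
import numpy as np

def soft_threshold(x, t):
    """prox_{t*||.||_1}(x): the proximal map of the l1 term (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def prox_grad_langevin_step(x, grad_f, lam, h, rng):
    """Forward (gradient) step on f, backward (proximal) step on lam*||.||_1, then noise."""
    x = x - h * grad_f(x)
    x = soft_threshold(x, h * lam)
    return x + np.sqrt(2.0 * h) * rng.standard_normal(x.shape[0])
```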

Preconditioning and Domain-Specific Schemes

Preconditioned ULA applies a matrix-valued adaptation to both drift and noise, flattening quadratic curvature and reducing the effective condition number. In inverse problems, notably MRI reconstruction, this permits larger steps, faster mixing, and robust uncertainty quantification with minimal parameter tuning (Blumenthal et al., 5 Dec 2025). Hybrid approaches incorporating data-driven priors or denoisers (e.g., with plug-and-play or learned normalizing flows) integrate deep models for the prior, with theoretical well-posedness and practical acceleration (Cai et al., 2023).
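A minimal sketch with a fixed diagonal preconditioner $M$; a constant $M$ keeps the continuous-time dynamics targeting $\pi$, while the data-adaptive and matrix-free choices used in the cited MRI work are beyond this illustration:

```python
import numpy as np

def precond_ula_step(x, grad_U, M_diag, h, rng):
    """X_{k+1} = X_k - h M grad U(X_k) + sqrt(2h) M^{1/2} xi, with diagonal M."""
    noise = np.sqrt(M_diag) * rng.standard_normal(x.shape[0])
    return x - h * (M_diag * grad_U(x)) + np.sqrt(2.0 * h) * noise

# Badly conditioned toy Gaussian (assumed): curvatures 1 and 100.
A_diag = np.array([1.0, 100.0])
M_diag = 1.0 / A_diag            # inverse-curvature preconditioner flattens the spectrum
rng = np.random.default_rng(3)
x = np.zeros(2)
for _ in range(5_000):
    x = precond_ula_step(x, lambda z: A_diag * z, M_diag, h=0.2, rng=rng)
```

Here the preconditioned drift has unit effective curvature in both coordinates, so a step size of order one is stable, whereas plain ULA would need $h \lesssim 1/100$.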

4. Convergence Metrics, High-Dimensional Scaling, and Bias Localization

Convergence guarantees for ULA are typically given in total variation, $W_2$, KL, or Rényi divergences. Recent work distinguishes between global and marginal (partial-coordinate) convergence: while $W_2(\pi_h, \pi) \propto \sqrt{d h}$, the marginal bias for $K$-dimensional coordinate subsets is $O(\sqrt{K h \log d})$. This effect, called "delocalization of bias," means that low-dimensional projections can mix on $O(K \log d/\varepsilon^2)$ timescales, even when full-dimensional convergence is slower; this is particularly sharp for Gaussian targets and strongly log-concave measures with sparse graphical structure (Chen et al., 2024).
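For orientation (the numbers are illustrative assumptions, not taken from the cited work): with $d = 10^6$ and a $K = 10$-dimensional marginal of interest, $K\log d \approx 10 \times 13.8 \approx 1.4 \times 10^2$, so the step size permitted for marginal accuracy, $h = O(\varepsilon^2/(K\log d))$, is roughly $d/(K\log d) \approx 7\times 10^3$ times larger than the $h = O(\varepsilon^2/d)$ needed for full-dimensional $W_2$ accuracy.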

| Metric | Classical ULA | DL-ULA | TULA | Weak-smooth / Non-smooth | High-dim delocalization |
|---|---|---|---|---|---|
| $W_2$ | $O(\sqrt{d h})$ | $O(d^9 T^{-1/6})$ | $O(\sqrt{h})$ | $O(h^{1/(1+a)})$ to $O(h^{1/2})$ | $O(\sqrt{K h \log d})$ for $K$-marginals |
| TV | $O(d h)$ | $O(d^3 T^{-1})$ | $O(\sqrt{h})$ | $O(h^{1/2})$ | |
| KL | $O(d h)$ | $O(d^3 T^{-2/3})$ | $O(h)$ | $O(h^{1+a})$ | |

5. Functional Inequalities, Mixing, and Isoperimetry

ULA’s convergence in KL and Rényi divergences is controlled by functional inequalities satisfied by the target, together with the discretization bias. Log-Sobolev inequalities (LSI) suffice for exponential decay of relative entropy without requiring convexity, provided the Hessian is bounded; the key recursion is
$$H_\nu(\rho_{k+1}) \leq e^{-\alpha \eta} H_\nu(\rho_k) + O(\eta^2 n L^2),$$
where $H_\nu$ denotes the KL divergence to the target $\nu$ and $n$ the dimension (Vempala et al., 2019). Under LSI or Poincaré for the target, with $L$-smoothness, ULA achieves explicit geometric convergence in KL or Rényi divergence, with iterates reaching $\epsilon$-precision in $O(L^2 n/\alpha^2)$ steps for an optimal step size (Vempala et al., 2019). The bias vanishes as the step size $\eta \to 0$ given third-order smoothness.
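Unrolling this recursion over $k$ steps and summing the geometric series makes the bias explicit (constants suppressed):
$$H_\nu(\rho_k) \;\le\; e^{-\alpha \eta k}\, H_\nu(\rho_0) + \frac{O(\eta^2 n L^2)}{1 - e^{-\alpha \eta}} \;\approx\; e^{-\alpha \eta k}\, H_\nu(\rho_0) + O\!\left(\frac{\eta\, n L^2}{\alpha}\right),$$
so the relative entropy decays geometrically to a floor that is linear in the step size $\eta$ and the dimension $n$, consistent with the bias vanishing as $\eta \to 0$.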

Empirically, ULA’s exponential decay of $\Phi$-mutual information corresponds to effective mixing and rapid "decorrelation" of samples, with strong convexity and LSI controlling independence time and the necessary burn-in (Liang et al., 2024).

6. Practical Aspects and Implementation Guidance

Step size must be chosen according to drift smoothness, strong convexity, and, in the non-smooth or non-convex case, according to local polynomial-Lipschitz bounds and dissipativity. For stability and accuracy (a small helper collecting these rules is sketched after this list):

  • $h < 1/(m + L)$ for strongly convex $U$ with Lipschitz $\nabla U$.
  • $h = O(\varepsilon^2/d)$ for $W_2$ error $\le \varepsilon$.
  • For TULA and non-smooth cases, $h$ must be further reduced, and coordinatewise taming is sometimes used (Brosse et al., 2017).
  • For weak smoothness (mixture $a$), $h = O(\varepsilon^{1/(1 + a)})$ (Nguyen et al., 2021).
  • For high-dimensional applications, delocalization implies $h = O(\varepsilon^2/(K \log d))$ for $K$-marginal error $\le \varepsilon$ (Chen et al., 2024).
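The sketch below collects these rules of thumb into a single helper. All constants are dropped and the regime names are invented here for illustration, so the output should be read as an order-of-magnitude starting point for tuning, not a guarantee:

```python
import numpy as np

def suggest_step_size(regime, eps=0.1, d=None, m=None, L=None, a=None, K=None):
    """Order-of-magnitude step-size suggestions mirroring the list above.

    Every formula omits unknown constants; treat the result as a starting
    point for tuning rather than a guaranteed choice.
    """
    if regime == "strongly_convex":        # h < 1/(m + L)
        return 1.0 / (m + L)
    if regime == "w2_accuracy":            # h = O(eps^2 / d)
        return eps ** 2 / d
    if regime == "weakly_smooth":          # h = O(eps^{1/(1+a)})
        return eps ** (1.0 / (1.0 + a))
    if regime == "marginal_accuracy":      # h = O(eps^2 / (K log d))
        return eps ** 2 / (K * np.log(d))
    raise ValueError(f"unknown regime: {regime}")

# Example: strongly convex target with m = 1, L = 10.
h_stable = suggest_step_size("strongly_convex", m=1.0, L=10.0)    # ~0.09
h_accurate = suggest_step_size("w2_accuracy", eps=0.05, d=1_000)  # 2.5e-6
```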

Extensions to multiplicative noise, proximal steps, stochastic (minibatch) gradients, and non-Euclidean/geometry-informed corrections expand the practical utility of ULA in modern Bayesian computation (Pages et al., 2020, Durmus et al., 2018, Zhang et al., 2023).

7. Limitations, Open Problems, and Future Directions

Despite broad theoretical guarantees, limitations of ULA include discretization bias (which may persist outside the strongly convex regime), possible instability for aggressive step sizes or superlinear drifts, and slow mixing for severely ill-conditioned or multimodal targets. Ongoing research addresses:

  • Reducing asymptotic bias via adaptive step-size, MALA-corrections, or transport-based preconditioning (Zhang et al., 2023, Blumenthal et al., 5 Dec 2025).
  • Robustness to non-smoothness and low regularity, with convergence rates for SG-ULA/SPGLD in high dimensions (Johnston et al., 5 Feb 2025).
  • Extension of delocalization results to broader classes of non-Gaussian and non-sparse targets (Chen et al., 2024).
  • Integration of structural priors, normalizing flows, or data-driven denoising modules in large-scale inverse problems and imaging (Cai et al., 2023).
  • Quantitative guidance for parameter selection balancing efficiency versus accuracy in realistic workloads.

ULA remains an active subject of research as both a foundation for high-dimensional sampling algorithms and a testbed for theoretical developments in stochastic processes, optimization in measure spaces, and computational statistics.
