Langevin Monte Carlo (LMC) Algorithms

Updated 20 May 2026

Langevin Monte Carlo (LMC) algorithms are MCMC methods that discretize Langevin diffusions to sample complex, high-dimensional distributions.
They combine gradient-driven dynamics with Gaussian noise to navigate probability landscapes, with performance sensitive to smoothness and dimension.
Extensions like high-order, coordinate, heavy-tailed, and proximal variants improve efficiency in nonconvex, nonsmooth, and structured sampling scenarios.

Langevin Monte Carlo (LMC) algorithms form a central class of Markov Chain Monte Carlo (MCMC) methods for sampling from complex high-dimensional distributions. These algorithms are based on discretizations of the Langevin diffusion, which connects the geometry of the target distribution with stochastic dynamics, enabling efficient exploration and sampling even in high-dimensional and ill-conditioned regimes. A wide variety of LMC-type methods now exist, extending the classical unadjusted approach to accommodate weak smoothness, nonsmoothness, high-order integrators, regime switching, coordinate structure, heavy-tailed dynamics, and discrete sample spaces.

1. Mathematical Foundations and Classical LMC

LMC aims to sample from a density $\pi(x) \propto \exp(-f(x))$, with $f:\mathbb{R}^d \to \mathbb{R}$ typically assumed to be smooth and (strongly or locally) convex. The continuous-time overdamped Langevin SDE is

$dx_t = -\nabla f(x_t)\,dt + \sqrt{2}\,dB_t,$

where $B_t$ is a standard $d$-dimensional Brownian motion. This SDE is ergodic with stationary law $\pi$ under mild regularity.
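
As a quick check of the stationarity claim (a standard computation, sketched here under the assumption that $f$ is smooth enough for the Fokker–Planck equation to hold classically), the law $p_t$ of $x_t$ evolves as

$\partial_t p_t = \nabla \cdot (p_t \nabla f) + \Delta p_t.$

Substituting $p_t = \pi \propto \exp(-f)$ and using $\nabla \pi = -\pi \nabla f$ gives

$\nabla \cdot (\pi \nabla f) + \Delta \pi = \nabla \cdot (\pi \nabla f + \nabla \pi) = 0,$

so $\pi$ is a fixed point of the dynamics.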

The Euler–Maruyama discretization, or Unadjusted Langevin Algorithm (ULA), is

$x_{k+1} = x_k - h \nabla f(x_k) + \sqrt{2h} \xi_{k+1}, \quad \xi_{k+1} \sim \mathcal{N}(0,I_d).$

Here, $h>0$ is the step size. Convergence analysis of LMC typically requires $f$ to be $L$-smooth (Lipschitz gradient) and $m$-strongly convex, though extensions relax these assumptions (Dalalyan et al., 2017, Li et al., 2021).
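
The ULA update can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the helper names `ula_sample` and `grad_standard_gaussian`, the standard Gaussian target, and the values of `h` and `n_steps` are all assumptions chosen for the example.

```python
import math
import random


def ula_sample(grad_f, x0, h, n_steps, rng):
    """Run the Unadjusted Langevin Algorithm and return the full trajectory.

    grad_f : callable mapping a list of floats to the gradient (list of floats)
    x0     : starting point (list of floats)
    h      : step size, h > 0
    """
    x = list(x0)
    noise_scale = math.sqrt(2.0 * h)
    path = []
    for _ in range(n_steps):
        g = grad_f(x)
        # x_{k+1} = x_k - h * grad f(x_k) + sqrt(2h) * xi,  xi ~ N(0, I_d)
        x = [xi - h * gi + noise_scale * rng.gauss(0.0, 1.0)
             for xi, gi in zip(x, g)]
        path.append(x)
    return path


def grad_standard_gaussian(x):
    # Illustrative target: f(x) = ||x||^2 / 2, so grad f(x) = x and pi = N(0, I_d).
    return list(x)


rng = random.Random(0)
path = ula_sample(grad_standard_gaussian, x0=[3.0, -3.0], h=0.1,
                  n_steps=50_000, rng=rng)
kept = path[1_000:]  # discard an (arbitrary) burn-in prefix
mean0 = sum(p[0] for p in kept) / len(kept)
var0 = sum((p[0] - mean0) ** 2 for p in kept) / len(kept)
print(f"mean ~ {mean0:.3f}, variance ~ {var0:.3f}")
```

For this particular quadratic potential the chain is a linear recursion whose stationary variance per coordinate is $1/(1-h/2)$ rather than 1, so the printed variance should sit slightly above 1. That gap is the discretization bias of the unadjusted scheme.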

For log-smooth and log-strongly-concave targets whose potential satisfies a third-derivative growth condition, the optimal non-asymptotic mixing time in $W_2$ distance is $\tilde{O}(\sqrt{d}/\epsilon)$ (Li et al., 2021), improving over the earlier $\tilde{O}(d/\epsilon)$ dependence.

2. Dimension Dependence, Convergence Rates, and Nonasymptotics

The convergence of LMC and its variants has been rigorously analyzed in various metrics (Wasserstein, total variation, KL, χ², Rényi). A key point is the dependence of iteration complexity on dimension $d$ and accuracy $\epsilon$.

Setting	Assumptions	Mixing time ($W_2$ error $\epsilon$)	Reference
Strongly log-concave	$L$-smooth, $m$-strongly convex, 3rd-derivative growth	$\tilde{O}(\sqrt{d}/\epsilon)$	(Li et al., 2021)
First-order smooth	$L$-smooth, strongly dissipative	[…]	(Erdogdu et al., 2020)
Weakly smooth	$(L,\alpha)$-weakly smooth, convex	[…]	(Chatterji et al., 2019)
Second-order/Hessian-Lipschitz	[…], globally […]-Lipschitz Hessian	[…] ($W_2$) and […] (TV)	(Dalalyan et al., 2017)
High-order integrators	Smooth, strongly convex	[…], […], […]	(Dang et al., 24 Aug 2025)
Fractional/noisy LMC	$\alpha$-stable noise, nonsmooth	Polynomial in […] with larger exponent	(Şimşekli, 2017, Nguyen et al., 2019)

The optimal $\sqrt{d}$ scaling for standard LMC (Li et al., 2021) is achieved through a careful local error analysis that exploits contractivity of the SDE and refines mean-square estimates of the discretization bias.
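
A toy calculation can make the role of dimension concrete. It is not taken from the cited analysis: it assumes the standard Gaussian target $f(x) = \|x\|^2/2$, for which the ULA stationary law is available in closed form, and the function names are hypothetical.

```python
import math


def ula_gaussian_w2_bias(d, h):
    """Exact W2 distance between N(0, I_d) and the ULA stationary law.

    For f(x) = ||x||^2 / 2 each coordinate follows the AR(1) recursion
    x_{k+1} = (1 - h) x_k + sqrt(2h) xi, whose stationary variance is
    2h / (1 - (1 - h)^2) = 1 / (1 - h/2)   (valid for 0 < h < 2).
    """
    sigma = 1.0 / math.sqrt(1.0 - h / 2.0)
    # W2 between N(0, I_d) and N(0, sigma^2 I_d) equals sqrt(d) * |sigma - 1|.
    return math.sqrt(d) * abs(sigma - 1.0)


def largest_step_for_accuracy(d, eps):
    """Solve sqrt(d) * (sigma - 1) = eps for the step size h in closed form."""
    sigma = 1.0 + eps / math.sqrt(d)
    return 2.0 * (1.0 - 1.0 / sigma ** 2)


for d in (10, 1_000, 100_000):
    h = largest_step_for_accuracy(d, eps=0.1)
    print(d, h, ula_gaussian_w2_bias(d, h))
```

In this example the admissible step size behaves like $4\epsilon/\sqrt{d}$ for small $\epsilon/\sqrt{d}$. Since the chain contracts at rate roughly $h$ per step, the iteration count grows like $\sqrt{d}/\epsilon$ up to logarithmic factors. This is consistent with the scaling quoted above, but it is only an illustration on one target, not a proof.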

Extensions handle nonconvexity via strong dissipativity and log-Sobolev inequalities (Erdogdu et al., 2020), and reach weaker functional-inequality settings (Poincaré, modified log-Sobolev, or Latała–Oleszkiewicz inequalities) under weak smoothness, i.e. Hölder-continuous gradients (Chewi et al., 2021).

3. Extensions: Beyond Smoothness, Higher-Order, Coordinate and Proximal Methods

Nonsmooth, Weakly Smooth, and Black-Box Potentials

For nonsmooth or merely Hölder-smooth potentials—where gradient Lipschitzness may fail—LMC cannot be applied directly. Several variants exist:

Implicit (Gaussian) smoothing: Add a small Gaussian perturbation before evaluating the gradient, yielding P-LMC (Chatterji et al., 2019). Under $(L,\alpha)$-weakly smooth potentials (Hölder-continuous gradient with exponent $\alpha$), the mixing time is polynomial in both Wasserstein and TV distance.
p-generalized Gaussian smoothing: Use $p$-generalized ([…]) random directions to define a smooth surrogate potential, enabling black-box gradient sampling (Doan et al., 2020).
Proximal Langevin (IPLA): Relax the explicit drift step to an approximate proximal step (solve […]), allowing for convex targets with super-quadratic growth and only local strong convexity beyond a radius (Benko et al., 2024).
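
The perturbed-gradient idea in the first item can be sketched as follows. This is a minimal illustration and not the reference P-LMC implementation: the one-dimensional target $\pi(x) \propto e^{-|x|}$, the helper `subgrad_abs`, and the values of the step size `h` and perturbation scale `mu` are assumptions made for the example.

```python
import math
import random


def subgrad_abs(x):
    # A subgradient of f(x) = |x|; the target pi(x) is proportional to exp(-|x|).
    return (x > 0) - (x < 0)


def perturbed_lmc(subgrad, x0, h, mu, n_steps, rng):
    """Langevin steps whose (sub)gradient is evaluated at a Gaussian-perturbed point."""
    x = x0
    noise_scale = math.sqrt(2.0 * h)
    samples = []
    for _ in range(n_steps):
        y = x + mu * rng.gauss(0.0, 1.0)   # perturb before calling the gradient
        x = x - h * subgrad(y) + noise_scale * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


rng = random.Random(1)
samples = perturbed_lmc(subgrad_abs, x0=0.0, h=0.02, mu=0.1,
                        n_steps=300_000, rng=rng)
mean = sum(samples) / len(samples)
mean_abs = sum(abs(s) for s in samples) / len(samples)
# For the Laplace target, E[x] = 0 and E|x| = 1, up to sampling and step-size error.
print(f"mean ~ {mean:.3f}, E|x| ~ {mean_abs:.3f}")
```

Averaging the subgradient over the perturbation amounts to following the gradient of a Gaussian-smoothed potential, which is why the kink of $|x|$ at the origin causes no difficulty here.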

High-Order and Hessian-Free Integrators

To improve the order of weak convergence, higher-order discretization schemes have been developed:

Runge–Kutta integrators: A strong-order-1.5 Runge–Kutta LMC that is Hessian-free and requires only two gradient evaluations per step. The Wasserstein error is […], with per-step cost reduced from previous three-gradient schemes (Yang et al., 8 May 2026).
Itô–Taylor/High-order integrators: Order-1.5 unadjusted LMC (HOLA) achieves Wasserstein rate […] under a precise regularity condition on the third derivative of $f$ (Sabanis et al., 2018).
Splitting-based P-th order: For […], splitting and Taylor-based integrators achieve mixing time […] with […] for […], providing further acceleration in high dimensions (Dang et al., 24 Aug 2025).

Coordinate and Variance-Reduced LMC

Full-gradient evaluation can be costly when $d$ is large:

Random Coordinate LMC (RC-LMC): At each step, a single coordinate is updated using only its partial derivative. For Hessian-Lipschitz potentials, the total computational cost is […], a square-root speedup over classical LMC (Ding et al., 2020).
Variance-reduced coordinate LMC (SVRG/SAGA): Variance reduction restores optimal iteration scaling while retaining per-iteration cost benefits; for underdamped LMC, the cost per sample is […] when using SVRG or SAGA (Ding et al., 2020).
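
A random-coordinate update of the kind described in the first item might look like the sketch below. It assumes uniform coordinate selection and a single shared step size, which is a simplification and not necessarily the scheme analyzed in the cited work. The two-dimensional Gaussian target with precision matrix `A` and the helper names are illustrative.

```python
import math
import random

# Illustrative target: f(x) = x^T A x / 2, i.e. pi = N(0, A^{-1}).
A = [[2.0, 1.0], [1.0, 2.0]]


def partial_f(x, i):
    # d f / d x_i = sum_j A[i][j] * x[j]; only row i of A is touched.
    return sum(A[i][j] * x[j] for j in range(len(x)))


def rc_lmc(partial, x0, h, n_steps, rng):
    """Langevin updates applied to one uniformly chosen coordinate per step."""
    x = list(x0)
    d = len(x)
    noise_scale = math.sqrt(2.0 * h)
    samples = []
    for _ in range(n_steps):
        i = rng.randrange(d)  # pick a coordinate uniformly at random
        x[i] += -h * partial(x, i) + noise_scale * rng.gauss(0.0, 1.0)
        samples.append(tuple(x))
    return samples


rng = random.Random(2)
samples = rc_lmc(partial_f, x0=[0.0, 0.0], h=0.02, n_steps=400_000, rng=rng)
kept = samples[10_000:]  # discard an (arbitrary) burn-in prefix
n = len(kept)
m0 = sum(s[0] for s in kept) / n
m1 = sum(s[1] for s in kept) / n
var0 = sum((s[0] - m0) ** 2 for s in kept) / n
cov01 = sum((s[0] - m0) * (s[1] - m1) for s in kept) / n
# A^{-1} = [[2/3, -1/3], [-1/3, 2/3]], so expect var0 near 2/3 and cov01 near -1/3.
print(f"var0 ~ {var0:.3f}, cov01 ~ {cov01:.3f}")
```

Each step costs one partial derivative instead of a full gradient, at the price of needing roughly $d$ times more steps to move every coordinate once.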

Ensemble and Discrete LMC

Ensemble LMC: Particles infer gradients from neighbors to reduce true gradient computation. Instability is mitigated by constraints (CEnLMC), allowing for most steps to use finite-difference approximations (Ding et al., 2021).
DLMC (Discrete LMC): Wasserstein-gradient-flow construction extended to discrete sample spaces, yielding invariant Markov chains with parallel and time-uniform implementations and rigorous spectral gap controls (Sun et al., 2022).

4. Modifications for Nonconvexity, Metastability, and Heavy-tailed Exploration

Landscape Modification and Accelerated Mixing

In highly nonconvex or multimodal settings, classical LMC mixing time can be exponentially slow in the low-temperature regime due to large energy barriers. "Landscape-modified" LMC transforms the potential $f$ to […], reducing the effective energy barrier to […] and converting exponential-in-barrier Log-Sobolev constants to polynomial dependence. All functional-inequality-based analyses (Poincaré, LSI) and convergence rates are correspondingly improved (Choi et al., 2023).

Fractional LMC (Heavy-tailed Drivers)

Classical (Brownian-driven) LMC may be slow to escape local minima. Fractional LMC (FLMC) replaces the Brownian increment with a symmetric $\alpha$-stable Lévy increment ([…]), leading to dynamics with heavy-tailed jumps (Şimşekli, 2017). These methods preserve $\pi$ as the invariant law (with a suitably modified drift), facilitate barrier crossing in multimodal potentials, and achieve faster empirical mixing in double-well experiments. Finite-time error bounds in the nonconvex setting show greater sensitivity to step-size choice; a smaller $h$ is required due to heavier discretization bias (Nguyen et al., 2019).
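
The heavy-tailed driver can be illustrated with a small sampler for symmetric stable increments. This sketch uses the Chambers–Mallows–Stuck construction and the self-similarity scaling $h^{1/\alpha}$ of a stable Lévy process. It does not reproduce the drift correction used in FLMC: the `drift_scale` argument is a hypothetical placeholder for it, and the function names are assumptions.

```python
import math
import random


def symmetric_stable(alpha, rng):
    """Draw from the symmetric alpha-stable law with characteristic function
    exp(-|t|^alpha), using the Chambers-Mallows-Stuck method (0 < alpha <= 2)."""
    u = rng.uniform(-math.pi / 2.0, math.pi / 2.0)
    w = rng.expovariate(1.0)
    return (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
            * (math.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))


def heavy_tailed_step(x, grad_f, h, alpha, rng, drift_scale=1.0):
    # Noise enters with scale h^(1/alpha). With alpha = 2 the draw is N(0, 2),
    # so the increment reduces to the usual sqrt(2h) * N(0, 1) Brownian term.
    # drift_scale is a hypothetical stand-in for FLMC's drift correction.
    return (x - h * drift_scale * grad_f(x)
            + h ** (1.0 / alpha) * symmetric_stable(alpha, rng))


rng = random.Random(3)
n = 100_000
gauss_like = [symmetric_stable(2.0, rng) for _ in range(n)]
heavy = [symmetric_stable(1.5, rng) for _ in range(n)]
var2 = sum(v * v for v in gauss_like) / n
tail2 = sum(abs(v) > 5.0 for v in gauss_like) / n
tail15 = sum(abs(v) > 5.0 for v in heavy) / n
print(f"alpha=2 variance ~ {var2:.3f}")
print(f"P(|X| > 5): alpha=2 ~ {tail2:.5f}, alpha=1.5 ~ {tail15:.5f}")
```

The tail frequencies show why such increments help with barrier crossing: large jumps that are very rare under Gaussian noise occur routinely once $\alpha < 2$.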

Quasi-Monte Carlo LMC

Replacing the i.i.d. Gaussian noise in LMC with noise generated from low-discrepancy, completely uniformly distributed (CUD) sequences reduces the variance of the resulting estimator. Under smooth, strongly convex potentials, Langevin quasi-Monte Carlo (LQMC) achieves MSE […] with arbitrarily small […], substantially improving over the […] scaling of ordinary MC (Liu, 2023).

Regime-Switching LMC

"Regime-switching" Langevin algorithms couple the discretized Langevin process with a finite-state continuous-time Markov chain (CTMC) governing step size, drift scaling, or friction. This introduces diversity into the sampler dynamics, yielding faster contraction rates and improved mixing time when the CTMC parameters are suitably designed. Regime-switching underdamped LMC achieves […] or even […] iteration complexity (Wang et al., 31 Aug 2025).

5. Analysis Techniques and Functional Inequalities

Rigorous convergence results for LMC employ multiple techniques, often leveraging functional inequalities:

Contractivity in Wasserstein distance ($W_2$): Strongly convex and log-smooth potentials yield exponential contraction via synchronous coupling (Li et al., 2021).
Mean-square and local error analysis: Sharper local error bounds (including order-2 weak error under third-derivative conditions) are key to proving optimal $\sqrt{d}$ scaling (Li et al., 2021).
Functional inequalities and metric convergence: Poincaré (variance), Latała–Oleszkiewicz, and (modified) Log-Sobolev inequalities control mixing and allow extension to weak smoothness or nonconvex settings (Chewi et al., 2021).
Girsanov change of measure: Employed for analyzing discrete-time and stochastic-gradient LMC via path-space divergences.
Higher-order Taylor expansions and splitting: Underpin the design and analysis of high-order integrators (Dang et al., 24 Aug 2025, Yang et al., 8 May 2026).
Nonlocal (fractional) analysis for FLMC: Stationarity proofs require careful use of Riesz fractional derivatives (Şimşekli, 2017).
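
The synchronous-coupling argument from the first item can be checked numerically on a small example. The sketch below is illustrative only: the one-dimensional potential $f(x) = x^2/2 + \log\cosh(x)$ and the step size are assumptions chosen so that the strong-convexity and smoothness constants are explicit.

```python
import math
import random


def grad_f(x):
    # f(x) = x^2 / 2 + log(cosh(x)); f''(x) = 1 + sech(x)^2 lies in [1, 2],
    # so f is m-strongly convex and L-smooth with m = 1, L = 2.
    return x + math.tanh(x)


def coupled_ula(x, y, h, n_steps, rng):
    """Run two ULA chains driven by the same Gaussian noise; return |x_k - y_k|."""
    gaps = [abs(x - y)]
    scale = math.sqrt(2.0 * h)
    for _ in range(n_steps):
        xi = rng.gauss(0.0, 1.0)  # shared noise = synchronous coupling
        x = x - h * grad_f(x) + scale * xi
        y = y - h * grad_f(y) + scale * xi
        gaps.append(abs(x - y))
    return gaps


m, h = 1.0, 0.1  # h <= 1/L keeps every factor 1 - h * f'' inside [0, 1 - h * m]
gaps = coupled_ula(x=5.0, y=-5.0, h=h, n_steps=50, rng=random.Random(0))
for k in (0, 10, 50):
    print(k, gaps[k], (1.0 - m * h) ** k * gaps[0])
```

Because the noise cancels in the difference of the two chains, the gap shrinks by a factor of at most $1 - mh$ per step regardless of the noise realization. That pathwise bound is what turns into exponential contraction in Wasserstein distance.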

6. Practical Implementation and Algorithm Selection

Practical deployment of LMC algorithms requires attention to dimension scaling, smoothness assumptions, computational cost per iteration, and robustness to nonconvexity or nonsmoothness:

Step-size selection: Typically […] to balance contractivity and discretization error (Li et al., 2021).
Gradient/Hessian computation cost: High-order and variance-reduced methods reduce iteration count or per-iteration cost (Yang et al., 8 May 2026, Ding et al., 2020).
Proximal and nonsmooth variants: IPLA and Gaussian-smoothed methods extend applicability to challenging targets (super-quadratic, total variation-regularized models, etc.) (Benko et al., 2024, Chatterji et al., 2019).
Heavy-tailed and nonconvex situations: FLMC and curvature-guided LMC (CG-LMC) provide enhanced global exploration and robustness at the expense of more delicate tuning (Şimşekli, 2017, Basu et al., 30 Jan 2025).
Discrete spaces and stochastic/mini-batch gradients: DLMC and SGLD-type methods offer scalable and parallelizable alternatives for large-scale and structured problems (Sun et al., 2022).

Algorithm selection should be guided by the interplay between target distribution properties, dimension, computational resources, and required statistical guarantees. Detailed parameterization and tuning guidance are available in cited works.

7. Recent Developments and Research Directions

Recent advances focus on:

Pushing boundaries of the smoothness/convexity regime via black-box and proximal variants (Benko et al., 2024, Doan et al., 2020).
High-order, Hessian-free methods with strong theoretical and empirical guarantees, reducing dimension and error dependence (Yang et al., 8 May 2026, Dang et al., 24 Aug 2025).
Function-space and discrete-space extensions informed by Wasserstein gradient flow theory (Sun et al., 2022).
Landscape modification techniques reducing mixing times in low-temperature and multimodal regimes (Choi et al., 2023).
Heavy-tailed diffusion and fractional noise for accelerated traversal of nonconvex landscapes (Şimşekli, 2017, Nguyen et al., 2019).
Direct comparison and interpolation between stochastic-gradient, variance-reduced, and coordinate-update paradigms in high-dimensional MCMC, with practical recipes for tuning and algorithmic trade-offs (Ding et al., 2020, Ding et al., 2020, Dalalyan et al., 2017).

Further open directions include: adaptive and preconditioned LMC, automatic selection and adaptation of step-size/sampler order, rigorous analysis of nonconvex and multimodal sampling, nonasymptotic guarantees in general metric spaces, and robust high-dimensional implementations for large-scale, structured models.

Key References:

“Sqrt(d) Dimension Dependence of Langevin Monte Carlo” (Li et al., 2021)
“Convergence of Langevin Monte Carlo in Chi-Squared and Renyi Divergence” (Erdogdu et al., 2020)
“Langevin Monte Carlo without smoothness” (Chatterji et al., 2019)
“Analysis of Langevin Monte Carlo from Poincaré to Log-Sobolev” (Chewi et al., 2021)
“Accelerating Langevin Monte Carlo via Efficient Stochastic Runge–Kutta Methods beyond Log-Concavity” (Yang et al., 8 May 2026)
“High-Order Langevin Monte Carlo Algorithms” (Dang et al., 24 Aug 2025)
“Langevin Monte Carlo Beyond Lipschitz Gradient Continuity” (Benko et al., 2024)
“Regime-Switching Langevin Monte Carlo Algorithms” (Wang et al., 31 Aug 2025)
“Langevin Quasi-Monte Carlo” (Liu, 2023)
“Improved Langevin Monte Carlo for stochastic optimization via landscape modification” (Choi et al., 2023)
“Random Coordinate Langevin Monte Carlo” (Ding et al., 2020)
“Langevin Monte Carlo: random coordinate descent and variance reduction” (Ding et al., 2020)
“Fractional Langevin Monte Carlo: Exploring Lévy Driven SDEs for MCMC” (Şimşekli, 2017)
“Non-asymptotic Analysis of Fractional Langevin Monte Carlo for Non-Convex Optimization” (Nguyen et al., 2019)
“Constrained Ensemble Langevin Monte Carlo” (Ding et al., 2021)
“Discrete Langevin Sampler via Wasserstein Gradient Flow” (Sun et al., 2022)
“Estimating Multi-chirp Parameters using Curvature-guided Langevin Monte Carlo” (Basu et al., 30 Jan 2025)