
Langevin Monte Carlo (LMC) Algorithms

Updated 20 May 2026
  • Langevin Monte Carlo (LMC) algorithms are MCMC methods that discretize Langevin diffusions to sample complex, high-dimensional distributions.
  • They combine gradient-driven dynamics with Gaussian noise to navigate probability landscapes, with performance sensitive to smoothness and dimension.
  • Extensions like high-order, coordinate, heavy-tailed, and proximal variants improve efficiency in nonconvex, nonsmooth, and structured sampling scenarios.

Langevin Monte Carlo (LMC) algorithms form a central class of Markov Chain Monte Carlo (MCMC) methods for sampling from complex high-dimensional distributions. These algorithms are based on discretizations of the Langevin diffusion, which connects the geometry of the target distribution with stochastic dynamics, enabling efficient exploration and sampling even in high-dimensional and ill-conditioned regimes. A wide variety of LMC-type methods now exist, extending the classical unadjusted approach to accommodate weak smoothness, nonsmoothness, high-order integrators, regime switching, coordinate structure, heavy-tailed dynamics, and discrete sample spaces.

1. Mathematical Foundations and Classical LMC

LMC targets sampling from a density $\pi(x) \propto \exp(-f(x))$, with $f:\mathbb{R}^d \to \mathbb{R}$ typically assumed to be smooth and (strongly or locally) convex. The continuous-time overdamped Langevin SDE is

$$dx_t = -\nabla f(x_t)\,dt + \sqrt{2}\,dB_t,$$

where $B_t$ is a standard $d$-dimensional Brownian motion. Under mild regularity conditions this SDE is ergodic with stationary law $\pi$.

The Euler–Maruyama discretization, or Unadjusted Langevin Algorithm (ULA), is

$$x_{k+1} = x_k - h \nabla f(x_k) + \sqrt{2h}\, \xi_{k+1}, \quad \xi_{k+1} \sim \mathcal{N}(0,I_d).$$

Here $h>0$ is the step size. Convergence analyses of LMC typically require $f$ to be $L$-smooth (Lipschitz gradient) and $m$-strongly convex, though extensions relax these assumptions (Dalalyan et al., 2017, Li et al., 2021).
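
The ULA update above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions rather than a reference implementation. The names `ula` and `grad_gaussian` are hypothetical, and the example target is a standard Gaussian, $f(x)=\|x\|^2/2$.

```python
import math
import random


def ula(grad_f, x0, h, n_steps, rng):
    """Unadjusted Langevin Algorithm (Euler-Maruyama for the overdamped SDE):
    x_{k+1} = x_k - h * grad_f(x_k) + sqrt(2h) * xi_{k+1}, xi ~ N(0, I_d)."""
    x = list(x0)
    noise_scale = math.sqrt(2.0 * h)
    samples = []
    for _ in range(n_steps):
        g = grad_f(x)
        x = [xj - h * gj + noise_scale * rng.gauss(0.0, 1.0) for xj, gj in zip(x, g)]
        samples.append(x)
    return samples


def grad_gaussian(x):
    """Gradient of the illustrative potential f(x) = ||x||^2 / 2 (standard Gaussian target)."""
    return list(x)


if __name__ == "__main__":
    rng = random.Random(0)
    h = 0.1
    chain = ula(grad_gaussian, [3.0, -3.0], h, 50_000, rng)[1_000:]
    var0 = sum(s[0] ** 2 for s in chain) / len(chain)
    # ULA is biased: here its stationary variance is 1 / (1 - h/2), not 1.
    print(f"empirical var {var0:.3f}, ULA stationary var {1 / (1 - h / 2):.3f}")
```

For this Gaussian target the chain is an AR(1) recursion, so its stationary variance is exactly $1/(1-h/2)$. The empirical variance therefore settles near 1.053 for $h=0.1$ rather than at 1. This discretization bias is what step-size choices trade against.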

For log-smooth, strongly log-concave targets whose potential satisfies a growth condition on its third derivative, the optimal non-asymptotic mixing time in W$_2$ distance is $\tilde O(\sqrt{d}/\epsilon)$ (Li et al., 2021). This improves over the earlier $\tilde O(d/\epsilon)$ bound.

2. Dimension Dependence, Convergence Rates, and Nonasymptotics

The convergence of LMC and its variants has been analyzed rigorously in several metrics (Wasserstein, total variation, KL, χ², Rényi). A central question is how the iteration complexity depends on the dimension $d$ and the accuracy $\epsilon$.

| Setting | Assumptions | Mixing time (W$_2$ error $\epsilon$) | Reference |
|---|---|---|---|
| Strongly log-concave | $L$-smooth, $m$-strongly convex, third-derivative growth | $\tilde O(\sqrt{d}/\epsilon)$ | Li et al., 2021 |
| First-order smooth | Lipschitz gradient, strongly dissipative | … | Erdogdu et al., 2020 |
| Weakly smooth | Weakly smooth (Hölder gradient), convex | … | Chatterji et al., 2019 |
| Second-order / Hessian-Lipschitz | …, globally Lipschitz Hessian | … (W$_2$) and … (TV) | Dalalyan et al., 2017 |
| High-order integrators | Smooth, strongly convex | …, …, … | Dang et al., 24 Aug 2025 |
| Fractional/noisy LMC | $\alpha$-stable noise, nonsmooth | Polynomial in … with larger exponent | Şimşekli, 2017; Nguyen et al., 2019 |

The optimal $\sqrt{d}$ scaling for standard LMC (Li et al., 2021) comes from a careful local error analysis. That analysis exploits the contractivity of the SDE and refines mean-square estimates of the discretization bias.
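
A worked calculation shows where a $\sqrt{d}$ dependence can come from in the simplest case. The target is a standard Gaussian, $f(x)=\|x\|^2/2$ (so $m=L=1$), for which ULA can be solved exactly. This only illustrates the scaling; it is not the general argument of Li et al. (2021). The notation $\mu_k$ (law of $x_k$) and $\pi_h$ (stationary law of the discretized chain) is introduced just for this example.

```latex
% Illustrative target: f(x) = \|x\|^2/2 on \mathbb{R}^d, so \pi = \mathcal{N}(0, I_d) and m = L = 1.
% ULA becomes a linear (AR(1)) recursion:
x_{k+1} = (1-h)\,x_k + \sqrt{2h}\,\xi_{k+1}
% Stationary law \pi_h = \mathcal{N}(0, \sigma_h^2 I_d), where
\sigma_h^2 = (1-h)^2\sigma_h^2 + 2h
  \;\Longrightarrow\; \sigma_h^2 = \frac{2h}{1-(1-h)^2} = \frac{1}{1-h/2}
% Bias (W_2 between centered isotropic Gaussians):
W_2(\pi_h, \pi) = \sqrt{d}\,(\sigma_h - 1) = \frac{\sqrt{d}\,h}{4} + O(\sqrt{d}\,h^2)
% Synchronous coupling contracts by (1-h) per step, hence
W_2(\mu_k, \pi) \le (1-h)^k\,W_2(\mu_0, \pi_h) + \frac{\sqrt{d}\,h}{4} + O(\sqrt{d}\,h^2)
% Taking h \approx 2\epsilon/\sqrt{d} (bias \approx \epsilon/2) and
% k \approx h^{-1}\log\big(2W_2(\mu_0,\pi_h)/\epsilon\big) gives W_2(\mu_k,\pi) \lesssim \epsilon with
k \approx \frac{\sqrt{d}}{2\epsilon}\,\log\frac{2\,W_2(\mu_0,\pi_h)}{\epsilon} = \tilde O\!\left(\frac{\sqrt{d}}{\epsilon}\right)
```

In this example the bias grows like $\sqrt{d}\,h$, so the step size must shrink like $\epsilon/\sqrt{d}$. Since the contraction needs about $1/h$ steps, the iteration count scales like $\sqrt{d}/\epsilon$.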

Extensions allow for nonconvexity via strong dissipativity and Log-Sobolev inequalities (Erdogdu et al., 2020), and further, to weak functional settings via Poincaré, modified Log-Sobolev, or Latała–Oleszkiewicz inequalities, under weak smoothness (Hölder gradients) (Chewi et al., 2021).

3. Extensions: Beyond Smoothness, Higher-Order, Coordinate and Proximal Methods

Nonsmooth, Weakly Smooth, and Black-Box Potentials

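For nonsmooth or merely Hölder-smooth potentials, where the gradient may fail to be Lipschitz, the standard LMC analysis does not apply directly, and for nonsmooth potentials the gradient may not even exist everywhere. Several variants address this: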

  • Implicit (Gaussian) smoothing: evaluate the gradient at a point perturbed by a small Gaussian, which yields P-LMC (Chatterji et al., 2019). For weakly smooth potentials (Hölder-continuous gradients), the mixing time is polynomial in both Wasserstein and TV distance.
  • p-generalized Gaussian smoothing: use p-generalized Gaussian random directions to define a smooth surrogate potential, enabling black-box gradient sampling (Doan et al., 2020).
  • Proximal Langevin (IPLA): replace the explicit drift step with an approximate proximal step (solve …). This accommodates convex targets with super-quadratic growth and only local strong convexity beyond a radius (Benko et al., 2024).
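
The first and third ideas can be sketched on the nonsmooth Laplace potential $f(x)=|x|$, for which $E|X|=1$ under $\pi$. This is a one-dimensional illustration under stated assumptions. `perturbed_lmc` evaluates a subgradient at a Gaussian-perturbed point, as described for P-LMC. `proximal_lmc` is one generic proximal Langevin form (add noise, then apply the exact prox of $h|x|$, i.e. soft thresholding). It is not claimed to match the exact IPLA update, which uses an approximate proximal step. All names and parameter values are hypothetical.

```python
import math
import random


def sign(v):
    """A subgradient of |v| (0 at the kink)."""
    return 1.0 if v > 0 else (-1.0 if v < 0 else 0.0)


def perturbed_lmc(subgrad, x0, h, mu, n_steps, rng):
    """Gaussian-smoothing LMC (1-D): evaluate the (sub)gradient at x + mu * zeta,
    zeta ~ N(0, 1), instead of at x, then take the usual Langevin step."""
    x, out = x0, []
    for _ in range(n_steps):
        g = subgrad(x + mu * rng.gauss(0.0, 1.0))
        x = x - h * g + math.sqrt(2.0 * h) * rng.gauss(0.0, 1.0)
        out.append(x)
    return out


def soft_threshold(y, t):
    """Proximal operator of t * |.|: argmin_x t * |x| + (x - y)^2 / 2."""
    return math.copysign(max(abs(y) - t, 0.0), y)


def proximal_lmc(prox, x0, h, n_steps, rng):
    """Proximal (implicit-drift) Langevin step, one generic form: add Gaussian noise,
    then apply prox_{h f}. This sketch uses the exact prox; an inexact scheme would
    replace it with an approximate solver."""
    x, out = x0, []
    for _ in range(n_steps):
        x = prox(x + math.sqrt(2.0 * h) * rng.gauss(0.0, 1.0), h)
        out.append(x)
    return out


if __name__ == "__main__":
    # Laplace target pi(x) ∝ exp(-|x|), for which E|X| = 1.
    rng = random.Random(0)
    a = perturbed_lmc(sign, 0.0, 0.05, 0.1, 200_000, rng)[5_000:]
    b = proximal_lmc(soft_threshold, 0.0, 0.05, 200_000, rng)[5_000:]
    print(sum(map(abs, a)) / len(a), sum(map(abs, b)) / len(b))
```

Neither sketch ever needs a gradient at the kink. The smoothed sampler queries $|x|$'s subgradient only at random points, and the proximal sampler uses a closed-form prox in place of a gradient step.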

High-Order and Hessian-Free Integrators

To improve the order of weak convergence, higher-order discretization schemes have been developed:

  • Runge–Kutta integrators: a Hessian-free Runge–Kutta LMC of strong order 1.5 that needs only two gradient evaluations per step, reducing the per-step cost of earlier three-gradient schemes. Its W$_2$ error is … (Yang et al., 8 May 2026).
  • Itô–Taylor/high-order integrators: order-1.5 unadjusted LMC (HOLA) achieves a W$_2$ rate of … under precise regularity conditions on the third derivative of $f$ (Sabanis et al., 2018).
  • Splitting-based $P$-th order: for …, splitting and Taylor-based integrators achieve mixing time … with … for …, providing further acceleration in high dimensions (Dang et al., 24 Aug 2025).
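
Spending a second gradient evaluation per step can shrink discretization bias, and a small comparison shows this concretely. The sketch does not implement the Runge–Kutta scheme of Yang et al. or HOLA. It substitutes a generic, Hessian-free, two-gradient stochastic Heun (predictor–corrector) step and compares it with ULA on the test potential $f(x)=x^2/2$. For that target both steps are linear, $x' = a\,x + b\,\xi$, so the stationary variance solves $v = a^2 v + b^2$ exactly. ULA gives $1/(1-h/2)$, an error of about $h/2$. The Heun step gives $(2-h)/(2-h+h^2/2)$, an error of about $h^2/4$. Function names are illustrative.

```python
import math


def ula_step(x, h, grad_f, xi):
    """One Euler-Maruyama (ULA) step with a given standard normal draw xi."""
    return x - h * grad_f(x) + math.sqrt(2.0 * h) * xi


def heun_step(x, h, grad_f, xi):
    """Two-gradient predictor-corrector (stochastic Heun) step for additive noise:
    predict with ULA, then average the drift at the start and predicted points,
    reusing the same Gaussian increment. No Hessian is needed."""
    noise = math.sqrt(2.0 * h) * xi
    x_pred = x - h * grad_f(x) + noise
    return x - 0.5 * h * (grad_f(x) + grad_f(x_pred)) + noise


def stationary_variance(step, h):
    """For the linear test target f(x) = x^2/2 each step is x' = a*x + b*xi.
    Recover a and b by probing the step, then solve v = a^2 v + b^2."""
    grad = lambda x: x
    a = step(1.0, h, grad, 0.0)
    b = step(0.0, h, grad, 1.0)
    return b * b / (1.0 - a * a)


if __name__ == "__main__":
    for h in (0.2, 0.1, 0.05):
        print(h, stationary_variance(ula_step, h), stationary_variance(heun_step, h))
```

Halving $h$ roughly halves the ULA error but quarters the Heun error. This is the kind of gain the higher-order schemes above aim for with sharper guarantees.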

Coordinate and Variance-Reduced LMC

Full-gradient evaluation can be costly in large dd8:

  • Random Coordinate LMC (RC-LMC): at each step only one coordinate is updated, using only its partial derivative. For Hessian-Lipschitz potentials the total computational cost is …, a square-root speedup over classical LMC (Ding et al., 2020).
  • Variance-reduced coordinate LMC (SVRG/SAGA): variance reduction restores the optimal iteration scaling while keeping the per-iteration cost savings. For underdamped LMC with SVRG or SAGA, the cost per sample is … (Ding et al., 2020).
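
The coordinate idea can be sketched as follows, assuming uniform coordinate selection and a single step size $h$ for every coordinate. The analyzed RC-LMC may weight coordinates and step sizes differently, and that detail is not reproduced here. Each iteration evaluates one partial derivative, which is where the per-step saving comes from. The separable test potential $f(x)=\sum_i \lambda_i x_i^2/2$ and all names are hypothetical.

```python
import math
import random

# Hypothetical test potential f(x) = sum_i LAM[i] * x_i^2 / 2, so df/dx_i = LAM[i] * x_i.
LAM = (1.0, 4.0)


def partial_quadratic(x, i):
    """Partial derivative of the test potential with respect to coordinate i."""
    return LAM[i] * x[i]


def rc_lmc(partial_f, x0, h, n_steps, rng):
    """Random-coordinate LMC, simplified: pick one coordinate uniformly at random,
    evaluate only that partial derivative, and take a 1-D Langevin step in it."""
    x = list(x0)
    d = len(x)
    noise_scale = math.sqrt(2.0 * h)
    samples = []
    for _ in range(n_steps):
        i = rng.randrange(d)
        x[i] = x[i] - h * partial_f(x, i) + noise_scale * rng.gauss(0.0, 1.0)
        samples.append(tuple(x))
    return samples


if __name__ == "__main__":
    rng = random.Random(0)
    h = 0.05
    chain = rc_lmc(partial_quadratic, [0.0, 0.0], h, 200_000, rng)[2_000:]
    for i, lam in enumerate(LAM):
        v = sum(s[i] ** 2 for s in chain) / len(chain)
        print(i, v, 1.0 / (lam * (1.0 - lam * h / 2.0)))
```

Unselected coordinates stay fixed, so for this separable quadratic each coordinate's stationary variance equals the one-dimensional ULA value $1/(\lambda_i(1-\lambda_i h/2))$. The main block prints that value next to the empirical estimate.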

Ensemble and Discrete LMC

  • Ensemble LMC: particles infer gradients from their neighbors to reduce the number of true gradient computations. Instability is mitigated by constraints (CEnLMC), allowing most steps to use finite-difference approximations (Ding et al., 2021).
  • DLMC (Discrete LMC): Wasserstein-gradient-flow construction extended to discrete sample spaces, yielding invariant Markov chains with parallel and time-uniform implementations and rigorous spectral gap controls (Sun et al., 2022).

4. Modifications for Nonconvexity, Metastability, and Heavy-tailed Exploration

Landscape Modification and Accelerated Mixing

In highly nonconvex or multimodal settings, the mixing time of classical LMC can be exponentially slow in the low-temperature regime because of large energy barriers. "Landscape-modified" LMC replaces the potential with a transformed one that reduces the effective energy barrier to …. This turns log-Sobolev constants that are exponential in the barrier height into ones with polynomial dependence. The Poincaré- and log-Sobolev-based convergence rates improve correspondingly (Choi et al., 2023).

Fractional LMC (Heavy-tailed Drivers)

Classical (Brownian-driven) LMC may be slow to escape local minima. Fractional LMC (FLMC) replaces the Brownian increment with a symmetric $\alpha$-stable Lévy increment, producing dynamics with heavy-tailed jumps (Şimşekli, 2017). With a suitably modified drift these dynamics keep $\pi$ invariant. They also facilitate barrier crossing in multimodal potentials and mix faster empirically in double-well experiments. Finite-time error bounds in the nonconvex setting show greater sensitivity to the step size: a smaller $h$ is required because of heavier discretization bias (Nguyen et al., 2019).
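
A one-dimensional FLMC sketch follows. The symmetric $\alpha$-stable increments are drawn with the Chambers–Mallows–Stuck method. The drift is scaled by $c_\alpha=\Gamma(\alpha-1)/\Gamma(\alpha/2)^2$ for $1<\alpha\le 2$, which is an assumption here about the fractional-drift approximation in Şimşekli (2017) and should be checked against that paper. At $\alpha=2$ the increment is $\mathcal{N}(0,2)$ and $c_2=1$, so the step reduces exactly to ULA; the test relies on that. Function names are illustrative.

```python
import math
import random


def symmetric_stable(alpha, rng):
    """Chambers-Mallows-Stuck sampler for a standard symmetric alpha-stable
    variable, 0 < alpha <= 2. With alpha = 2 it returns N(0, 2) draws."""
    v = rng.uniform(-math.pi / 2.0, math.pi / 2.0)
    w = rng.expovariate(1.0)
    return (math.sin(alpha * v) / math.cos(v) ** (1.0 / alpha)
            * (math.cos((1.0 - alpha) * v) / w) ** ((1.0 - alpha) / alpha))


def c_alpha(alpha):
    """Assumed FLMC drift constant Gamma(alpha - 1) / Gamma(alpha / 2)^2, 1 < alpha <= 2.
    It equals 1 at alpha = 2."""
    return math.gamma(alpha - 1.0) / math.gamma(alpha / 2.0) ** 2


def flmc(grad_f, x0, h, alpha, n_steps, rng):
    """Fractional LMC (1-D sketch): x' = x - h*c_alpha*grad_f(x) + h^(1/alpha) * L,
    with L a standard symmetric alpha-stable draw."""
    c = c_alpha(alpha)
    x, out = x0, []
    for _ in range(n_steps):
        x = x - h * c * grad_f(x) + h ** (1.0 / alpha) * symmetric_stable(alpha, rng)
        out.append(x)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    # Standard Gaussian target f(x) = x^2 / 2 with heavy-tailed (alpha = 1.7) driving noise.
    path = flmc(lambda x: x, 0.0, 0.01, 1.7, 20_000, rng)
    print(min(path), max(path))
```

On potentials with steep gradients, occasional large jumps can overshoot badly, which is one face of the step-size sensitivity noted above. A small $h$ is the safer default.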

Quasi-Monte Carlo LMC

Replacing the Gaussian noise in LMC with low-discrepancy, completely uniformly distributed (CUD) sequences reduces the variance of the resulting estimator. Under smooth, strongly convex potentials, Langevin quasi-Monte Carlo (LQMC) achieves an MSE of … (where … can be taken arbitrarily small). This substantially improves over the … scaling of ordinary Monte Carlo (Liu, 2023).

Regime-Switching LMC

"Regime-switching" Langevin algorithms couple the discretized Langevin process with a finite-state Markov chain that governs the step size, drift scaling, or friction. The resulting diversity in the sampler dynamics yields faster contraction and improved mixing time when the parameters of the continuous-time Markov chain (CTMC) are chosen appropriately. Regime-switching underdamped LMC achieves … or even … iteration complexity (Wang et al., 31 Aug 2025).

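A simplified, overdamped illustration of regime switching follows. It assumes a discrete-time switch between regimes, standing in for the CTMC, that changes only the ULA step size. The cited work treats underdamped dynamics and more general switching, which this sketch does not reproduce. `regime_switching_ula` and `switch_prob` are hypothetical names.

```python
import math
import random


def regime_switching_ula(grad_f, x0, step_sizes, switch_prob, n_steps, rng):
    """ULA whose step size is set by a finite-state regime variable.
    Hypothetical discrete-time stand-in for a CTMC: before each step, with
    probability switch_prob the regime jumps to a uniformly chosen other state."""
    n_regimes = len(step_sizes)
    x, regime = x0, 0
    out = []
    for _ in range(n_steps):
        if n_regimes > 1 and rng.random() < switch_prob:
            regime = (regime + rng.randrange(1, n_regimes)) % n_regimes
        h = step_sizes[regime]
        x = x - h * grad_f(x) + math.sqrt(2.0 * h) * rng.gauss(0.0, 1.0)
        out.append((x, regime))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    # Standard Gaussian target, f(x) = x^2 / 2: alternate a cautious and an aggressive step.
    path = regime_switching_ula(lambda x: x, 0.0, (0.02, 0.2), 0.1, 100_000, rng)
    xs = [x for x, _ in path[1_000:]]
    print(sum(x * x for x in xs) / len(xs))
```

The aggressive regime moves quickly through the space while the cautious regime keeps the discretization bias small. Here the printed variance sits between the two single-regime ULA values, $1/(1-0.01)$ and $1/(1-0.1)$, roughly.
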
5. Analysis Techniques and Functional Inequalities

Rigorous convergence results for LMC employ multiple techniques, often leveraging functional inequalities:

  • Contractivity in Wasserstein distance (W$_2$): strongly convex and log-smooth potentials yield exponential contraction via synchronous coupling (Li et al., 2021).
  • Mean-square and local error analysis: sharper local error bounds (including order-2 weak error under third-derivative conditions) are key to proving the optimal $\sqrt{d}$ scaling (Li et al., 2021).
  • Functional inequalities and metric convergence: Poincaré (variance), Latała–Oleszkiewicz, and (modified) Log-Sobolev inequalities control mixing and allow extension to weak smoothness or nonconvex settings (Chewi et al., 2021).
  • Girsanov change of measure: Employed for analyzing discrete-time and stochastic-gradient LMC via path-space divergences.
  • Higher-order Taylor expansions and splitting: Underpin the design and analysis of high-order integrators (Dang et al., 24 Aug 2025, Yang et al., 8 May 2026).
  • Nonlocal (fractional) analysis for FLMC: Stationarity proofs require careful use of Riesz fractional derivatives (Şimşekli, 2017).
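
Synchronous coupling can be observed numerically. Two ULA chains start at different points and are driven by the same Gaussian draws, so the noise cancels in their difference. For an $m$-strongly convex, $L$-smooth potential with $h \le 2/(m+L)$, the mean value theorem then gives a per-step contraction factor of at most $1-hm$. The potential $f(x)=x^2/2+\log\cosh x$ (with $m=1$, $L=2$) and the function names are illustrative choices.

```python
import math
import random


def grad_f(x):
    """Gradient of f(x) = x^2/2 + log(cosh(x)). Since f''(x) = 1 + sech(x)^2
    lies in (1, 2], f is 1-strongly convex and 2-smooth (m = 1, L = 2)."""
    return x + math.tanh(x)


def coupled_ula_distances(x0, y0, h, n_steps, rng):
    """Two ULA chains driven by the SAME Gaussian draw at every step
    (synchronous coupling); returns |x_k - y_k| after each step."""
    x, y = x0, y0
    noise_scale = math.sqrt(2.0 * h)
    dists = []
    for _ in range(n_steps):
        xi = rng.gauss(0.0, 1.0)  # shared noise: it cancels in x - y
        x = x - h * grad_f(x) + noise_scale * xi
        y = y - h * grad_f(y) + noise_scale * xi
        dists.append(abs(x - y))
    return dists


if __name__ == "__main__":
    h = 0.1  # h <= 2 / (m + L) = 2/3, so the per-step factor is at most 1 - h*m = 0.9
    d = coupled_ula_distances(5.0, -5.0, h, 30, random.Random(0))
    print([round(v, 4) for v in d[:5]], d[-1], 10.0 * (1 - h) ** 30)
```

The bound $|x_k-y_k| \le (1-h)^k |x_0-y_0|$ holds for every noise realization. This is why synchronous coupling yields W$_2$ contraction without any probabilistic argument.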

6. Practical Implementation and Algorithm Selection

Practical deployment of LMC algorithms requires attention to dimension scaling, smoothness assumptions, computational cost per iteration, and robustness to nonconvexity or nonsmoothness:

  • Step-size selection: the step size $h$ is typically set to … to balance contractivity against discretization error (Li et al., 2021).
  • Gradient/Hessian computation cost: high-order methods reduce the iteration count, and coordinate and variance-reduced methods reduce the per-iteration cost (Yang et al., 8 May 2026, Ding et al., 2020).
  • Proximal and nonsmooth variants: IPLA and Gaussian-smoothed methods extend applicability to challenging targets (super-quadratic, total variation-regularized models, etc.) (Benko et al., 2024, Chatterji et al., 2019).
  • Heavy-tailed and non-convex situations: FLMC and CG-LMC provide enhanced global exploration and robustness at the expense of tighter tuning (Şimşekli, 2017, Basu et al., 30 Jan 2025).
  • Discrete spaces and stochastic/mini-batch gradients: DLMC and SGLD-type methods offer scalable and parallelizable alternatives for large-scale and structured problems (Sun et al., 2022).

Algorithm selection should be guided by the interplay between target distribution properties, dimension, computational resources, and required statistical guarantees. Detailed parameterization and tuning guidance are available in cited works.

7. Recent Developments and Research Directions

Open directions include adaptive and preconditioned LMC, automatic selection and adaptation of the step size and sampler order, and rigorous analysis of nonconvex and multimodal sampling. They also include nonasymptotic guarantees in general metric spaces and robust high-dimensional implementations for large-scale, structured models.

