
Metropolis-Adjusted Langevin Algorithm (MALA)

Updated 12 July 2025
  • MALA is a gradient-informed MCMC method that leverages Euler discretization and a Metropolis–Hastings correction to sample from complex distributions.
  • It employs discretized Langevin dynamics, using potential function gradients to guide proposals toward regions of high posterior density.
  • Nonasymptotic analysis guarantees exponential convergence to equilibrium, up to an exponentially small discretization error, even when the drift is not globally Lipschitz.

The Metropolis-Adjusted Langevin Algorithm (MALA) is a Markov Chain Monte Carlo (MCMC) method that incorporates gradient information from the target density to propose new states, correcting discretization bias via a Metropolis–Hastings accept–reject step. MALA is designed to sample efficiently from complex high-dimensional distributions, as it leverages the intuition that a discretized Langevin diffusion—a stochastic process with π as its invariant measure—guides proposals toward regions of high posterior density. MALA’s theoretical and practical properties have been extensively studied, particularly in the context of target distributions that arise from stochastic differential equations (SDEs), Bayesian inverse problems, and high-dimensional statistics.

1. Mathematical Formulation and Theoretical Setting

At its core, MALA can be viewed as a Metropolis–Hastings scheme with proposals based on an Euler discretization of an overdamped Langevin SDE:

dY_t = -\nabla U(Y_t)\,dt + \sqrt{2/\beta}\,dW_t,

where U is the potential function (typically the negative log-density), β is an inverse temperature parameter, and W_t is a standard Brownian motion. The corresponding invariant density is π(x) ∝ exp(−βU(x)).

MALA’s proposal at state x is given by:

X^* = x - h\nabla U(x) + \sqrt{2\beta^{-1} h}\,\xi, \qquad \xi \sim N(0, I),

where h > 0 is the time-step parameter. The proposed state X* is accepted with probability

\alpha_h(x, X^*) = 1 \wedge \frac{q_h(X^*, x)\,\pi(X^*)}{q_h(x, X^*)\,\pi(x)}

where q_h(·, ·) is the transition density of the proposal kernel defined by this Euler scheme.

The chain thus constructed exactly preserves μ, the SDE’s invariant measure (with density π), as its stationary distribution.
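The proposal and accept–reject step can be sketched in a few lines of NumPy. This is a minimal illustrative implementation (the function name, interface, and usage setup are not from the paper):

```python
import numpy as np

def mala_step(x, U, grad_U, h, beta=1.0, rng=None):
    """One MALA transition: Langevin proposal plus Metropolis-Hastings correction."""
    rng = np.random.default_rng() if rng is None else rng
    # Euler-Maruyama proposal: X* = x - h * grad U(x) + sqrt(2h/beta) * xi
    x_prop = x - h * grad_U(x) + np.sqrt(2.0 * h / beta) * rng.standard_normal(x.shape)

    def log_q(y, z):
        # log q_h(y | z): Gaussian with mean z - h * grad U(z), covariance (2h/beta) I
        mean = z - h * grad_U(z)
        return -beta * np.sum((y - mean) ** 2) / (4.0 * h)

    # log of the Metropolis-Hastings ratio; log pi = -beta * U up to a constant
    log_alpha = beta * (U(x) - U(x_prop)) + log_q(x, x_prop) - log_q(x_prop, x)
    if np.log(rng.uniform()) < log_alpha:
        return x_prop, True
    return x, False

# Usage sketch: sample a standard Gaussian target, U(x) = |x|^2 / 2.
rng = np.random.default_rng(0)
x, samples = np.zeros(1), []
for _ in range(20000):
    x, _ = mala_step(x, lambda y: 0.5 * np.sum(y**2), lambda y: y, h=0.5, rng=rng)
    samples.append(x[0])
print(np.mean(samples[2000:]), np.var(samples[2000:]))  # approximately 0 and 1
```

The normalization constants of q_h cancel in the ratio, so only the Gaussian exponents are needed.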

2. Nonasymptotic Mixing and Ergodicity

A central result of the referenced work (arXiv:1008.3514) is the nonasymptotic quantification of MALA's rate of convergence to equilibrium, even when the drift ∇U is not globally Lipschitz—an important practical scenario. Despite the lack of a uniform spectral gap in such cases, the Metropolis–Hastings correction enables MALA to achieve strong ergodic properties by "patching" the instability in high-energy regions where discretization alone would lead to divergence.

The main theorem states that if P denotes the MALA transition kernel over unit time and μ is the invariant measure, then

\left\| P^k(x, \cdot) - \mu \right\|_{\mathrm{TV}} \leq C_1\, \Phi(x) \left( \rho^k + e^{-C_2 / h^{1/4}} \right),

where Φ(x) = exp(θU(x)) is a Lyapunov function, ρ ∈ (0, 1) is a contraction factor, h is the time step, and C₁, C₂, θ are constants depending on U and β.

This bound demonstrates exponential decay in total variation distance with respect to k (the number of unit-time steps), up to an error term of order exp(−C₂/h^{1/4}) that is exponentially small as h → 0. The analysis crucially leverages Lyapunov techniques and a "patching argument": the state space is divided into a compact low-energy region (where uniform minorization conditions can be established) and its complement, with high-energy proposals controlled by the rapidly decaying tails of μ.
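The shape of this bound can be seen in a quick numerical illustration with purely hypothetical constants (C₁ = Φ(x) = C₂ = 1, ρ = 0.9; none of these values come from the paper): the bound decays geometrically in k until it reaches the h-dependent floor.

```python
import numpy as np

# Hypothetical constants for illustration only.
rho, C2, h = 0.9, 1.0, 1e-4
floor = np.exp(-C2 / h**0.25)        # the h-dependent error term, exp(-10) here
bound = lambda k: rho**k + floor     # C1 * Phi(x) * (rho**k + floor), with C1*Phi = 1

print(bound(10), bound(100), bound(200))     # geometric decay in k ...
print(abs(bound(500) - floor) < 1e-10)       # ... until the floor dominates
```

Shrinking h pushes the floor down super-exponentially, which is why the missing spectral gap is negligible for small step sizes.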

Assumptions on U required by the analysis include:

  • Quadratic growth at infinity (U(x) ≥ C(1 + |x|²));
  • An inequality controlling ΔU in terms of |∇U|² and U;
  • A one-sided Lipschitz condition on ∇U;
  • Higher derivatives of U bounded in terms of U(x).
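As a concrete instance of these conditions, consider the quartic potential U(x) = x⁴/4 (an illustrative choice, not taken from the paper): its drift ∇U(x) = x³ is not globally Lipschitz, since its derivative 3x² is unbounded, yet it satisfies a one-sided Lipschitz condition with constant 0 because (x³ − y³)(x − y) ≥ 0 for all real x, y. A quick numerical check:

```python
import numpy as np

# Illustrative potential U(x) = x**4 / 4 with drift grad U(x) = x**3.
rng = np.random.default_rng(1)
x, y = rng.normal(scale=5.0, size=(2, 100000))

# |x**3 - y**3| / |x - y| grows like x**2 + x*y + y**2: no global Lipschitz bound.
global_lip_ratio = np.abs(x**3 - y**3) / np.abs(x - y)

# (x**3 - y**3)(x - y) >= 0 for all x, y: one-sided Lipschitz with constant 0.
one_sided = (x**3 - y**3) * (x - y)

print(global_lip_ratio.max())          # large: the ratio is unbounded in |x|, |y|
print(bool((one_sided >= 0).all()))    # True: the one-sided condition holds
```

Potentials of this type are exactly the setting the theorem targets: stable dynamics with fast-growing, non-globally-Lipschitz drift.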

3. Practical and Algorithmic Implications

The derived nonasymptotic bounds provide direct quantitative guidance for simulation practice:

  • For any sufficiently small step size h, the lack of a uniform spectral gap is essentially negligible due to the exponentially small error term exp(−C₂/h^{1/4}).
  • The algorithm remains robust even when the forward Euler proposals would diverge in the absence of a Metropolization step. The rejection mechanism "cleans up" these problematic proposals, maintaining both stability and the correct invariant measure.
  • Because convergence to equilibrium is quantified in total variation distance and is nonasymptotic, users may choose h so that the error term is below the desired numerical tolerance for the simulation horizons of interest.
  • The bounds, however, are uniform only over initial conditions with U(x) < E₀, justifying a focus on "localized" state spaces in calculations. In practice, high-energy regions contribute little due to the rapid decay of μ.
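The tolerance-driven choice of h can be made operational when an estimate of C₂ is available (it rarely is in closed form; here it is a hypothetical input): inverting exp(−C₂/h^{1/4}) ≤ ε gives h ≤ (C₂ / log(1/ε))⁴.

```python
import math

def max_step_size(tol, C2=1.0):
    """Largest h with exp(-C2 / h**0.25) <= tol, from the bound's error term.

    C2 is problem-dependent and treated here as a hypothetical input."""
    return (C2 / math.log(1.0 / tol)) ** 4

h = max_step_size(1e-6)
print(h, math.exp(-1.0 / h ** 0.25))  # the error term sits at the requested tolerance
```

The quartic power means the tolerance is cheap to tighten: shrinking ε from 10⁻⁶ to 10⁻⁹ only shrinks the admissible h by a factor of (log 10⁹ / log 10⁶)⁴ ≈ 5.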

4. Comparison with Unadjusted Discretization Schemes

MALA markedly outperforms unadjusted Euler-based schemes for SDEs with nonglobally Lipschitz drifts. While forward Euler discretization alone may be unstable or non-ergodic, the Metropolis–Hastings adjustment ensures ergodicity and preserves the invariant measure exactly. The critical distinction is that MALA's error term is exponentially small in the step size, while unadjusted Euler typically suffers an O(h) error in the invariant distribution.

Compared to more general Metropolis–Hastings algorithms, MALA is more efficient on a broad class of SDE-inspired targets because it exploits gradient information through proposals informed by the underlying geometry of the target.
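A small experiment makes the contrast concrete. The setup is illustrative (U(y) = y⁴/4, with step size and initial condition chosen so the forward Euler map is unstable, i.e. |y₀| > √(2/h)): the unadjusted chain blows up within a few steps, while the Metropolized chain rejects the divergent proposals and stays finite.

```python
import numpy as np

rng = np.random.default_rng(0)
h, y0, n = 0.1, 5.0, 200
U = lambda y: y**4 / 4.0
dU = lambda y: y**3

# Unadjusted Euler: the deterministic map y -> y - h*y**3 amplifies |y| once
# |y| > sqrt(2/h) ~ 4.47, so the trajectory explodes within a few steps.
y, euler_diverged = y0, False
for _ in range(n):
    y = y - h * dU(y) + np.sqrt(2.0 * h) * rng.standard_normal()
    if abs(y) > 1e8:
        euler_diverged = True
        break

# MALA: identical proposal, but divergent moves are rejected by the
# Metropolis-Hastings step, so the chain remains finite.
def log_q(b, a):  # log proposal density of b given current state a (beta = 1)
    return -((b - (a - h * dU(a))) ** 2) / (4.0 * h)

x, mala_path = y0, []
for _ in range(n):
    xp = x - h * dU(x) + np.sqrt(2.0 * h) * rng.standard_normal()
    log_alpha = U(x) - U(xp) + log_q(x, xp) - log_q(xp, x)
    if np.log(rng.uniform()) < log_alpha:
        x = xp
    mala_path.append(x)

print(euler_diverged)                        # True: unadjusted Euler blows up
print(bool(np.all(np.isfinite(mala_path))))  # True: MALA stays finite
```

Note that stability is not the same as fast mixing: with this deliberately large h the MALA chain rejects almost everything from y₀ = 5, which is exactly why the practical advice below recommends a small step size.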

5. Implementation Considerations

When implementing MALA in systems with non-globally Lipschitz drifts, practitioners should:

  • Set the step size h as small as computational resources permit, targeting exponential suppression of the finite-time error;
  • Monitor rare but plausible excursions into high-energy regions, recognizing that in practice these will be visited infrequently according to the tail of μ;
  • Employ suitable Lyapunov drift diagnostics in simulations to detect potential failures of the assumptions in extreme regimes;
  • Favor MALA over uncorrected Euler algorithms whenever the gradient of the log-density fails to be globally Lipschitz, as the Metropolis–Hastings correction ensures robust long-term convergence.
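A Lyapunov drift diagnostic can be sketched as follows (the potential, the value of θ, the test points, and the acceptance threshold of 1.5 are all illustrative choices, not from the paper): estimate the one-step drift ratio E[V(X₁) | X₀ = x] / V(x) for V(x) = exp(θU(x)) by Monte Carlo at a few states and check it stays controlled.

```python
import numpy as np

rng = np.random.default_rng(2)
h, theta = 0.1, 0.1
U = lambda x: x**4 / 4.0
dU = lambda x: x**3
V = lambda x: np.exp(theta * U(x))   # Lyapunov function V(x) = exp(theta * U(x))

def mala_step(x):
    # One scalar MALA transition for the quartic potential (beta = 1).
    xp = x - h * dU(x) + np.sqrt(2.0 * h) * rng.standard_normal()
    log_q = lambda b, a: -((b - (a - h * dU(a))) ** 2) / (4.0 * h)
    log_alpha = U(x) - U(xp) + log_q(x, xp) - log_q(xp, x)
    return xp if np.log(rng.uniform()) < log_alpha else x

# Monte Carlo estimate of the one-step drift ratio E[V(X1) | X0 = x] / V(x).
ratios = []
for x0 in (1.0, 2.0, 3.0):
    ratios.append(np.mean([V(mala_step(x0)) for _ in range(2000)]) / V(x0))
print(ratios)  # ratios staying bounded indicates the drift condition holds here
```

A more systematic version would fit constants λ < 1 and b in the geometric drift condition E[V(X₁) | X₀ = x] ≤ λV(x) + b across a grid of states, flagging regions where the estimate exceeds the fitted bound.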

6. Summary and Broader Impact

The nonasymptotic mixing analysis of MALA demonstrates that, even in the absence of a spectral gap for the discretized dynamics, MALA inherits nearly all ergodic benefits of the underlying SDE thanks to the Metropolis–Hastings correction. This insight extends the practical guarantees of MALA to a wide class of applied problems—including statistical mechanics, Bayesian computation, and stochastic modeling—where only local regularity and tail conditions on the potential are available. Theoretical results, coupled with practical algorithmic strategies derived from Lyapunov and minorization arguments, ensure that MALA remains a robust and efficient sampler across challenging, high-dimensional settings.

References

  1. arXiv:1008.3514.