
Metropolis-Adjusted Langevin Algorithm (MALA)

Updated 12 July 2025
  • MALA is a gradient-informed MCMC method that leverages Euler discretization and a Metropolis–Hastings correction to sample from complex distributions.
  • It employs discretized Langevin dynamics, using potential function gradients to guide proposals toward regions of high posterior density.
  • Nonasymptotic analysis guarantees exponential convergence to equilibrium, up to an exponentially small discretization error, even when the drift is not globally Lipschitz.

The Metropolis-Adjusted Langevin Algorithm (MALA) is a Markov Chain Monte Carlo (MCMC) method that incorporates gradient information from the target density to propose new states, correcting discretization bias via a Metropolis–Hastings accept–reject step. MALA is designed to sample efficiently from complex high-dimensional distributions, as it leverages the intuition that a discretized Langevin diffusion—a stochastic process with π as its invariant measure—guides proposals toward regions of high posterior density. MALA’s theoretical and practical properties have been extensively studied, particularly in the context of target distributions that arise from stochastic differential equations (SDEs), Bayesian inverse problems, and high-dimensional statistics.

1. Mathematical Formulation and Theoretical Setting

At its core, MALA can be viewed as a Metropolis–Hastings scheme with proposals based on an Euler discretization of an overdamped Langevin SDE:

dY_t = -\nabla U(Y_t)\,dt + \sqrt{2/\beta}\,dW_t,

where U is the potential function (typically the negative log-density), β is an inverse temperature parameter, and W_t is a standard Brownian motion. The corresponding invariant density is π(x) ∝ exp(−βU(x)).

MALA’s proposal at state x is given by:

X^* = x - h\nabla U(x) + \sqrt{2\beta^{-1} h}\,\xi, \qquad \xi \sim N(0, I),

where h > 0 is the time-step parameter. The proposed state X* is accepted with probability

\alpha_h(x, X^*) = 1 \wedge \frac{q_h(X^*, x)\,\pi(X^*)}{q_h(x, X^*)\,\pi(x)}

where q_h(·, ·) is the transition density of the proposal kernel defined by this Euler scheme.

The chain thus constructed exactly preserves μ, the SDE’s invariant measure (with density π), as its stationary distribution.
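The proposal and accept–reject step can be sketched in a few lines of NumPy. This is a minimal illustrative implementation (the function name, interface, and usage setup are not from the paper):

```python
import numpy as np

def mala_step(x, U, grad_U, h, beta=1.0, rng=None):
    """One MALA transition: Langevin proposal plus Metropolis-Hastings correction."""
    rng = np.random.default_rng() if rng is None else rng
    # Euler-Maruyama proposal: X* = x - h * grad U(x) + sqrt(2h/beta) * xi
    x_prop = x - h * grad_U(x) + np.sqrt(2.0 * h / beta) * rng.standard_normal(x.shape)

    def log_q(y, z):
        # log q_h(y | z): Gaussian with mean z - h * grad U(z), covariance (2h/beta) I
        mean = z - h * grad_U(z)
        return -beta * np.sum((y - mean) ** 2) / (4.0 * h)

    # log of the Metropolis-Hastings ratio; log pi = -beta * U up to a constant
    log_alpha = beta * (U(x) - U(x_prop)) + log_q(x, x_prop) - log_q(x_prop, x)
    if np.log(rng.uniform()) < log_alpha:
        return x_prop, True
    return x, False

# Usage sketch: sample a standard Gaussian target, U(x) = |x|^2 / 2.
rng = np.random.default_rng(0)
x, samples = np.zeros(1), []
for _ in range(20000):
    x, _ = mala_step(x, lambda y: 0.5 * np.sum(y**2), lambda y: y, h=0.5, rng=rng)
    samples.append(x[0])
print(np.mean(samples[2000:]), np.var(samples[2000:]))  # approximately 0 and 1
```

The normalization constants of q_h cancel in the ratio, so only the Gaussian exponents are needed.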

2. Nonasymptotic Mixing and Ergodicity

A central result of the referenced work (arXiv:1008.3514) is the nonasymptotic quantification of MALA's rate of convergence to equilibrium, even when the drift ∇U is not globally Lipschitz—an important practical scenario. Despite the lack of a uniform spectral gap in such cases, the Metropolis–Hastings correction enables MALA to achieve strong ergodic properties by "patching" the instability in high-energy regions where discretization alone would lead to divergence.

The main theorem states that if P denotes the MALA transition kernel over unit time and μ is the invariant measure, then

\left\| P^k(x, \cdot) - \mu \right\|_{\mathrm{TV}} \leq C_1\, \Phi(x) \left( \rho^k + e^{-C_2 / h^{1/4}} \right),

where Φ(x) = exp(θU(x)) is a Lyapunov function, ρ ∈ (0, 1) is a contraction factor, h is the time step, and C₁, C₂, θ are constants depending on U and β.

This bound demonstrates exponential decay in total variation distance with respect to k (the number of unit-time steps), up to an error term of order exp(−C₂/h^{1/4}) that is exponentially small as h → 0. The analysis crucially leverages Lyapunov techniques and a "patching argument": the state space is divided into a compact low-energy region (where uniform minorization conditions can be established) and its complement, with high-energy proposals controlled by the rapidly decaying tails of μ.
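The shape of this bound can be seen in a quick numerical illustration with purely hypothetical constants (C₁ = Φ(x) = C₂ = 1, ρ = 0.9; none of these values come from the paper): the bound decays geometrically in k until it reaches the h-dependent floor.

```python
import numpy as np

# Hypothetical constants for illustration only.
rho, C2, h = 0.9, 1.0, 1e-4
floor = np.exp(-C2 / h**0.25)        # the h-dependent error term, exp(-10) here
bound = lambda k: rho**k + floor     # C1 * Phi(x) * (rho**k + floor), with C1*Phi = 1

print(bound(10), bound(100), bound(200))     # geometric decay in k ...
print(abs(bound(500) - floor) < 1e-10)       # ... until the floor dominates
```

Shrinking h pushes the floor down super-exponentially, which is why the missing spectral gap is negligible for small step sizes.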

Assumptions on U required by the analysis include:

  • Quadratic growth at infinity (U(x) ≥ C(1 + |x|²));
  • An inequality controlling ΔU in terms of |∇U|² and U;
  • A one-sided Lipschitz condition on ∇U;
  • Higher derivatives of U bounded in terms of U(x).
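As a concrete instance of these conditions, consider the quartic potential U(x) = x⁴/4 (an illustrative choice, not taken from the paper): its drift ∇U(x) = x³ is not globally Lipschitz, since its derivative 3x² is unbounded, yet it satisfies a one-sided Lipschitz condition with constant 0 because (x³ − y³)(x − y) ≥ 0 for all real x, y. A quick numerical check:

```python
import numpy as np

# Illustrative potential U(x) = x**4 / 4 with drift grad U(x) = x**3.
rng = np.random.default_rng(1)
x, y = rng.normal(scale=5.0, size=(2, 100000))

# |x**3 - y**3| / |x - y| grows like x**2 + x*y + y**2: no global Lipschitz bound.
global_lip_ratio = np.abs(x**3 - y**3) / np.abs(x - y)

# (x**3 - y**3)(x - y) >= 0 for all x, y: one-sided Lipschitz with constant 0.
one_sided = (x**3 - y**3) * (x - y)

print(global_lip_ratio.max())          # large: the ratio is unbounded in |x|, |y|
print(bool((one_sided >= 0).all()))    # True: the one-sided condition holds
```

Potentials of this type are exactly the setting the theorem targets: stable dynamics with fast-growing, non-globally-Lipschitz drift.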

3. Practical and Algorithmic Implications

The derived nonasymptotic bounds provide direct quantitative guidance for simulation practice:

  • For any sufficiently small step size h, the lack of a uniform spectral gap is essentially negligible due to the exponentially small error term exp(−C₂/h^{1/4}).
  • The algorithm remains robust even when the forward Euler proposals would diverge in the absence of a Metropolization step. The rejection mechanism "cleans up" these problematic proposals, maintaining both stability and the correct invariant measure.
  • Because convergence to equilibrium is quantified in total variation distance and is nonasymptotic, users may choose h so that the error term is below the desired numerical tolerance for the simulation horizons of interest.
  • The bounds, however, are uniform only over initial conditions with U(x) < E₀, justifying a focus on "localized" state spaces in calculations. In practice, high-energy regions contribute little due to the rapid decay of μ.
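The tolerance-driven choice of h can be made operational when an estimate of C₂ is available (it rarely is in closed form; here it is a hypothetical input): inverting exp(−C₂/h^{1/4}) ≤ ε gives h ≤ (C₂ / log(1/ε))⁴.

```python
import math

def max_step_size(tol, C2=1.0):
    """Largest h with exp(-C2 / h**0.25) <= tol, from the bound's error term.

    C2 is problem-dependent and treated here as a hypothetical input."""
    return (C2 / math.log(1.0 / tol)) ** 4

h = max_step_size(1e-6)
print(h, math.exp(-1.0 / h ** 0.25))  # the error term sits at the requested tolerance
```

The quartic power means the tolerance is cheap to tighten: shrinking ε from 10⁻⁶ to 10⁻⁹ only shrinks the admissible h by a factor of (log 10⁹ / log 10⁶)⁴ ≈ 5.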

4. Comparison with Unadjusted Discretization Schemes

MALA markedly outperforms unadjusted Euler-based schemes for SDEs with nonglobally Lipschitz drifts. While forward Euler discretization alone may be unstable or non-ergodic, the Metropolis–Hastings adjustment ensures ergodicity and preserves the invariant measure exactly. The critical distinction is that MALA's error term is exponentially small in the step size, while unadjusted Euler typically suffers an O(h) error in the invariant distribution.

Compared to more general Metropolis–Hastings algorithms, MALA is more efficient on a broad class of SDE-inspired targets because it exploits gradient information through proposals informed by the underlying geometry of the target.
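A small experiment makes the contrast concrete. The setup is illustrative (U(y) = y⁴/4, with step size and initial condition chosen so the forward Euler map is unstable, i.e. |y₀| > √(2/h)): the unadjusted chain blows up within a few steps, while the Metropolized chain rejects the divergent proposals and stays finite.

```python
import numpy as np

rng = np.random.default_rng(0)
h, y0, n = 0.1, 5.0, 200
U = lambda y: y**4 / 4.0
dU = lambda y: y**3

# Unadjusted Euler: the deterministic map y -> y - h*y**3 amplifies |y| once
# |y| > sqrt(2/h) ~ 4.47, so the trajectory explodes within a few steps.
y, euler_diverged = y0, False
for _ in range(n):
    y = y - h * dU(y) + np.sqrt(2.0 * h) * rng.standard_normal()
    if abs(y) > 1e8:
        euler_diverged = True
        break

# MALA: identical proposal, but divergent moves are rejected by the
# Metropolis-Hastings step, so the chain remains finite.
def log_q(b, a):  # log proposal density of b given current state a (beta = 1)
    return -((b - (a - h * dU(a))) ** 2) / (4.0 * h)

x, mala_path = y0, []
for _ in range(n):
    xp = x - h * dU(x) + np.sqrt(2.0 * h) * rng.standard_normal()
    log_alpha = U(x) - U(xp) + log_q(x, xp) - log_q(xp, x)
    if np.log(rng.uniform()) < log_alpha:
        x = xp
    mala_path.append(x)

print(euler_diverged)                        # True: unadjusted Euler blows up
print(bool(np.all(np.isfinite(mala_path))))  # True: MALA stays finite
```

Note that stability is not the same as fast mixing: with this deliberately large h the MALA chain rejects almost everything from y₀ = 5, which is exactly why the practical advice below recommends a small step size.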

5. Implementation Considerations

When implementing MALA in systems with non-globally Lipschitz drifts, practitioners should:

  • Set the step size h as small as computational resources permit, targeting exponential suppression of the finite-time error;
  • Monitor rare but plausible excursions into high-energy regions, recognizing that in practice these will be visited infrequently according to the tail of μ;
  • Employ suitable Lyapunov drift diagnostics in simulations to detect potential failures of the assumptions in extreme regimes;
  • Favor MALA over uncorrected Euler algorithms whenever the gradient of the log-density fails to be globally Lipschitz, as the Metropolis–Hastings correction ensures robust long-term convergence.
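A Lyapunov drift diagnostic can be sketched as follows (the potential, the value of θ, the test points, and the acceptance threshold of 1.5 are all illustrative choices, not from the paper): estimate the one-step drift ratio E[V(X₁) | X₀ = x] / V(x) for V(x) = exp(θU(x)) by Monte Carlo at a few states and check it stays controlled.

```python
import numpy as np

rng = np.random.default_rng(2)
h, theta = 0.1, 0.1
U = lambda x: x**4 / 4.0
dU = lambda x: x**3
V = lambda x: np.exp(theta * U(x))   # Lyapunov function V(x) = exp(theta * U(x))

def mala_step(x):
    # One scalar MALA transition for the quartic potential (beta = 1).
    xp = x - h * dU(x) + np.sqrt(2.0 * h) * rng.standard_normal()
    log_q = lambda b, a: -((b - (a - h * dU(a))) ** 2) / (4.0 * h)
    log_alpha = U(x) - U(xp) + log_q(x, xp) - log_q(xp, x)
    return xp if np.log(rng.uniform()) < log_alpha else x

# Monte Carlo estimate of the one-step drift ratio E[V(X1) | X0 = x] / V(x).
ratios = []
for x0 in (1.0, 2.0, 3.0):
    ratios.append(np.mean([V(mala_step(x0)) for _ in range(2000)]) / V(x0))
print(ratios)  # ratios staying bounded indicates the drift condition holds here
```

A more systematic version would fit constants λ < 1 and b in the geometric drift condition E[V(X₁) | X₀ = x] ≤ λV(x) + b across a grid of states, flagging regions where the estimate exceeds the fitted bound.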

6. Summary and Broader Impact

The nonasymptotic mixing analysis of MALA demonstrates that, even in the absence of a spectral gap for the discretized dynamics, MALA inherits nearly all ergodic benefits of the underlying SDE thanks to the Metropolis–Hastings correction. This insight extends the practical guarantees of MALA to a wide class of applied problems—including statistical mechanics, Bayesian computation, and stochastic modeling—where only local regularity and tail conditions on the potential are available. Theoretical results, coupled with practical algorithmic strategies derived from Lyapunov and minorization arguments, ensure that MALA remains a robust and efficient sampler across challenging, high-dimensional settings.

References

  1. arXiv:1008.3514.