MALA: Gradient-Based MCMC in High Dimensions

Updated 11 March 2026

MALA is a gradient-based MCMC method that samples complex high-dimensional distributions using discretized Langevin dynamics corrected by a Metropolis–Hastings step.
The algorithm exhibits two distinct scaling laws: burn-in costs O(N^(1/2)) steps, while stationary exploration costs O(N^(1/3)) steps per macroscopic move.
Practical implementation of MALA involves matching the step size to the regime: h ~ N^(-1/2) during burn-in, then h ~ N^(-1/3) in stationarity, where the optimal acceptance rate is near 0.574.

The Metropolis-Adjusted Langevin Algorithm (MALA) is a gradient-based Markov Chain Monte Carlo (MCMC) method designed for sampling from complex, high-dimensional target distributions with Lebesgue density on $\mathbb{R}^N$ . MALA constructs a reversible Markov chain with respect to the prescribed target and achieves ergodic sampling via a proposal mechanism derived from discretized Langevin dynamics, corrected with a Metropolis–Hastings accept–reject step. Contemporary analyses of MALA focus extensively on computational complexity as a function of dimension, particularly regarding its distinct dynamics in non-stationary (burn-in) versus stationary phases. The key high-dimensional asymptotics, established for non-product and product targets alike, delineate the regimes in which optimal scaling and cost are achieved, revealing universal scaling laws for burn-in and for stationary exploration (Kuntz et al., 2016).

1. Algorithmic Formulation of MALA

Given a target density $\pi^N$ on $\mathbb{R}^N$, the MALA proposal is constructed as a single Euler–Maruyama (forward-Euler) discretization step of the overdamped Langevin SDE:

$y = x + \frac{h}{2} \nabla \log \pi^N(x) + \sqrt{h}\,Z, \qquad Z \sim N(0, I_N),$

where $h>0$ serves as the proposal time step or variance. The associated Gaussian proposal density is

$q(x, y) = \frac{1}{(2\pi h)^{N/2}} \exp \left\{ -\frac{1}{2h} \|y - x - \frac{h}{2} \nabla \log \pi^N(x)\|^2 \right\}.$

The acceptance probability for the Metropolis–Hastings adjustment is defined by

$\alpha(x, y) = \min\left\{1, \frac{\pi^N(y) q(y, x)}{\pi^N(x) q(x, y)}\right\} = \min\{1, \exp(Q^N(x, y))\},$

with the explicit log-acceptance increment

$Q^N(x, y) = \log \pi^N(y) - \log \pi^N(x) - \frac{1}{2h} \|x - y - \frac{h}{2}\nabla \log \pi^N(y)\|^2 + \frac{1}{2h} \|y - x - \frac{h}{2}\nabla \log \pi^N(x)\|^2.$

This formulation guarantees reversibility of the Markov chain with respect to $\pi^N$ , and hence, $\pi^N$ is invariant (Kuntz et al., 2016).
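As a minimal sketch of the update described above, the following pure-Python code runs MALA on a standard Gaussian target, which is chosen here only for illustration. The function names (`log_pi`, `grad_log_pi`, `log_q`, `mala_step`, `run_chain`) are hypothetical. The proposal log-density is written for a Gaussian with mean $x + \frac{h}{2}\nabla \log \pi^N(x)$ and covariance $h I_N$; its normalizing constant cancels in the acceptance ratio and is therefore omitted.

```python
import math
import random

def log_pi(x):
    """Log-density (up to a constant) of the illustrative standard Gaussian target."""
    return -0.5 * sum(xi * xi for xi in x)

def grad_log_pi(x):
    """Gradient of log_pi."""
    return [-xi for xi in x]

def log_q(x, y, h):
    """log q(x, y) up to a constant: Gaussian, mean x + (h/2) grad, covariance h*I."""
    g = grad_log_pi(x)
    sq = sum((yi - xi - 0.5 * h * gi) ** 2 for xi, yi, gi in zip(x, y, g))
    return -sq / (2.0 * h)

def log_accept_ratio(x, y, h):
    """Q(x, y) = log[pi(y) q(y, x)] - log[pi(x) q(x, y)]."""
    return log_pi(y) + log_q(y, x, h) - log_pi(x) - log_q(x, y, h)

def mala_step(x, h, rng):
    """One MALA transition; returns (new_state, accepted)."""
    g = grad_log_pi(x)
    y = [xi + 0.5 * h * gi + math.sqrt(h) * rng.gauss(0.0, 1.0)
         for xi, gi in zip(x, g)]
    q_val = log_accept_ratio(x, y, h)
    # Accept with probability min(1, exp(Q)); the first test avoids overflow.
    if q_val >= 0.0 or rng.random() < math.exp(q_val):
        return y, True
    return x, False

def run_chain(x0, h, n_steps, seed=0):
    """Run the chain; return final state, acceptance rate, mean of |x|^2 / N."""
    rng = random.Random(seed)
    x, accepted, s_sum = list(x0), 0, 0.0
    for _ in range(n_steps):
        x, ok = mala_step(x, h, rng)
        accepted += ok
        s_sum += sum(xi * xi for xi in x) / len(x)
    return x, accepted / n_steps, s_sum / n_steps
```

For example, `run_chain([0.0] * 10, h=0.5, n_steps=4000)` returns the final state, the empirical acceptance rate, and the running average of $\|x\|^2/N$, which should settle near 1 for this target.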

2. Diffusion-Limit Theory and Non-Stationary Regime

For non-stationary initialization (the "burn-in" phase), MALA exhibits fundamentally different asymptotic behavior compared to its stationary regime. Consider the continuous-time, piecewise-linear interpolation

$x^{(N)}(t) = x^{k, N} + (N^{1/2} t - k) \big(x^{k+1, N} - x^{k, N}\big), \qquad t \in \left[\frac{k}{N^{1/2}}, \frac{k+1}{N^{1/2}}\right),$

and scale the proposal variance with dimension as $h = \ell N^{-1/2}$ . As $N \to \infty$ , $x^{(N)}(\cdot)$ converges weakly (in suitable function-space topology) to the solution of an infinite-dimensional SDE coupled to a non-stationarity scalar ODE,

$dx(t) = -h_\ell(S(t))\, [x(t) + C \nabla \Psi(x(t))]\,dt + \sqrt{2 h_\ell(S(t))}\, dW(t), \qquad x(0) = x^0,$

$dS(t) = b_\ell(S(t))\,dt, \quad \text{where} \quad b_\ell(s) = 2(1-s)h_\ell(s), \quad h_\ell(s) = \ell\left(1 \wedge e^{\frac{\ell^2}{2}(s-1)}\right),$

$S(0) = \lim_{N \to \infty} \frac{1}{N} \sum_{j=1}^N \frac{(x^0_j)^2}{\lambda_j^2}.$

Here, $W(t)$ is a cylindrical $C_s$ -Brownian motion, and $\Psi$ , $C$ encode the (possibly non-product) target (Kuntz et al., 2016). This coupled system captures the evolution of the "empirical squared norm" $S(t)$ , measuring deviation from stationarity; $S(t) \rightarrow 1$ monotonically, after which the SDE reduces to the ergodic infinite-dimensional Langevin diffusion.

The key non-stationary result is that, with this scaling $h = \ell N^{-1/2}$ , MALA requires $\mathcal{O}(N^{1/2})$ iterations to traverse an $\mathcal{O}(1)$ macroscopic time interval and to bring $S(t)$ near equilibrium—the computational cost of burn-in thus scales as $N^{1/2}$ [(Kuntz et al., 2016), Theorem 5.1].
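The behavior of the scalar ODE for $S(t)$ can be checked numerically. The sketch below integrates $dS/dt = b_\ell(S)$ with a forward-Euler scheme, which is an illustrative choice and not part of the cited analysis, and converts a macroscopic time into an iteration count under the assumption that one MALA step advances time by $N^{-1/2}$. The function names are hypothetical.

```python
import math

def speed(s, ell):
    """ell * min(1, exp(ell^2 (s - 1) / 2)), the acceptance-weighted step size."""
    if s >= 1.0:  # the exponential is at least 1 here, so the minimum is 1
        return ell
    return ell * math.exp(0.5 * ell * ell * (s - 1.0))

def drift(s, ell):
    """b_ell(s) = 2 (1 - s) * speed(s, ell)."""
    return 2.0 * (1.0 - s) * speed(s, ell)

def integrate_s(s0, ell, t_end, dt=1e-3):
    """Forward-Euler path of dS/dt = b_ell(S) on [0, t_end]."""
    path = [s0]
    for _ in range(int(round(t_end / dt))):
        path.append(path[-1] + dt * drift(path[-1], ell))
    return path

def burn_in_iterations(t_macro, n_dim):
    """Iterations needed to cover macroscopic time t_macro at N^(-1/2) per step."""
    return math.ceil(t_macro * math.sqrt(n_dim))
```

Starting from either side, for instance `integrate_s(0.2, 1.0, 5.0)` or `integrate_s(3.0, 1.0, 5.0)`, the path moves monotonically toward 1. Quadrupling `n_dim` in `burn_in_iterations` doubles the iteration count, which is the $N^{1/2}$ burn-in cost.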

3. Optimal Scaling and Cost in Stationary and Non-Stationary Regimes

The optimal cost analysis relies on asymptotic expansions of the Metropolis log-acceptance ratio:

Non-stationary (burn-in) phase: The dominant term in $Q^N$ satisfies

$I^N_1 = -\,\frac{h}{4} (\|y\|^2 - \|x\|^2) \approx \frac{\ell^2}{2} (S^{k,N} - 1),$

which is $O(1)$ if and only if $h \sim \ell N^{-1/2}$. The remaining terms satisfy $I^N_2, I^N_3 = o(1)$. Each MALA step advances time by $\mathcal{O}(N^{-1/2})$, so $\mathcal{O}(N^{1/2})$ steps are required to cover an $\mathcal{O}(1)$ macroscopic time interval, which is the duration of burn-in [(Kuntz et al., 2016), Lemmas 7.1–7.5].

Stationary regime: Once $x^k \sim \pi^N$ , one has $\|x\|^2 \sim N$ and the leading Taylor expansion yields

$I^N_1 = -\frac{h^3}{4}\|x\|^2 + \cdots \approx -\frac{\ell^3}{4}$

provided $h = \ell N^{-1/3}$. The stationary regime thus admits a nondegenerate $N \to \infty$ limit in which $Q^N$ converges in distribution to $\mathcal{N}(-\ell^3/4, \ell^3/2)$, and the mean acceptance probability satisfies $\alpha^N \to \mathbb{E}[1 \wedge \exp(\mathcal{N}(-\ell^3/4, \ell^3/2))]$. The cost of a macroscopic move in stationarity is $\mathcal{O}(N^{1/3})$ steps [(Kuntz et al., 2016), Section 4].
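The limiting acceptance probability has a closed form. The following worked step uses a standard Gaussian identity rather than anything specific to the cited paper; $\Phi$ denotes the standard normal CDF and $\sigma^2$ is a generic variance introduced for this step. For $Z \sim \mathcal{N}(-\sigma^2/2, \sigma^2)$, split the expectation at $Z = 0$ and note that weighting by $e^{Z}$ turns the law of $Z$ into $\mathcal{N}(+\sigma^2/2, \sigma^2)$:

$\mathbb{E}[1 \wedge e^{Z}] = \mathbb{P}(Z \ge 0) + \mathbb{E}[e^{Z} \mathbf{1}_{\{Z < 0\}}] = \Phi(-\sigma/2) + \Phi(-\sigma/2) = 2\Phi(-\sigma/2).$

With $\sigma^2 = \ell^3/2$, as in the limit above, this gives

$\alpha(\ell) = 2\Phi\left(-\frac{\ell^{3/2}}{2\sqrt{2}}\right).$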

This analysis yields universal cost laws for high-dimensional MALA: burn-in costs $\mathcal{O}(N^{1/2})$ steps, after which stationary exploration costs $\mathcal{O}(N^{1/3})$ steps per macroscopic move. These laws hold for both product and general non-product targets under mild spectral-decay and Lipschitz conditions on $\Psi$ (Kuntz et al., 2016).

4. Universality for Non-Product High-Dimensional Targets

The infinite-dimensional setting places MALA in the context of Hilbert space $H$ with reference Gaussian measure $\pi_0 = N(0, C)$ . Targets are of the form

$\pi(dx) \propto \exp\{-\Psi(x)\}\, \pi_0(dx),$

with general nonlinear $\Psi\colon H^s \to \mathbb{R}$, $s \in [0, \kappa - 1/2)$, and covariance eigenvalues $\lambda_j^2$ with $\lambda_j \sim j^{-\kappa}$, $\kappa>1/2$. The Sobolev regularity, the eigenvalue decay, and a Lipschitz property of $\nabla \Psi$ (as a map $H^s \to H^{-s}$) ensure that the algorithm is well defined and that the diffusion-limit theorems hold. Finite-dimensional approximations on $X^N = \text{span}\{\phi_1,...,\phi_N\}$ yield valid targets, and the theorems apply as long as the assumptions carry over (Kuntz et al., 2016).

This framework covers important cases such as Bayesian inverse problems, nonparametric regression, and conditioned diffusions, extending MALA's high-dimensional theory far beyond product settings.
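To make the finite-dimensional approximation concrete, the sketch below works in the coordinates of the eigenbasis $\phi_1, \dots, \phi_N$. It assumes an exact power law $\lambda_j = j^{-\kappa}$ and the convention, implied by the definition of $S(0)$ in Section 2, that coordinate $j$ of a draw from $\pi_0$ has standard deviation $\lambda_j$. The function names are hypothetical, and $\Psi$ is taken to be zero, so the draw comes from the reference measure itself.

```python
import math
import random

def eigen_std(n_dim, kappa):
    """lambda_j = j^(-kappa) for j = 1..N (an assumed exact power law)."""
    return [j ** (-kappa) for j in range(1, n_dim + 1)]

def sample_reference(lams, rng):
    """Draw the first N coordinates of x ~ N(0, C): x_j = lambda_j * xi_j."""
    return [lam * rng.gauss(0.0, 1.0) for lam in lams]

def empirical_s(x, lams):
    """S = (1/N) * sum_j x_j^2 / lambda_j^2, the stationarity observable."""
    return sum((xj / lam) ** 2 for xj, lam in zip(x, lams)) / len(x)
```

A draw from the reference measure gives `empirical_s` close to 1 for large `n_dim`, while a deterministic start such as the origin gives 0; this gap is what the burn-in phase has to close.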

5. Practical Tuning: Acceptance Rate and Adaptive Schemes

Practical implications for high-dimensional MALA are direct:

Burn-in: Use $h \sim \ell N^{-1/2}$ until the observable $S(t) \to 1$ , i.e., until the chain approaches stationarity. This phase lasts $\mathcal{O}(N^{1/2})$ steps.
Stationary exploration: Once $S \approx 1$ , switch to $h \sim \ell N^{-1/3}$ ; now each macroscopic move costs $\mathcal{O}(N^{1/3})$ steps.
Optimal acceptance: In stationarity, maximize the speed function $\ell \alpha(\ell)$, where $\alpha(\ell) = \mathbb{E}[1\wedge e^{\mathcal{N}(-\ell^3/4, \ell^3/2)}]$; the theoretically optimal acceptance is $\alpha^*\approx0.574$ [(Kuntz et al., 2016), Section 6], attained at $\ell^*\approx1.36$ for the Gaussian limit as normalized here.
Step-size adaptation: In practice, estimate $S = \mathbb{E}\|C^{-1/2}x\|^2/N$ and adapt $h$ accordingly to maintain desired acceptance.
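The optimization in the "Optimal acceptance" item can be reproduced numerically. The sketch below evaluates $\alpha(\ell)$ through the standard Gaussian identity $\mathbb{E}[1 \wedge e^{Z}] = 2\Phi(-\sigma/2)$ for $Z \sim \mathcal{N}(-\sigma^2/2, \sigma^2)$, with $\sigma^2 = \ell^3/2$, and maximizes $\ell\alpha(\ell)$ by ternary search. The search interval and function names are assumptions made for illustration, and the search relies on the speed function being unimodal on that interval.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def acceptance(ell):
    """E[min(1, exp(Z))] for Z ~ N(-ell^3/4, ell^3/2), i.e. 2*Phi(-sigma/2)."""
    sigma = math.sqrt(ell ** 3 / 2.0)
    return 2.0 * normal_cdf(-0.5 * sigma)

def speed(ell):
    """Speed function ell * alpha(ell)."""
    return ell * acceptance(ell)

def maximize_speed(lo=0.05, hi=5.0, iters=200):
    """Ternary search for the maximizer of speed on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if speed(m1) < speed(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)
```

Evaluating `acceptance(maximize_speed())` gives a value close to 0.574. The maximizing $\ell$ itself depends on how the step size is normalized, so the acceptance rate is the quantity that transfers across conventions.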

These scaling results provide robust guidance for choosing step sizes and monitoring acceptance rates in high-dimensional applications where explicit diagnostics of convergence/mixing are difficult.
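One way to act on this guidance is a two-phase schedule: run with $h = \ell N^{-1/2}$ while monitoring $S$, then switch to $h = \ell N^{-1/3}$ once $S$ is close to 1. The sketch below is a simplified illustration under several assumptions: a standard Gaussian target (so $\lambda_j = 1$), a single-sample estimate of $S$, and an arbitrary tolerance `tol`. The names `mala_step` and `two_phase_chain` are hypothetical, and a production scheme would use a smoothed estimate of $S$.

```python
import math
import random

def mala_step(x, h, rng):
    """One MALA step for a standard Gaussian target (log pi = -|x|^2 / 2)."""
    a = 1.0 - 0.5 * h  # proposal mean is a * x because grad log pi(x) = -x
    y = [a * xi + math.sqrt(h) * rng.gauss(0.0, 1.0) for xi in x]
    log_pi_diff = 0.5 * (sum(v * v for v in x) - sum(v * v for v in y))
    fwd = sum((yi - a * xi) ** 2 for xi, yi in zip(x, y)) / (2.0 * h)  # -log q(x, y)
    bwd = sum((xi - a * yi) ** 2 for xi, yi in zip(x, y)) / (2.0 * h)  # -log q(y, x)
    q_val = log_pi_diff - bwd + fwd
    if q_val >= 0.0 or rng.random() < math.exp(q_val):
        return y
    return x

def two_phase_chain(x0, ell, n_steps, tol=0.1, seed=0):
    """Burn in with h = ell*N^(-1/2); switch to ell*N^(-1/3) once |S - 1| < tol."""
    rng = random.Random(seed)
    n = len(x0)
    h_burn = ell * n ** -0.5
    h_stat = ell * n ** (-1.0 / 3.0)
    x, h, switch_at = list(x0), h_burn, None
    for k in range(n_steps):
        x = mala_step(x, h, rng)
        s = sum(v * v for v in x) / n  # single-sample estimate of S
        if switch_at is None and abs(s - 1.0) < tol:
            h, switch_at = h_stat, k + 1
    return x, h, switch_at
```

Calling `two_phase_chain([0.0] * 100, ell=1.0, n_steps=2000)` starts from $S = 0$ and returns the final state, the step size in use at the end, and the iteration at which the switch occurred.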

6. Regime Recognition and Transition

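The two regimes require different step-size scalings, and mismatching them is costly:

Keeping the burn-in scaling $h \sim N^{-1/2}$ in the stationary phase is too conservative: with $S = 1$ the leading term $I_1^N$ vanishes, the acceptance probability tends to 1, and each macroscopic move costs $\mathcal{O}(N^{1/2})$ rather than $\mathcal{O}(N^{1/3})$ steps.
Using $h \sim N^{-1/3}$ in the non-stationary phase makes the leading term diverge, $I_1^N \sim N^{1/3}(S-1)$, whenever $S$ deviates appreciably from $1$; for $S < 1$ the acceptance probability then collapses to zero.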

Correctly identifying and adapting to these phases is essential for efficient high-dimensional MALA implementation; these scaling laws are universal for highly regular targets regardless of product structure (Kuntz et al., 2016).
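The cost of a mismatched scaling can be illustrated with the leading burn-in term alone. The sketch below assumes that this term behaves like $\frac{h^2 N}{2}(S-1)$, which is the form implied by the burn-in expansion in Section 3 when $h = \ell N^{-1/2}$, and extrapolates it to a general exponent $h = \ell N^{-\gamma}$. This extrapolation and the function names are assumptions made for illustration, not results quoted from the cited paper.

```python
import math

def leading_term(n_dim, s, ell, gamma):
    """Assumed leading term (h^2 N / 2)(S - 1) with h = ell * N^(-gamma)."""
    h = ell * n_dim ** (-gamma)
    return 0.5 * h * h * n_dim * (s - 1.0)

def burn_in_acceptance(n_dim, s, ell, gamma):
    """Leading-order acceptance min(1, exp(leading term))."""
    return math.exp(min(0.0, leading_term(n_dim, s, ell, gamma)))
```

With `n_dim = 10**6`, `s = 0.5`, and `ell = 1.0`, the burn-in exponent `gamma = 0.5` gives an acceptance of $e^{-0.25}$, whereas `gamma = 1/3` gives roughly $e^{-25}$, so a chain started out of stationarity with the stationary scaling would almost never move.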

7. Summary Table: Regime-Specific Scalings

Regime	Step size $h$	Number of steps (to $\mathcal{O}(1)$ move)	Acceptance $\alpha$	Cost order
Non-stationary	$\ell N^{-1/2}$	$\mathcal{O}(N^{1/2})$	$\alpha(S) \approx 1\wedge \exp[\frac{\ell^2}{2}(S-1)]$	$N^{1/2}$
Stationary	$\ell N^{-1/3}$	$\mathcal{O}(N^{1/3})$	$\alpha \approx 0.574$	$N^{1/3}$

The scaling laws follow from Taylor expansions of the log-acceptance ratio; in stationarity, the optimal acceptance rate follows from the Gaussian approximation of that ratio (Kuntz et al., 2016).

References:

The foundational scaling and diffusion-limit results for the non-stationary and stationary regimes appear in "Non-stationary phase of the MALA algorithm" (Kuntz et al., 2016). This work builds upon and significantly extends the analysis in (Pillai et al., 2011) and the broader literature on infinite-dimensional scaling.

References (2)

Non-stationary phase of the MALA algorithm (2016)

Optimal scaling and diffusion limits for the Langevin algorithm in high dimensions (2011)
