MALA: Gradient-Based MCMC in High Dimensions

Updated 11 March 2026
  • MALA is a gradient-based MCMC method that samples complex high-dimensional distributions using discretized Langevin dynamics corrected by a Metropolis–Hastings step.
  • The algorithm exhibits two distinct scaling laws in the dimension N: burn-in costs O(N^(1/2)) steps, while stationary exploration costs O(N^(1/3)) steps per macroscopic move.
  • Practical implementation of MALA involves tuning the step size to the regime: proportional to N^(-1/2) during burn-in and to N^(-1/3) in stationarity, where the optimal acceptance rate is approximately 0.574.

The Metropolis-Adjusted Langevin Algorithm (MALA) is a gradient-based Markov Chain Monte Carlo (MCMC) method designed for sampling from complex, high-dimensional target distributions with Lebesgue density on $\mathbb{R}^N$. MALA constructs a Markov chain that is reversible with respect to the prescribed target and achieves ergodic sampling via a proposal mechanism derived from discretized Langevin dynamics, corrected with a Metropolis–Hastings accept–reject step. Contemporary analyses of MALA focus extensively on computational complexity as a function of dimension, particularly regarding its distinct dynamics in non-stationary (burn-in) versus stationary phases. The key high-dimensional asymptotics, established for non-product and product targets alike, delineate the regimes in which optimal scaling and cost are achieved, revealing universal scaling laws for burn-in and for stationary exploration (Kuntz et al., 2016).

1. Algorithmic Formulation of MALA

Given a target density $\pi^N$ on $\mathbb{R}^N$, the MALA proposal is a single Euler–Maruyama (forward-Euler) discretization step of the overdamped Langevin SDE:

$$y = x + h\, \nabla \log \pi^N(x) + \sqrt{2h}\,Z, \qquad Z \sim N(0, I_N),$$

where $h>0$ is the time step, so that the proposal covariance is $2h\, I_N$. The associated Gaussian proposal density is

$$q(x, y) = \frac{1}{(4\pi h)^{N/2}} \exp \left\{ -\frac{1}{4h} \left\|y - x - h\, \nabla \log \pi^N(x)\right\|^2 \right\}.$$

The acceptance probability for the Metropolis–Hastings adjustment is defined by

$$\alpha(x, y) = \min\left\{1, \frac{\pi^N(y)\, q(y, x)}{\pi^N(x)\, q(x, y)}\right\} = \min\{1, \exp(Q^N(x, y))\},$$

with the explicit log-acceptance increment

$$Q^N(x, y) = \log \pi^N(y) - \log \pi^N(x) - \frac{1}{4h} \left\|x - y - h\, \nabla \log \pi^N(y)\right\|^2 + \frac{1}{4h} \left\|y - x - h\, \nabla \log \pi^N(x)\right\|^2.$$

This formulation guarantees that the Markov chain is reversible with respect to $\pi^N$, and hence that $\pi^N$ is invariant (Kuntz et al., 2016).
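The accept–reject construction can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: it assumes the convention with drift $h\,\nabla \log \pi^N$ and noise variance $2h$ (the one matching the $1/(4h)$ factor in $q$), and the function names `log_q`, `log_accept_ratio`, `mala_step`, together with the standard Gaussian example target, are hypothetical choices made here for illustration.

```python
import math
import random


def log_q(x, y, h, grad_log_pi):
    """Log proposal density q(x, y) up to the constant -(N/2) log(4 pi h),
    which cancels in the acceptance ratio."""
    g = grad_log_pi(x)
    sq = sum((yi - xi - h * gi) ** 2 for xi, yi, gi in zip(x, y, g))
    return -sq / (4.0 * h)


def log_accept_ratio(x, y, h, log_pi, grad_log_pi):
    """Q(x, y) = log pi(y) - log pi(x) + log q(y, x) - log q(x, y)."""
    return (log_pi(y) - log_pi(x)
            + log_q(y, x, h, grad_log_pi)
            - log_q(x, y, h, grad_log_pi))


def mala_step(x, h, log_pi, grad_log_pi, rng):
    """One MALA transition; returns (new_state, accepted)."""
    g = grad_log_pi(x)
    scale = math.sqrt(2.0 * h)
    y = [xi + h * gi + scale * rng.gauss(0.0, 1.0) for xi, gi in zip(x, g)]
    q_val = log_accept_ratio(x, y, h, log_pi, grad_log_pi)
    # Accept with probability min(1, exp(Q)); min(0, Q) avoids overflow.
    if rng.random() < math.exp(min(0.0, q_val)):
        return y, True
    return x, False


# Example target (an assumption for illustration): standard Gaussian on R^N.
def log_pi_gauss(x):
    return -0.5 * sum(xi * xi for xi in x)


def grad_log_pi_gauss(x):
    return [-xi for xi in x]


if __name__ == "__main__":
    rng = random.Random(0)
    state = [0.0] * 20
    n_accept = 0
    for _ in range(1000):
        state, ok = mala_step(state, 0.3, log_pi_gauss, grad_log_pi_gauss, rng)
        n_accept += ok
    print("empirical acceptance:", n_accept / 1000)
```

For the standard Gaussian target the increment collapses algebraically to $Q^N(x,y) = \frac{h}{4}(\|x\|^2 - \|y\|^2)$, which gives a convenient sanity check on any implementation.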

2. Diffusion-Limit Theory and Non-Stationary Regime

For non-stationary initialization (the "burn-in" phase), MALA exhibits fundamentally different asymptotic behavior compared to its stationary regime. Consider the continuous-time, piecewise-linear interpolation

$$x^{(N)}(t) = x^{k, N} + (N^{1/2} t - k) \big(x^{k+1, N} - x^{k, N}\big), \qquad t \in \left[\frac{k}{N^{1/2}}, \frac{k+1}{N^{1/2}}\right),$$

and scale the step size with dimension as $h = \ell N^{-1/2}$. As $N \to \infty$, $x^{(N)}(\cdot)$ converges weakly (in a suitable function-space topology) to the solution of an infinite-dimensional SDE coupled to a scalar ODE that tracks non-stationarity,

$$dx(t) = -h(S(t))\, [x(t) + C \nabla \Psi(x(t))]\,dt + \sqrt{2 h(S(t))}\, dW(t), \qquad x(0) = x^0,$$

$$dS(t) = b_\ell(S(t))\,dt, \quad \text{where} \quad b_\ell(s) = 2(1-s)h(s), \quad h(s) = \ell\left(1 \wedge e^{\frac{\ell^2}{2}(s-1)}\right),$$

$$S(0) = \lim_{N \to \infty} \frac{1}{N} \sum_{j=1}^N \frac{(x^0_j)^2}{\lambda_j^2}.$$

Here, $W(t)$ is a cylindrical $C_s$-Brownian motion, and $\Psi$, $C$ encode the (possibly non-product) target (Kuntz et al., 2016). The function $h(s)$ appearing in the ODE is the limiting speed function and should not be confused with the step size $h$. This coupled system captures the evolution of the "empirical squared norm" $S(t)$, which measures deviation from stationarity; $S(t) \to 1$ monotonically, after which the SDE reduces to the ergodic infinite-dimensional Langevin diffusion.
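The monotone relaxation of $S(t)$ toward 1 can be checked numerically by integrating the scalar ODE with forward Euler. The sketch below assumes only the formulas for $b_\ell$ and $h(s)$ given above; the function names and the default time step `dt` are illustrative choices, not part of the source.

```python
import math


def speed(s, ell):
    """h(s) = ell * min(1, exp(ell^2 (s - 1) / 2)), written overflow-safe."""
    return ell * math.exp(min(0.0, 0.5 * ell * ell * (s - 1.0)))


def drift(s, ell):
    """b_ell(s) = 2 (1 - s) h(s)."""
    return 2.0 * (1.0 - s) * speed(s, ell)


def integrate_S(s0, ell, t_end, dt=1e-3):
    """Forward-Euler path of dS/dt = b_ell(S); returns the list of S values."""
    n_steps = int(round(t_end / dt))
    path = [s0]
    s = s0
    for _ in range(n_steps):
        s = s + dt * drift(s, ell)
        path.append(s)
    return path


if __name__ == "__main__":
    for s0 in (0.2, 3.0):
        path = integrate_S(s0, ell=1.0, t_end=5.0)
        print(f"S(0) = {s0}, S(5) = {path[-1]:.4f}")
```

Starting below 1 the path increases, and starting above 1 it decreases; for $S<1$ the relaxation is slower because the factor $e^{\ell^2 (s-1)/2}$ reduces the speed.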

The key non-stationary result is that, with the scaling $h = \ell N^{-1/2}$, MALA requires $\mathcal{O}(N^{1/2})$ iterations to traverse an $\mathcal{O}(1)$ macroscopic time interval and to bring $S(t)$ near equilibrium. The computational cost of burn-in thus scales as $N^{1/2}$ [(Kuntz et al., 2016), Theorem 5.1].

3. Optimal Scaling and Cost in Stationary and Non-Stationary Regimes

The optimal cost analysis relies on asymptotic expansions of the Metropolis log-acceptance ratio:

  • Non-stationary (burn-in) phase: The dominant term in $Q^N$ satisfies

$$I^N_1 = -\,\frac{h}{4} \left(\|y\|^2 - \|x\|^2\right) \approx \frac{\ell^2}{2} \left(S^{k,N} - 1\right),$$

which is $O(1)$ if and only if $h \sim \ell N^{-1/2}$. All higher-order terms satisfy $I^N_2, I^N_3 = o(1)$. Each MALA step advances time by $\mathcal{O}(N^{-1/2})$, so $\mathcal{O}(N^{1/2})$ steps are required to cover $\mathcal{O}(1)$ macroscopic time, i.e., to complete burn-in [(Kuntz et al., 2016), Lemmas 7.1–7.5].

  • Stationary regime: Once $x^k \sim \pi^N$, one has $\|x\|^2 \sim N$ and the leading Taylor expansion yields

$$I^N_1 = -\frac{h^3}{4}\|x\|^2 + \cdots \approx -\frac{\ell^3}{4}$$

provided $h = \ell N^{-1/3}$. The stationary regime thus admits a nondegenerate $N \to \infty$ limit for $Q^N$, distributed as $\mathcal{N}(-\ell^3/4, \ell^3/2)$, and the mean acceptance probability satisfies $\alpha^N \to \mathbb{E}[1 \wedge \exp(\mathcal{N}(-\ell^3/4, \ell^3/2))]$. The cost to make a macroscopic move in stationarity is $\mathcal{O}(N^{1/3})$ steps [(Kuntz et al., 2016), Section 4].
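Both expansions can be checked by simulation in the simplest case. The sketch below assumes a standard Gaussian target (so $\nabla \log \pi^N(x) = -x$) and the proposal with drift $h\,\nabla\log\pi^N$ and noise variance $2h$, for which the log-acceptance increment equals $\frac{h}{4}(\|x\|^2 - \|y\|^2)$ exactly. The function names and the chosen values of $N$, $\ell$ and $S$ are illustrative assumptions.

```python
import math
import random


def log_accept_gaussian(x, h, rng):
    """Draw one MALA proposal for the standard Gaussian target and return
    Q = (h / 4) * (|x|^2 - |y|^2), with y = (1 - h) x + sqrt(2h) Z."""
    scale = math.sqrt(2.0 * h)
    x_sq = 0.0
    y_sq = 0.0
    for xi in x:
        yi = (1.0 - h) * xi + scale * rng.gauss(0.0, 1.0)
        x_sq += xi * xi
        y_sq += yi * yi
    return 0.25 * h * (x_sq - y_sq)


def burn_in_q(n_dim, ell, s0, rng):
    """Q for a start with |x|^2 / N = s0 and step h = ell * N^(-1/2)."""
    x = [math.sqrt(s0)] * n_dim
    return log_accept_gaussian(x, ell / math.sqrt(n_dim), rng)


def stationary_mean_q(n_dim, ell, n_rep, rng):
    """Average Q over stationary starts with step h = ell * N^(-1/3)."""
    h = ell * n_dim ** (-1.0 / 3.0)
    total = 0.0
    for _ in range(n_rep):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_dim)]
        total += log_accept_gaussian(x, h, rng)
    return total / n_rep


if __name__ == "__main__":
    rng = random.Random(1)
    ell, s0 = 1.0, 0.5
    print("burn-in Q:", burn_in_q(10_000, ell, s0, rng),
          "vs (ell^2 / 2)(S - 1) =", 0.5 * ell ** 2 * (s0 - 1.0))
    print("stationary mean Q:", stationary_mean_q(1_000, ell, 200, rng),
          "vs -ell^3 / 4 =", -0.25 * ell ** 3)
```

The burn-in value concentrates around $\frac{\ell^2}{2}(S-1)$ as $N$ grows, while the stationary increment keeps an $O(1)$ variance, which is why the second comparison averages over repetitions.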

This analysis yields universal cost laws for high-dimensional MALA: burn-in costs $\mathcal{O}(N^{1/2})$ steps, after which stationary exploration costs $\mathcal{O}(N^{1/3})$ steps. These laws hold for both product and general non-product targets, under mild spectral-decay and Lipschitz conditions on $\Psi$ (Kuntz et al., 2016).
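The stationary acceptance limit can be evaluated in closed form using a standard identity for log-normal variables. In the short derivation below, $\sigma^2 = \ell^3/2$ is shorthand introduced here, $\Phi$ denotes the standard normal distribution function, and the second term is computed by exponential tilting, under which $G$ becomes $\mathcal{N}(\sigma^2/2, \sigma^2)$.

```latex
\begin{aligned}
G &\sim \mathcal{N}\!\left(-\tfrac{\sigma^2}{2},\, \sigma^2\right),
  \qquad \sigma^2 = \tfrac{\ell^3}{2}, \\
\mathbb{E}\!\left[1 \wedge e^{G}\right]
  &= \mathbb{P}(G > 0) + \mathbb{E}\!\left[e^{G}\,\mathbf{1}\{G \le 0\}\right] \\
  &= \Phi\!\left(-\tfrac{\sigma}{2}\right) + \Phi\!\left(-\tfrac{\sigma}{2}\right)
   = 2\,\Phi\!\left(-\frac{\ell^{3/2}}{2\sqrt{2}}\right).
\end{aligned}
```

The limiting acceptance is therefore a decreasing function of $\ell$, equal to 1 at $\ell = 0$.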

4. Universality for Non-Product High-Dimensional Targets

The infinite-dimensional setting places MALA on a Hilbert space $H$ with reference Gaussian measure $\pi_0 = N(0, C)$. Targets are of the form

$$\pi(dx) \propto \exp\{-\Psi(x)\}\, \pi_0(dx),$$

with general nonlinear $\Psi\colon H^s \to \mathbb{R}$, $s \in [0, \kappa - 1/2)$, and covariance eigenvalues $\lambda_j \sim j^{-\kappa}$, $\kappa>1/2$. The Sobolev regularity, the eigenvalue decay, and the Lipschitz continuity of $\nabla \Psi$ (as a map $H^s \to H^{-s}$) ensure that the algorithm is well defined and that the diffusion-limit theorems are valid. Finite-dimensional approximations on $X^N = \text{span}\{\phi_1,\dots,\phi_N\}$ yield valid targets, and the theorems apply as long as the assumptions carry over (Kuntz et al., 2016).

This framework covers important cases such as Bayesian inverse problems, nonparametric regression, and conditioned diffusions, extending MALA's high-dimensional theory well beyond product settings.

5. Practical Tuning: Acceptance Rate and Adaptive Schemes

Practical implications for high-dimensional MALA are direct:

  • Burn-in: Use $h \sim \ell N^{-1/2}$ until the observable $S(t)$ is close to 1, i.e., until the chain approaches stationarity. This phase lasts $\mathcal{O}(N^{1/2})$ steps.
  • Stationary exploration: Once $S \approx 1$, switch to $h \sim \ell N^{-1/3}$; each macroscopic move now costs $\mathcal{O}(N^{1/3})$ steps.
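  • Optimal acceptance: In stationarity, maximize the speed function $\ell \alpha$, where $\alpha = \mathbb{E}[1\wedge e^{\mathcal{N}(-\ell^3/4, \ell^3/2)}]$; the theoretically optimal acceptance is $\alpha^*\approx0.574$ [(Kuntz et al., 2016), Section 6]. With the limit law as written here the maximizer is $\ell^*\approx1.36$; the value $\ell^*\approx1.65$ corresponds to the alternative parameterization in which the proposal variance is written as $\ell^2 N^{-1/3}$.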
  • Step-size adaptation: In practice, estimate $S = \mathbb{E}\|C^{-1/2}x\|^2/N$ and adapt $h$ accordingly to maintain the desired acceptance rate.
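The optimization in the third bullet can be reproduced numerically. The sketch below uses the closed form $\alpha(\ell) = 2\Phi(-\sigma/2)$ with $\sigma^2 = \ell^3/2$, which follows from the Gaussian limit for $Q^N$, and maximizes $\ell\,\alpha(\ell)$ by golden-section search. The function names and the search bracket are illustrative assumptions.

```python
import math


def std_normal_cdf(z):
    """Standard normal distribution function via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def stationary_acceptance(ell):
    """alpha(ell) = E[1 ^ exp(G)] for G ~ N(-ell^3/4, ell^3/2)."""
    sigma = math.sqrt(ell ** 3 / 2.0)
    return 2.0 * std_normal_cdf(-0.5 * sigma)


def speed(ell):
    """Speed function ell * alpha(ell) to be maximized."""
    return ell * stationary_acceptance(ell)


def argmax_speed(lo=0.1, hi=5.0, iters=200):
    """Golden-section search for the maximizer of the unimodal speed function."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iters):
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if speed(c) > speed(d):
            b = d  # maximum lies in [a, d]
        else:
            a = c  # maximum lies in [c, b]
    return 0.5 * (a + b)


if __name__ == "__main__":
    ell_star = argmax_speed()
    print("ell* =", round(ell_star, 3),
          "acceptance =", round(stationary_acceptance(ell_star), 3))
```

Under the limit law in the form $\mathcal{N}(-\ell^3/4, \ell^3/2)$ with $h = \ell N^{-1/3}$, the maximizer comes out near $\ell \approx 1.36$ with acceptance near 0.574. The numerical value of $\ell^*$ depends on how the step size is parameterized, whereas the optimal acceptance rate does not.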

These scaling results provide robust guidance for choosing step sizes and monitoring acceptance rates in high-dimensional applications where explicit diagnostics of convergence/mixing are difficult.
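One common way to hold the acceptance rate near a target is a Robbins–Monro update on $\log h$. This technique is not taken from the cited analysis; it is shown here only as a sketch of how the 0.574 guideline might be used. To keep the example deterministic, the per-iteration acceptance probability of a real chain is replaced by its stationary limit $2\Phi(-\sigma/2)$ with $\sigma^2 = h^3 N/2$. The names `adapt_log_step` and `tune_step` and the decay exponent `kappa` are hypothetical.

```python
import math


def std_normal_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def limiting_acceptance(h, n_dim):
    """Stationary limit 2 * Phi(-sigma / 2) with sigma^2 = h^3 * N / 2."""
    sigma = math.sqrt(h ** 3 * n_dim / 2.0)
    return 2.0 * std_normal_cdf(-0.5 * sigma)


def adapt_log_step(log_h, accept_prob, k, target=0.574, kappa=0.6):
    """Robbins-Monro update: raise h when accepting too often, lower it otherwise."""
    gamma = k ** (-kappa)  # decaying gain
    return log_h + gamma * (accept_prob - target)


def tune_step(n_dim, h0=0.01, n_iter=2000):
    """Iterate the update; in a real sampler `acc` would be the
    min(1, exp(Q)) value observed at iteration k."""
    log_h = math.log(h0)
    for k in range(1, n_iter + 1):
        acc = limiting_acceptance(math.exp(log_h), n_dim)
        log_h = adapt_log_step(log_h, acc, k)
    return math.exp(log_h)


if __name__ == "__main__":
    n_dim = 1000
    h = tune_step(n_dim)
    print("tuned h:", h, "ell = h * N^(1/3):", h * n_dim ** (1.0 / 3.0))
```

Updating $\log h$ rather than $h$ keeps the step size positive, and the decaying gain makes the adaptation vanish asymptotically.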

6. Regime Recognition and Transition

The sharp distinction between non-stationary and stationary regimes is prominent:

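  • Shrinking the step size too fast ($h \sim N^{-1/2}$) in the stationary phase is degenerate: $I_1^N \to 0$ when $S=1$, so the acceptance probability tends to one and the chain spends $\mathcal{O}(N^{1/2})$ steps where $\mathcal{O}(N^{1/3})$ would suffice.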
  • Using $h \sim N^{-1/3}$ in the non-stationary phase makes $I_1^N \sim N^{1/3}(S-1)$ diverge whenever $S$ deviates appreciably from $1$; for $S<1$ the acceptance probability then collapses to zero.

Correctly identifying and adapting to these phases is essential for efficient high-dimensional MALA implementation; these scaling laws are universal for highly regular targets regardless of product structure (Kuntz et al., 2016).
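A regime switch driven by the observable $S$ might look as follows. This is a sketch under stated assumptions: coordinates are taken in the eigenbasis of $C$ with $x_j \sim N(0, \lambda_j^2)$ under the reference measure (as implied by the definition of $S(0)$), $\lambda_j = j^{-\kappa}$ exactly, and the tolerance `tol` and the function names are hypothetical choices that the theory does not prescribe.

```python
def eigenvalues(n_dim, kappa):
    """lambda_j = j^(-kappa) for j = 1..N."""
    return [j ** (-kappa) for j in range(1, n_dim + 1)]


def estimate_S(x, lambdas):
    """Empirical squared norm S = (1/N) * sum_j x_j^2 / lambda_j^2."""
    return sum((xj / lj) ** 2 for xj, lj in zip(x, lambdas)) / len(x)


def choose_step(s_hat, n_dim, ell_burn=1.0, ell_stat=1.0, tol=0.1):
    """Return h = ell * N^(-1/2) while S is far from 1, else ell * N^(-1/3)."""
    if abs(s_hat - 1.0) > tol:
        return ell_burn * n_dim ** (-0.5)
    return ell_stat * n_dim ** (-1.0 / 3.0)


if __name__ == "__main__":
    n_dim = 1000
    lambdas = eigenvalues(n_dim, kappa=1.0)
    x_far = [0.0] * n_dim   # S = 0: far from stationarity
    x_near = list(lambdas)  # S = 1 exactly
    for x in (x_far, x_near):
        s_hat = estimate_S(x, lambdas)
        print("S =", s_hat, "-> h =", choose_step(s_hat, n_dim))
```

In stationarity a single-state estimate of $S$ fluctuates around 1, so in practice the tolerance would have to be chosen with the dimension in mind, or the estimate averaged over recent iterations.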

7. Summary Table: Regime-Specific Scalings

| Regime | Step size $h$ | Number of steps (to $\mathcal{O}(1)$ move) | Acceptance $\alpha$ | Cost order |
|---|---|---|---|---|
| Non-stationary | $\ell N^{-1/2}$ | $\mathcal{O}(N^{1/2})$ | $\alpha(S) \approx 1\wedge \exp[\frac{\ell^2}{2}(S-1)]$ | $N^{1/2}$ |
| Stationary | $\ell N^{-1/3}$ | $\mathcal{O}(N^{1/3})$ | $\alpha \approx 0.574$ | $N^{1/3}$ |

The associated laws and optimal acceptance rates are derived from Taylor expansions and matched in stationarity by Gaussian approximation of the log-Metropolis increment (Kuntz et al., 2016).


References:

  • The foundational scaling and diffusion-limit results for non-stationary and stationary regimes appear in "Non-stationary phase of the MALA algorithm" (Kuntz et al., 2016). This work builds upon and significantly extends the analysis in (Pillai et al., 2011) and the related infinite-dimensional scaling literature.