MALA: Gradient-Based MCMC in High Dimensions
- MALA is a gradient-based MCMC method that samples complex high-dimensional distributions using discretized Langevin dynamics corrected by a Metropolis–Hastings step.
- The algorithm exhibits distinct scaling laws: burn-in costs scale as O(N^(1/2)) while stationary exploration scales as O(N^(1/3)).
- Practical implementation of MALA involves adaptive tuning of the step size: a separate scaling during burn-in, then tuning to an acceptance rate near 0.574 in the stationary regime.
The Metropolis-Adjusted Langevin Algorithm (MALA) is a gradient-based Markov Chain Monte Carlo (MCMC) method designed for sampling from complex, high-dimensional target distributions with Lebesgue density on $\mathbb{R}^N$. MALA constructs a Markov chain that is reversible with respect to the prescribed target and samples it ergodically via a proposal mechanism derived from discretized Langevin dynamics, corrected with a Metropolis–Hastings accept–reject step. Contemporary analyses of MALA focus on computational cost as a function of the dimension $N$, particularly on the different dynamics in the non-stationary (burn-in) and stationary phases. The key high-dimensional asymptotics, established for non-product and product targets alike, identify the regimes in which optimal scaling and cost are achieved, revealing universal scaling laws: $O(N^{1/2})$ for burn-in and $O(N^{1/3})$ for stationary exploration (Kuntz et al., 2016).
1. Algorithmic Formulation of MALA
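Given a target density $\pi$ on $\mathbb{R}^N$, the MALA proposal is constructed as a single Euler–Maruyama (forward-Euler) discretization step of the overdamped Langevin SDE $dX_t = \nabla \log \pi(X_t)\,dt + \sqrt{2}\,dW_t$:
$$y = x + \delta\,\nabla \log \pi(x) + \sqrt{2\delta}\,\xi, \qquad \xi \sim \mathcal{N}(0, I_N),$$
where $\delta > 0$ serves as the proposal time step or variance. The associated Gaussian proposal density is
$$q(x, y) \propto \exp\!\left(-\frac{1}{4\delta}\,\bigl\|y - x - \delta\,\nabla \log \pi(x)\bigr\|^2\right).$$
The acceptance probability for the Metropolis–Hastings adjustment is defined by
$$\alpha(x, y) = \min\bigl\{1,\; e^{Q(x, y)}\bigr\},$$
with the explicit log-acceptance increment
$$Q(x, y) = \log \pi(y) - \log \pi(x) - \frac{1}{4\delta}\Bigl(\bigl\|x - y - \delta\,\nabla \log \pi(y)\bigr\|^2 - \bigl\|y - x - \delta\,\nabla \log \pi(x)\bigr\|^2\Bigr).$$
This formulation guarantees reversibility of the Markov chain with respect to $\pi$, and hence $\pi$ is invariant (Kuntz et al., 2016).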
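A minimal NumPy sketch of one MALA iteration, following the formulation described above: an Euler–Maruyama proposal followed by a Metropolis–Hastings accept–reject step. The function names `log_q` and `mala_step`, the argument names, and the standard Gaussian example target are illustrative assumptions, not notation or code from Kuntz et al. (2016).

```python
import numpy as np

def log_q(x, y, grad_log_pi, delta):
    """Log proposal density log q(x, y), up to an additive constant."""
    mean = x + delta * grad_log_pi(x)
    return -np.sum((y - mean) ** 2) / (4.0 * delta)

def mala_step(x, log_pi, grad_log_pi, delta, rng):
    """One MALA iteration: Langevin proposal, then accept or reject."""
    xi = rng.standard_normal(x.shape)
    y = x + delta * grad_log_pi(x) + np.sqrt(2.0 * delta) * xi
    # Log-acceptance increment: log of pi(y) q(y, x) / (pi(x) q(x, y)).
    log_ratio = (log_pi(y) - log_pi(x)
                 + log_q(y, x, grad_log_pi, delta)
                 - log_q(x, y, grad_log_pi, delta))
    accepted = np.log(rng.uniform()) < min(0.0, log_ratio)
    return (y if accepted else x), log_ratio, accepted

# Example target (an assumption for illustration): standard Gaussian on R^10.
rng = np.random.default_rng(0)
log_pi = lambda x: -0.5 * np.sum(x ** 2)
grad_log_pi = lambda x: -x
x = rng.standard_normal(10)
x_new, log_ratio, accepted = mala_step(x, log_pi, grad_log_pi, 0.1, rng)
```

Working with the log of the ratio avoids overflow when the increment is large in magnitude, which is the typical situation in high dimensions.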
2. Diffusion-Limit Theory and Non-Stationary Regime
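For non-stationary initialization (the "burn-in" phase), MALA behaves asymptotically very differently from its stationary regime. Consider the continuous-time, piecewise-linear interpolation of the chain $(x_k)_{k \ge 0}$,
$$x^{(N)}(t) = (N^{1/2} t - k)\,x_{k+1} + (k + 1 - N^{1/2} t)\,x_k, \qquad t_k \le t < t_{k+1}, \quad t_k = k\,N^{-1/2},$$
and scale the proposal variance with dimension as $\delta = \ell\,N^{-1/2}$ for a fixed constant $\ell > 0$. As $N \to \infty$, $x^{(N)}$ converges weakly (in a suitable function-space topology) to the solution of an infinite-dimensional SDE coupled to a scalar ODE that tracks non-stationarity; the explicit limiting system is stated in Kuntz et al. (2016).
In the limiting system the driving noise is a cylindrical Brownian motion, and the covariance operator of the reference Gaussian measure together with the nonlinear part of the density encode the (possibly non-product) target (Kuntz et al., 2016). This coupled system captures the evolution of the "empirical squared norm" $S$, which measures deviation from stationarity; $S \to 1$ monotonically, after which the SDE reduces to the ergodic infinite-dimensional Langevin diffusion.
The key non-stationary result is that, with the scaling $\delta \propto N^{-1/2}$, MALA requires $O(N^{1/2})$ iterations to traverse an $O(1)$ macroscopic time interval and to bring $S$ near equilibrium; the computational cost of burn-in thus scales as $O(N^{1/2})$ [(Kuntz et al., 2016), Theorem 5.1].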
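The continuous-time interpolation used in the diffusion limit can be sketched as follows. This assumes that iteration $k$ is placed at macroscopic time $k\,N^{-1/2}$, in line with the burn-in scaling of this section; the function name `interpolate_chain` and the array layout are hypothetical choices made for illustration.

```python
import numpy as np

def interpolate_chain(chain, t, N):
    """Evaluate the piecewise-linear interpolant of a chain at macroscopic time t.

    chain : array of shape (K + 1, N) holding the states x_0, ..., x_K (K >= 1).
    Assumption: iteration k is placed at time t_k = k / sqrt(N).
    """
    chain = np.asarray(chain, dtype=float)
    K = len(chain) - 1
    if K < 1:
        raise ValueError("need at least two states to interpolate")
    s = t * np.sqrt(N)                      # time measured in iterations
    if s < 0.0 or s > K + 1e-9:
        raise ValueError("t lies outside the simulated time range")
    k = min(int(s), K - 1)                  # left node of the segment containing t
    w = min(s, float(K)) - k                # weight on x_{k+1}, between 0 and 1
    return (1.0 - w) * chain[k] + w * chain[k + 1]

# Three states in dimension N = 4, so the nodes sit at t = 0, 0.5, 1.
chain = np.array([[0.0, 0.0, 0.0, 0.0],
                  [2.0, 4.0, 6.0, 8.0],
                  [4.0, 0.0, 2.0, 8.0]])
midpoint = interpolate_chain(chain, 0.25, N=4)   # halfway between x_0 and x_1
```

Under this time change, one unit of macroscopic time corresponds to $N^{1/2}$ iterations, which is where the burn-in cost comes from.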
3. Optimal Scaling and Cost in Stationary and Non-Stationary Regimes
The optimal cost analysis relies on asymptotic expansions of the Metropolis log-acceptance ratio:
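- Non-stationary (burn-in) phase: The dominant term in the log-acceptance increment is $O(1)$ if and only if $\delta \propto N^{-1/2}$, and all higher-order terms vanish as $N \to \infty$. Each MALA step increments time by $N^{-1/2}$, so $O(N^{1/2})$ steps are required for $O(1)$ macroscopic time, i.e., burn-in costs $O(N^{1/2})$ [(Kuntz et al., 2016), Lemmas 7.1–7.5].
- Stationary regime: Once $S \approx 1$, the dominant burn-in term cancels and the leading Taylor expansion of the log-acceptance increment remains $O(1)$ provided $\delta \propto N^{-1/3}$. The stationary regime thus admits a nondegenerate limit in which the log-acceptance increment is Gaussian with mean equal to minus half its variance, and the mean acceptance probability converges to a constant strictly between 0 and 1. The cost to make a macroscopic move in stationarity is $O(N^{1/3})$ steps [(Kuntz et al., 2016), Section 4].
This analysis yields universal cost laws for high-dimensional MALA: burn-in $O(N^{1/2})$, then stationary exploration $O(N^{1/3})$. These hold for both product and general non-product targets, provided mild spectral decay and Lipschitz conditions on the nonlinear part of the target hold (Kuntz et al., 2016).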
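Both expansions can be checked numerically in the simplest special case. For the standard Gaussian target $\mathcal{N}(0, I_N)$ with the proposal $y = (1-\delta)x + \sqrt{2\delta}\,\xi$, a direct calculation gives the exact log-acceptance increment $(\delta/4)(\|x\|^2 - \|y\|^2)$. With $\|x\|^2 = S N$, its mean is close to $\ell^2 (S-1)/2$ under $\delta = \ell N^{-1/2}$, and equals $-\ell^3/4$ at $S = 1$ under $\delta = \ell N^{-1/3}$. This special case, the function name `log_accept_samples`, and the chosen values of `N`, `ell`, and `S` are assumptions made for illustration; they are not taken from Kuntz et al. (2016), whose setting is more general.

```python
import numpy as np

def log_accept_samples(N, S, delta, n_samples, rng):
    """Sample the MALA log-acceptance increment for the target N(0, I_N).

    The current state is rescaled so that |x|^2 = S * N; S = 1 mimics stationarity.
    For this target the increment equals (delta / 4) * (|x|^2 - |y|^2) exactly.
    """
    x = rng.standard_normal(N)
    x *= np.sqrt(S * N) / np.linalg.norm(x)      # fix the squared norm at S * N
    xi = rng.standard_normal((n_samples, N))
    y = (1.0 - delta) * x + np.sqrt(2.0 * delta) * xi
    return 0.25 * delta * (S * N - np.sum(y ** 2, axis=1))

rng = np.random.default_rng(0)
N, ell = 1000, 1.0

# Burn-in scaling delta = ell / N^(1/2), started out of stationarity at S = 0.5.
q_burn = log_accept_samples(N, 0.5, ell / N ** 0.5, 2000, rng)
# Stationary scaling delta = ell / N^(1/3), at S = 1.
q_stat = log_accept_samples(N, 1.0, ell / N ** (1 / 3), 2000, rng)

print("burn-in mean:", q_burn.mean(), "theory:", ell ** 2 * (0.5 - 1.0) / 2)
print("stationary mean:", q_stat.mean(), "theory:", -ell ** 3 / 4)
```

In both cases the increment stays of order one as `N` grows, which is the defining property of the two scalings.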
4. Universality for Non-Product High-Dimensional Targets
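The infinite-dimensional setting places MALA on a Hilbert space $\mathcal{H}$ with a reference Gaussian measure $\pi_0 = \mathcal{N}(0, \mathcal{C})$. Targets are of the form
$$\frac{d\pi}{d\pi_0}(x) \propto \exp\bigl(-\Psi(x)\bigr),$$
with a general nonlinear potential $\Psi$ and a covariance operator $\mathcal{C}$ whose eigenvalues decay sufficiently fast. The Sobolev regularity, eigenvalue decay, and Lipschitz properties of the derivative of $\Psi$ (as a map between the relevant function spaces) ensure regularity of the algorithm and the validity of the diffusion-limit theorems. Finite-dimensional approximations on $\mathbb{R}^N$ lead to valid targets, and the theorems apply as long as the assumptions carry over (Kuntz et al., 2016).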
This framework encompasses important cases such as Bayesian inverse problems, nonparametric regression, and conditioned diffusions, extending MALA's high-dimensional theory well beyond product settings.
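A finite-dimensional approximation of such a target can be sketched in the eigenbasis of the covariance. Everything specific here is an assumption for illustration: the power-law spectrum $\lambda_j = j^{-\kappa}$, the function names `make_target` and `whitened_norm`, and in particular the definition of the non-stationarity observable as the whitened average $\frac{1}{N}\sum_j x_j^2/\lambda_j^2$, which the text above only describes as an "empirical squared norm".

```python
import numpy as np

def make_target(N, kappa=1.0, psi=None, grad_psi=None):
    """Finite-dimensional target written in the eigenbasis of the covariance.

    Assumed (hypothetical) spectrum: lambda_j = j^(-kappa), j = 1, ..., N.
    log pi^N(x) = -psi(x) - 0.5 * sum_j x_j^2 / lambda_j^2, up to a constant.
    """
    lam = np.arange(1, N + 1, dtype=float) ** (-kappa)
    psi = psi or (lambda x: 0.0)                       # default: pure Gaussian
    grad_psi = grad_psi or (lambda x: np.zeros_like(x))
    log_pi = lambda x: -psi(x) - 0.5 * np.sum((x / lam) ** 2)
    grad_log_pi = lambda x: -grad_psi(x) - x / lam ** 2
    return lam, log_pi, grad_log_pi

def whitened_norm(x, lam):
    """Assumed observable: (1/N) * sum_j x_j^2 / lambda_j^2."""
    return np.mean((x / lam) ** 2)

lam, log_pi, grad_log_pi = make_target(2000, kappa=1.0)
rng = np.random.default_rng(0)
x_ref = lam * rng.standard_normal(2000)    # a draw from the Gaussian reference
print(whitened_norm(x_ref, lam))           # close to 1 for a reference draw
```

A draw from the Gaussian reference has whitened norm close to 1, so a value far from 1 signals that the chain is still out of stationarity with respect to the reference.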
5. Practical Tuning: Acceptance Rate and Adaptive Schemes
Practical implications for high-dimensional MALA are direct:
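- Burn-in: Use $\delta \propto N^{-1/2}$ until the observable $S \approx 1$, i.e., until the chain approaches stationarity. This phase lasts $O(N^{1/2})$ steps.
- Stationary exploration: Once $S \approx 1$, switch to $\delta \propto N^{-1/3}$; now each macroscopic move costs $O(N^{1/3})$ steps.
- Optimal acceptance: In stationarity, maximize the speed function $h(\ell)$ over the constant $\ell$ in $\delta = \ell\,N^{-1/3}$; the theoretically optimal acceptance rate is approximately $0.574$ [(Kuntz et al., 2016), Section 6].
- Step-size adaptation: In practice, estimate $S$ and the acceptance rate along the chain, and adapt $\delta$ accordingly to maintain the desired acceptance.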
These scaling results provide robust guidance for choosing step sizes and monitoring acceptance rates in high-dimensional applications where explicit diagnostics of convergence/mixing are difficult.
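The tuning recipe above can be sketched as two small helpers: one picks the step-size exponent from the non-stationarity observable (called `S` here), the other nudges the proportionality constant toward the 0.574 target. The tolerance `tol`, the gain `gain`, and the multiplicative update rule are hypothetical heuristics chosen for illustration; the source does not prescribe a specific adaptation scheme.

```python
import math

def step_size(N, S, ell, tol=0.1):
    """Use ell * N^(-1/2) during burn-in and ell * N^(-1/3) once S is near 1."""
    exponent = 1.0 / 3.0 if abs(S - 1.0) < tol else 0.5
    return ell * N ** (-exponent)

def adapt_ell(ell, accept_rate, target=0.574, gain=0.5):
    """Multiplicative update: grow ell when accepting too often, shrink it otherwise."""
    return ell * math.exp(gain * (accept_rate - target))

N, ell = 10 ** 6, 1.0
delta_burn = step_size(N, S=0.5, ell=ell)     # burn-in: roughly 1e-3
delta_stat = step_size(N, S=1.02, ell=ell)    # stationary: roughly 1e-2
ell = adapt_ell(ell, accept_rate=0.80)        # acceptance too high, so ell increases
```

Since the 0.574 target is a stationary-phase result, a reasonable design is to apply `adapt_ell` only after `S` has settled near 1 and to keep `ell` fixed during burn-in.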
6. Regime Recognition and Transition
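The two regimes call for different step-size scalings, and a mismatch is costly:
- Scaling the step size too aggressively (larger than order $N^{-1/3}$) in the stationary phase leads to degeneracy: the acceptance limit is nondegenerate only if $\delta \propto N^{-1/3}$ exactly.
- Using $\delta \propto N^{-1/3}$ in the non-stationary phase leads to divergence of the log-acceptance increment if $S$ deviates appreciably from $1$.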
Correctly identifying and adapting to these phases is essential for efficient high-dimensional MALA implementation; these scaling laws are universal for highly regular targets regardless of product structure (Kuntz et al., 2016).
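The cost of a mismatched scaling shows up in simple arithmetic. For the standard Gaussian target $\mathcal{N}(0, I_N)$ with $\|x\|^2 = S N$, a direct calculation gives the leading part of the log-acceptance increment as $(\delta^2 N/2)(S - 1)$. This special case and the helper name `dominant_term` are assumptions made for illustration, not results quoted from Kuntz et al. (2016).

```python
def dominant_term(N, S, ell, exponent):
    """Leading part of the log-acceptance increment for the target N(0, I_N).

    With |x|^2 = S * N and delta = ell * N^(-exponent), it equals
    (delta^2 * N / 2) * (S - 1).
    """
    delta = ell * N ** (-exponent)
    return 0.5 * delta ** 2 * N * (S - 1.0)

for N in (10 ** 2, 10 ** 4, 10 ** 6):
    burn = dominant_term(N, 0.5, 1.0, 0.5)        # N^(-1/2): independent of N
    mismatch = dominant_term(N, 0.5, 1.0, 1 / 3)  # N^(-1/3): magnitude grows like N^(1/3)
    print(N, burn, mismatch)
```

With $S < 1$ the mismatched term becomes large and negative, so the acceptance probability collapses and the chain stalls; this is the divergence described in the second bullet.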
7. Summary Table: Regime-Specific Scalings
| Regime | Step size | Number of steps (to move) | Acceptance | Cost order |
|---|---|---|---|---|
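| Non-stationary | $\delta \propto N^{-1/2}$ | $O(N^{1/2})$ | varies with $S$ | $O(N^{1/2})$ |
| Stationary | $\delta \propto N^{-1/3}$ | $O(N^{1/3})$ | $\approx 0.574$ (optimal) | $O(N^{1/3})$ |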
The associated laws and optimal acceptance rates are derived from Taylor expansions and matched in stationarity by Gaussian approximation of the log-Metropolis increment (Kuntz et al., 2016).
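The 0.574 figure can be reproduced from the Gaussian approximation alone. If the log-acceptance increment is Gaussian with variance $\sigma^2$ and mean $-\sigma^2/2$, the mean acceptance probability is $2\Phi(-\sigma/2)$. The sketch below assumes $\sigma^2 = c\,\ell^3$ and a speed function $h(\ell) = \ell \times \text{acceptance}$; the constant $c = 1/2$ is what a direct calculation gives for the standard Gaussian target, and the function names and grid search are illustrative choices rather than the source's method.

```python
import math

def accept_rate(ell, c=0.5):
    """Mean acceptance when the log-acceptance is N(-s2 / 2, s2) with s2 = c * ell^3."""
    sigma = math.sqrt(c * ell ** 3)
    # E[min(1, exp(Z))] = 2 * Phi(-sigma / 2) = erfc(sigma / (2 * sqrt(2)))
    return math.erfc(sigma / (2.0 * math.sqrt(2.0)))

def optimal_ell(c=0.5, grid=20000, ell_max=5.0):
    """Grid search for the maximizer of the assumed speed ell * accept_rate(ell)."""
    candidates = [ell_max * (i + 1) / grid for i in range(grid)]
    return max(candidates, key=lambda ell: ell * accept_rate(ell, c))

ell_star = optimal_ell()
print(ell_star, accept_rate(ell_star))
```

Changing `c` moves the optimal `ell` but leaves the acceptance rate at the optimum unchanged, which is why the 0.574 target can be used without knowing the target-specific constant.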
References:
- The foundational scaling and diffusion-limit results for non-stationary and stationary regimes appear in "Non-stationary phase of the MALA algorithm" (Kuntz et al., 2016). This work builds upon and significantly extends the analysis in (Pillai et al., 2011) and the wider literature on infinite-dimensional scaling limits.