Approximate Message Passing Algorithms

Updated 16 May 2026

Approximate Message Passing (AMP) algorithms are iterative methods that recover signals from noisy data by leveraging state evolution and Onsager corrections.
The methodology involves a scalar recursion that accurately tracks mean-squared error, making AMP effective for compressed sensing and large-scale linear regression.
AMP's universality across various random matrix models underpins its significance in fields like machine learning, statistical physics, and signal processing.

Approximate Message Passing (AMP) algorithms are a class of iterative algorithms designed for high-dimensional inference, particularly in problems where one seeks to reconstruct a signal from noisy linear measurements. AMP has rigorous performance characterizations in the large-system limit, can be analyzed even for complex priors and measurement ensembles, and has inspired generalizations applicable in modern machine learning, statistical physics, and signal processing. The central theoretical innovation is state evolution, a low-dimensional dynamical system that precisely tracks the mean-squared error and other observables of the iterates.

1. Formal Definition and Core Recursion

AMP algorithms solve high-dimensional inference problems such as linear regression or compressed sensing: $y = A \beta_0 + w$, where $y \in \mathbb R^n$ is observed, $A \in \mathbb R^{n \times N}$ is a random measurement matrix, $\beta_0 \in \mathbb R^N$ is the unknown signal, and $w \in \mathbb R^n$ is noise. The standard AMP iteration (for separable denoising) is: $\begin{aligned} x^{t+1} &= \eta_t(A^\top z^t + x^t) \\ z^t &= y - A x^t + b_t z^{t-1} \end{aligned}$ where $\eta_t$ is a component-wise (potentially nonlinear) denoiser, and the Onsager (divergence) term

$b_t = \frac{1}{\delta} \left\langle \eta'_{t-1}(A^\top z^{t-1} + x^{t-1}) \right\rangle, \quad \delta = n/N$

ensures asymptotic Gaussianity of the effective observations $A^\top z^t + x^t$ (0911.4219, Feng et al., 2021, Rush et al., 2016, Zou et al., 2022). Here $\langle \cdot \rangle$ denotes the empirical average over the $N$ coordinates. AMP is derived from the high-connectivity (dense graph) limit of belief propagation, using central-limit and Gaussian-approximation arguments to reduce the full message-passing system to this compact form.
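The recursion above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the soft-thresholding denoiser, the threshold rule `alpha * ||z|| / sqrt(n)`, the function name `amp_soft_threshold`, and the Bernoulli-Gaussian test signal are all assumptions made for the example and do not come from the cited papers.

```python
import numpy as np

def soft_threshold(v, theta):
    """Component-wise soft thresholding, used here as the denoiser eta_t."""
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def amp_soft_threshold(y, A, n_iter=40, alpha=1.5):
    """AMP with a soft-thresholding denoiser (illustrative sketch).

    alpha is an assumed tuning constant; the threshold at each step is
    alpha times an estimate of the effective noise level.
    """
    n, N = A.shape
    delta = n / N
    x = np.zeros(N)   # x^0 = 0
    z = y.copy()      # z^0 = y, since the Onsager term vanishes at t = 0
    for _ in range(n_iter):
        tau_hat = np.linalg.norm(z) / np.sqrt(n)   # effective noise estimate
        r = A.T @ z + x                            # effective observation
        theta = alpha * tau_hat
        x_new = soft_threshold(r, theta)
        # Onsager coefficient: (1/delta) * average derivative of the denoiser.
        # For soft thresholding the derivative is 1 where |r| > theta, else 0.
        b = np.mean(np.abs(r) > theta) / delta
        z = y - A @ x_new + b * z
        x = x_new
    return x

# Small demo on a synthetic sparse problem (all parameters are assumptions).
rng = np.random.default_rng(0)
N, n = 2000, 1000
beta0 = rng.standard_normal(N) * (rng.random(N) < 0.1)   # 10% nonzeros
A = rng.standard_normal((n, N)) / np.sqrt(n)             # i.i.d. N(0, 1/n)
y = A @ beta0 + 0.01 * rng.standard_normal(n)
x_hat = amp_soft_threshold(y, A)
print("relative error:", np.sum((x_hat - beta0) ** 2) / np.sum(beta0 ** 2))
```

The residual update reuses the previous residual `z` scaled by `b`; dropping that term turns the loop into plain iterative soft thresholding.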

The distinctive feature of AMP is that for suitable classes of $A$ (typically i.i.d. Gaussian or rotationally invariant matrices) and pseudo-Lipschitz denoisers, the entire error trajectory of $x^t$ is described by state evolution, a scalar recursion for the per-coordinate effective noise variance (Rush et al., 2016, Chen et al., 2020, Zou et al., 2022, Feng et al., 2021).

2. State Evolution: Exact Macroscopic Prediction

State evolution (SE) is a deterministic dynamical system that tracks the mean-squared error and other empirical observables of the AMP iterates. For the canonical separable AMP, the SE recursion is: $\tau_{t+1}^2 = \sigma^2 + \frac{1}{\delta} \mathbb E\left[ \left( \eta_t(B + \tau_t Z) - B \right)^2 \right]$ where $B \sim p_\beta$ is a typical coordinate of the signal prior, $Z \sim \mathcal N(0, 1)$ is independent noise, and $\sigma^2$ is the per-coordinate variance of $w$. For denoisers matched to $p_\beta$, this scalar system predicts the AMP mean-squared error at each iteration with exponentially good accuracy, even for moderate $N$ (Rush et al., 2016, Zou et al., 2022, Feng et al., 2021).
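A scalar recursion of this kind is cheap to evaluate numerically. The sketch below approximates the expectation by Monte Carlo for a soft-thresholding denoiser with threshold `alpha * tau`. The denoiser, the Bernoulli-Gaussian prior, the initialization `tau_0^2 = sigma^2 + E[B^2] / delta` (which corresponds to starting AMP from a zero estimate), and the helper names are assumptions chosen for illustration.

```python
import numpy as np

def soft_threshold(v, theta):
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def bernoulli_gaussian(rng, size, eps=0.1):
    """Hypothetical prior: N(0, 1) with probability eps, else exactly 0."""
    return rng.standard_normal(size) * (rng.random(size) < eps)

def state_evolution(delta, sigma2, sample_prior, alpha=1.5,
                    n_iter=30, n_mc=200_000, seed=0):
    """Iterate tau_{t+1}^2 = sigma2 + E[(eta(B + tau Z) - B)^2] / delta.

    The expectation is replaced by an average over n_mc fixed samples,
    so the result is a Monte Carlo approximation of the SE trajectory.
    """
    rng = np.random.default_rng(seed)
    B = sample_prior(rng, n_mc)
    Z = rng.standard_normal(n_mc)
    tau2 = sigma2 + np.mean(B ** 2) / delta   # assumed start: zero estimate
    history = [tau2]
    for _ in range(n_iter):
        tau = np.sqrt(tau2)
        mse = np.mean((soft_threshold(B + tau * Z, alpha * tau) - B) ** 2)
        tau2 = sigma2 + mse / delta
        history.append(tau2)
    return np.array(history)

hist = state_evolution(delta=0.5, sigma2=1e-4, sample_prior=bernoulli_gaussian)
print("tau^2 at start and end:", hist[0], hist[-1])
```

The predicted mean-squared error at each step can be read off as `delta * (tau2 - sigma2)`.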

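For non-separable (e.g., sliding-window) denoisers, as in Markov chain or Markov random field (MRF) priors, SE becomes a recursion on windowed (block) distributions: $\tau_{t+1}^2 = \sigma^2 + \frac{1}{\delta} \mathbb E\left[ \left( \eta_t(\underline{B} + \tau_t \underline{Z}) - B_{k+1} \right)^2 \right]$ where $\underline{B} = (B_1, \dots, B_{2k+1})$ is a block from the stationary law of the MRF with middle coordinate $B_{k+1}$, $\underline{Z} \in \mathbb R^{2k+1}$ is i.i.d. standard Gaussian, $\tau_t^2$ is the effective noise variance at iteration $t$, and $\sigma^2$ is the measurement noise variance (Ma et al., 2017, Ma et al., 2019).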

Under mild pseudo-Lipschitz and regularity conditions on $\eta_t$, and an appropriate random matrix hypothesis on $A$, the empirical distribution of $(x^{t+1}, \beta_0)$ concentrates around the scalar law dictated by SE: for pseudo-Lipschitz test functions $\phi$, $\frac{1}{N} \sum_{i=1}^{N} \phi(x_i^{t+1}, \beta_{0,i}) \to \mathbb E\left[ \phi(\eta_t(B + \tau_t Z), B) \right]$, with $B$ distributed according to the signal prior and $Z \sim \mathcal N(0, 1)$ independent of $B$ (Rush et al., 2016, Ma et al., 2017, Ma et al., 2019). SE also accurately predicts phase transitions for sparse recovery and minimum mean-squared error trajectories.
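One way to see this concentration is to run AMP on a single random instance and compare its per-iteration squared error with the scalar prediction. The sketch below does this with the test function taken to be the squared error. The soft-thresholding denoiser, the threshold `alpha * tau_t` driven by the SE noise level, the Bernoulli-Gaussian signal, and the problem sizes are assumptions for the example; the SE expectation is approximated by Monte Carlo.

```python
import numpy as np

def soft_threshold(v, theta):
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

rng = np.random.default_rng(0)
N, delta, eps, sigma, alpha, T = 4000, 0.5, 0.1, 0.05, 1.5, 10
n = int(delta * N)

# One random problem instance (assumed Bernoulli-Gaussian signal).
beta0 = rng.standard_normal(N) * (rng.random(N) < eps)
A = rng.standard_normal((n, N)) / np.sqrt(n)
y = A @ beta0 + sigma * rng.standard_normal(n)

# Fixed Monte Carlo samples standing in for (B, Z) in the scalar recursion.
n_mc = 200_000
B = rng.standard_normal(n_mc) * (rng.random(n_mc) < eps)
Z = rng.standard_normal(n_mc)

x, z = np.zeros(N), y.copy()
tau2 = sigma ** 2 + np.mean(B ** 2) / delta   # noise level for x^0 = 0
amp_mse, se_mse = [], []
for t in range(T):
    tau = np.sqrt(tau2)
    # AMP step, with the threshold set from the SE noise level tau_t.
    r = A.T @ z + x
    x = soft_threshold(r, alpha * tau)
    b = np.mean(np.abs(r) > alpha * tau) / delta
    z = y - A @ x + b * z
    amp_mse.append(np.mean((x - beta0) ** 2))
    # Scalar prediction for the same iteration.
    pred = np.mean((soft_threshold(B + tau * Z, alpha * tau) - B) ** 2)
    se_mse.append(pred)
    tau2 = sigma ** 2 + pred / delta

for t, (a, s) in enumerate(zip(amp_mse, se_mse), start=1):
    print(f"iter {t}: empirical MSE {a:.5f}, SE prediction {s:.5f}")
```

At finite `N` the two columns agree only up to sampling fluctuations, which would be expected to shrink as the dimensions grow.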

3. Onsager Correction and Theoretical Origin

The Onsager term is essential in high dimensions: it removes the leading-order bias induced by reusing the same matrix $A$ in each iteration, nullifying self-interference (0911.4219, Feng et al., 2021, Zou et al., 2022). Without the Onsager term, simple iterative thresholding schemes (ISTA, projected gradient) exhibit markedly worse reconstruction error and less favorable phase-transition boundaries (0911.4219).
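The effect of the correction can be probed by running the same loop with and without it. In the sketch below, the only difference between the two runs is one line in the residual update. The soft-thresholding denoiser, the threshold rule, the function name `iterate`, and the synthetic problem are assumptions for illustration; on this instance the uncorrected iteration would be expected to finish with a noticeably larger error, but the exact numbers depend on the setup.

```python
import numpy as np

def soft_threshold(v, theta):
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def iterate(y, A, use_onsager, n_iter=30, alpha=1.5):
    """Iterative soft thresholding, optionally with the Onsager term."""
    n, N = A.shape
    x, z = np.zeros(N), y.copy()
    for _ in range(n_iter):
        theta = alpha * np.linalg.norm(z) / np.sqrt(n)   # assumed threshold rule
        x = soft_threshold(A.T @ z + x, theta)
        z_new = y - A @ x
        if use_onsager:
            # b = (1/delta) * (fraction of nonzeros) = (#nonzeros) / n
            z_new = z_new + (np.count_nonzero(x) / n) * z
        z = z_new
    return x

rng = np.random.default_rng(1)
N, n = 2000, 1000
beta0 = rng.standard_normal(N) * (rng.random(N) < 0.1)
A = rng.standard_normal((n, N)) / np.sqrt(n)
y = A @ beta0 + 0.01 * rng.standard_normal(n)

def rel_err(x):
    return np.sum((x - beta0) ** 2) / np.sum(beta0 ** 2)

err_amp = rel_err(iterate(y, A, use_onsager=True))
err_ist = rel_err(iterate(y, A, use_onsager=False))
print("with Onsager term:   ", err_amp)
print("without Onsager term:", err_ist)
```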

The derivation is grounded in belief propagation (sum-product) on dense graphs, followed by a central-limit (Gaussian cavity) approximation. The AMP equations are algorithmic counterparts to the Thouless-Anderson-Palmer (TAP) equations from spin glass theory, with the Onsager reaction field matching the correction term from physics (0911.4219, Feng et al., 2021).

4. Non-Separable Denoisers and Dependencies

Standard AMP (scalar denoising) suffices for i.i.d. signals. For signals with structured dependencies, such as Markov chains, MRFs, or local spatial dependencies in images, a coordinate-wise denoiser is suboptimal. AMP can be extended to non-separable ("sliding-window") denoisers: $x_i^{t+1} = \eta_t\left( [A^\top z^t + x^t]_{i-k}^{i+k} \right)$ where estimation at coordinate $i$ leverages a local window of $2k+1$ variables. The state evolution generalizes to block distributions reflecting the stationary law of the Markov chain over the window, and the empirical MSE over the "middle" coordinates matches the SE prediction up to negligible edge effects (Ma et al., 2017, Ma et al., 2019).
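To make the sliding-window idea concrete, the sketch below builds the windows around each coordinate and applies a simple window-based shrinkage to the middle entry. The specific denoiser (a local empirical Wiener-style gain), the edge padding, the finite-difference estimate of the average derivative with respect to the middle coordinate (the quantity an Onsager term would need), and all function names are assumptions for illustration, not the denoisers analyzed in the cited work.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_windows(r, k):
    """Return an (N, 2k+1) array whose row i is the window centered at i."""
    padded = np.pad(r, k, mode="edge")   # assumed treatment of the edges
    return sliding_window_view(padded, 2 * k + 1)

def window_shrink(windows, tau):
    """Hypothetical denoiser: scale the middle entry by a local gain.

    The gain max(1 - tau^2 / local_energy, 0) is close to 0 where the
    window looks like pure noise and close to 1 where it carries signal.
    """
    k = windows.shape[1] // 2
    energy = np.mean(windows ** 2, axis=1)
    gain = np.maximum(1.0 - tau ** 2 / np.maximum(energy, 1e-12), 0.0)
    return gain * windows[:, k]

def denoise_with_divergence(r, tau, k=2, step=1e-6):
    """Denoise and estimate the mean derivative w.r.t. the middle entry."""
    windows = make_windows(r, k)
    x = window_shrink(windows, tau)
    bumped = windows.copy()              # the view is read-only, so copy
    bumped[:, k] += step
    avg_deriv = np.mean((window_shrink(bumped, tau) - x) / step)
    return x, avg_deriv

# Demo: a signal with a silent half and an active half, plus unit noise.
rng = np.random.default_rng(0)
signal = np.concatenate([np.zeros(500), 3.0 * np.ones(500)])
r = signal + rng.standard_normal(signal.size)
x, avg_deriv = denoise_with_divergence(r, tau=1.0, k=2)
print("input MSE: ", np.mean((r - signal) ** 2))
print("output MSE:", np.mean((x - signal) ** 2))
```

Because the estimate at each position depends on its neighbors, the same noisy value is treated differently in the silent and active regions, which a coordinate-wise denoiser cannot do.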

Rigorous SE analysis for such non-separable settings requires concentration tools for sums of pseudo-Lipschitz functions of overlapping windows (Gaussian or Markovian)—enabled via martingale methods and spectral gap arguments for reversible, geometrically ergodic Markov chains (Ma et al., 2017, Ma et al., 2019).

5. Universality and Matrix Ensembles

The SE predictions for AMP (mean-squared error evolution and empirical law) are universal across broad classes of random matrix ensembles. For symmetric Wigner, i.i.d. subgaussian, rotationally invariant, and even more structured or heavy-tailed ensembles, the limiting empirical distributions are identical to the Gaussian case (Chen et al., 2020, Wang et al., 2022, Dudeja et al., 2022).

Formally, if $A$ is suitably delocalized and has the correct first two moments, all pseudo-Lipschitz functionals of the AMP iterates converge to their Gaussian SE values. This universality justifies applying Gaussian-based SE formulas for optimal parameters (thresholds, step sizes) in non-Gaussian or deterministic contexts (Chen et al., 2020, Dudeja et al., 2022, Wang et al., 2022). This robustness underpins the practical power of AMP-type algorithms in compressed sensing and large-scale inference.
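A quick numerical check of this claim is to run one AMP routine on two ensembles with matching first two moments. The sketch below compares i.i.d. Gaussian entries with i.i.d. random-sign (Rademacher) entries, both scaled to variance `1/n`. The soft-thresholding denoiser, threshold rule, problem sizes, and helper names are assumptions for illustration; close agreement on a single instance is suggestive, not a proof.

```python
import numpy as np

def soft_threshold(v, theta):
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def amp(y, A, n_iter=40, alpha=1.5):
    """Soft-thresholding AMP with an estimated noise level (sketch)."""
    n, N = A.shape
    x, z = np.zeros(N), y.copy()
    for _ in range(n_iter):
        theta = alpha * np.linalg.norm(z) / np.sqrt(n)
        x = soft_threshold(A.T @ z + x, theta)
        z = y - A @ x + (np.count_nonzero(x) / n) * z   # Onsager term
    return x

def gaussian(rng, n, N):
    return rng.standard_normal((n, N)) / np.sqrt(n)

def rademacher(rng, n, N):
    # Entries are +1/sqrt(n) or -1/sqrt(n): mean 0 and variance 1/n.
    return rng.choice([-1.0, 1.0], size=(n, N)) / np.sqrt(n)

def relative_error(make_matrix, seed=0, N=2000, n=1000, eps=0.1, sigma=0.01):
    rng = np.random.default_rng(seed)
    beta0 = rng.standard_normal(N) * (rng.random(N) < eps)
    A = make_matrix(rng, n, N)
    y = A @ beta0 + sigma * rng.standard_normal(n)
    x = amp(y, A)
    return np.sum((x - beta0) ** 2) / np.sum(beta0 ** 2)

err_gauss = relative_error(gaussian)
err_rad = relative_error(rademacher)
print("Gaussian matrix:  ", err_gauss)
print("Rademacher matrix:", err_rad)
```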