
Expectation Propagation (EP)

Updated 3 April 2026
  • Expectation Propagation (EP) is an iterative algorithm for approximate inference that constructs tractable Gaussian approximations for complex, intractable posteriors.
  • It refines local approximations via cavity and tilted distributions, employing moment matching to update each site factor consistently.
  • EP connects to smoothed Newton’s methods and variational frameworks, offering scalable, accurate Bayesian inference with systematic perturbative corrections.

Expectation Propagation (EP) is an iterative algorithm for approximate inference which constructs and refines a tractable approximation to a complex posterior distribution by locally projecting intractable factors onto a computationally convenient exponential family. Historically, EP was introduced as a generalization of assumed-density filtering and belief propagation, providing broader applicability to hybrid discrete-continuous models and the flexibility to propagate correlated (non-factorized) approximations. The algorithm is notably well-suited for Bayesian inference and statistical learning, especially when the exact posterior is intractable due to non-conjugate or high-dimensional factors. EP’s empirical success, especially when using Gaussian approximations, has motivated intensive efforts to characterize its fixed points, convergence, theoretical exactness, and connections to variational and optimization-based frameworks.

1. Bayesian Approximate Inference and EP’s Construction

Consider a posterior density

p(x) \propto \exp\big(-\psi(x)\big),

where x \in \mathbb{R}^d, \psi is twice differentiable and typically a sum of local terms, \psi(x) = \sum_{i=1}^n \phi_i(x), with each factor f_i(x) = \exp(-\phi_i(x)). Direct computation of p(x) or its marginals is usually intractable.
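As a concrete (hypothetical) instance of this site decomposition, consider a small Bayesian logistic regression, where each data point contributes one site potential and the Gaussian prior is absorbed as an extra quadratic term; the numbers below are made up for illustration:

```python
import numpy as np

# Hypothetical illustration of the decomposition psi(x) = sum_i phi_i(x):
# a Bayesian logistic regression posterior with a standard-normal prior.
A = np.array([[1.0, 0.2], [0.5, -1.0], [0.3, 0.8]])  # made-up design matrix
y = np.array([1.0, -1.0, 1.0])                       # labels in {-1, +1}

def phi(i, x):
    """Site potential phi_i(x) = log(1 + exp(-y_i * a_i . x)), so f_i = exp(-phi_i)."""
    return np.log1p(np.exp(-y[i] * (A[i] @ x)))

def psi(x):
    """psi(x) = sum_i phi_i(x) + ||x||^2 / 2 (Gaussian prior absorbed as one term)."""
    return sum(phi(i, x) for i in range(len(y))) + 0.5 * (x @ x)

# At x = 0 every likelihood site contributes log 2:
val = psi(np.zeros(2))   # = 3 * log 2
```

Each phi_i is log-convex here, which is exactly the log-concave setting in which EP's convergence guarantees below apply.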

EP introduces an approximating exponential-family density q(x), commonly chosen to be multivariate Gaussian:

q(x) \propto \prod_{i=1}^n q_i(x), \qquad q_i(x) = \exp\left(-\frac{1}{2} x^\top B_i x - r_i^\top x \right),

so that q(x) is itself Gaussian. Each q_i(x) approximates the corresponding original factor f_i(x).

To refine each site factor q_i(x), EP proceeds as follows:

  • Cavity Distribution: Remove q_i(x) from q(x), forming q^{\setminus i}(x) \propto q(x) / q_i(x).
  • Tilted Distribution: Reintroduce the true factor: \hat{p}_i(x) \propto q^{\setminus i}(x)\, f_i(x).
  • Moment Matching/Projection: Project \hat{p}_i(x) back to the exponential family by matching its moments (typically mean and covariance for Gaussians).
  • Update: Set the new q_i(x) such that the moments of q_i(x)\, q^{\setminus i}(x) align with those of \hat{p}_i(x).

This local moment matching iteratively refines all site factors q_i(x) until global consistency or convergence is achieved (Minka, 2013).
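The loop above can be sketched in code. The following is a minimal one-dimensional illustration on a Minka-style clutter problem (all constants are made up for illustration); the tilted moments are computed by brute-force numerical integration rather than the analytic updates a production implementation would use:

```python
import numpy as np

def ep_clutter(y, w=0.5, clutter_var=10.0, prior_var=10.0, sweeps=30):
    """Minimal 1-D EP sketch for a clutter model:
    p(x) ~ N(x; 0, prior_var) * prod_j [(1-w) N(y_j; x, 1) + w N(y_j; 0, clutter_var)].
    Site approximations are Gaussian, stored in natural parameters."""
    n = len(y)
    b = np.zeros(n)                  # site precisions
    r = np.zeros(n)                  # site precision-times-mean
    grid = np.linspace(-15.0, 15.0, 4001)
    dx = grid[1] - grid[0]
    for _ in range(sweeps):
        for i in range(n):
            B, R = 1.0 / prior_var + b.sum(), r.sum()    # global q (prior included)
            b_cav, r_cav = B - b[i], R - r[i]            # cavity q^{\i}
            if b_cav <= 0:
                continue                                 # skip invalid cavity
            m_cav, v_cav = r_cav / b_cav, 1.0 / b_cav
            lik = ((1 - w) * np.exp(-0.5 * (y[i] - grid) ** 2) / np.sqrt(2 * np.pi)
                   + w * np.exp(-0.5 * y[i] ** 2 / clutter_var)
                   / np.sqrt(2 * np.pi * clutter_var))
            tilt = np.exp(-0.5 * (grid - m_cav) ** 2 / v_cav) * lik  # tilted dist.
            tilt /= tilt.sum() * dx
            m = (grid * tilt).sum() * dx                 # moment matching: mean ...
            v = ((grid - m) ** 2 * tilt).sum() * dx      # ... and variance
            b[i], r[i] = 1.0 / v - b_cav, m / v - r_cav  # new site parameters
    B, R = 1.0 / prior_var + b.sum(), r.sum()
    return R / B, 1.0 / B                                # posterior mean, variance

mean, var = ep_clutter(np.array([1.2, 0.8, 1.0, -4.0]))
```

With three observations near 1 and one outlier at -4, the mixture likelihood explains the outlier as clutter, so the EP posterior mean stays near 1 rather than being dragged toward the outlier.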

2. EP as Smoothed Newton’s Method and Variational Characterization

Each EP site update can be interpreted as a smoothed Newton (second-order) step on a “smoothed” energy landscape. Instead of performing Newton’s method directly on the raw energy \psi, EP operates on a convolution of \psi with the current Gaussian approximation, amounting to

\bar{\psi}(x) = \mathbb{E}_{z \sim \mathcal{N}(0, \Sigma)}\big[\psi(x + z)\big], \qquad \Sigma = \operatorname{Cov}_q(x).

The gradient and Hessian used in Newton’s method are replaced by their averages under q(x), the current Gaussian approximating measure. Each EP site update is effectively a Newton step on this locally averaged (smoothed) energy (Dehaene, 2016).
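This averaged-Newton view can be checked numerically in one dimension. In the sketch below (an illustrative energy chosen by me, not taken from the cited papers), the gradient and Hessian of \psi(x) = x^2/2 + x^4/4 are averaged under the current Gaussian via Gauss–Hermite quadrature:

```python
import numpy as np

# Smoothed Newton iteration (sketch): the gradient and Hessian of psi are
# replaced by their expectations under the current Gaussian N(mu, v).
psi_grad = lambda x: x + x ** 3        # psi(x) = x^2/2 + x^4/4 (illustrative)
psi_hess = lambda x: 1 + 3 * x ** 2

nodes, weights = np.polynomial.hermite.hermgauss(40)  # Gauss-Hermite rule

def gauss_avg(f, mu, v):
    """E[f(X)] for X ~ N(mu, v), exact here since f is polynomial."""
    return np.sum(weights * f(mu + np.sqrt(2 * v) * nodes)) / np.sqrt(np.pi)

mu, v = 3.0, 1.0
for _ in range(100):
    g = gauss_avg(psi_grad, mu, v)     # averaged gradient
    h = gauss_avg(psi_hess, mu, v)     # averaged Hessian -> new precision
    mu, v = mu - g / h, 1.0 / h
# mu -> 0 (the mode); 1/v -> E_q[psi''], a *smoothed* curvature
```

At the fixed point 1/v = 1 + 3v, i.e. v = (sqrt(13) - 1)/6, strictly larger curvature than the Laplace value psi''(0) = 1 would give: the smoothing has averaged in the quartic term's curvature away from the mode.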

The fixed points of EP coincide with solutions to the problem of minimizing the "reverse" Kullback–Leibler divergence \mathrm{KL}(q \,\|\, p) within the chosen exponential family—specifically, the stationary equations for a Gaussian q(x):

  • \mathbb{E}_q[\nabla \psi(x)] = 0
  • \Sigma^{-1} = \mathbb{E}_q[\nabla^2 \psi(x)], where \nabla^2 \psi denotes the Hessian and \Sigma the covariance of q(x).

This equivalence explains why, in convex and log-concave cases, EP converges to the unique solution minimizing \mathrm{KL}(q \,\|\, p) among Gaussians (Dehaene, 2016).
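These stationarity conditions follow from the Gaussian integration-by-parts identities (often attributed to Bonnet and Price); a brief derivation sketch:

```latex
% For q = \mathcal{N}(\mu, \Sigma) and p(x) \propto \exp(-\psi(x)):
% \mathrm{KL}(q \,\|\, p) = \mathbb{E}_q[\psi(x)] - \tfrac{1}{2}\log\det\Sigma + \text{const.}
%
% Gaussian integration by parts gives
\nabla_\mu \,\mathbb{E}_q[\psi(x)] = \mathbb{E}_q[\nabla\psi(x)], \qquad
\nabla_\Sigma \,\mathbb{E}_q[\psi(x)] = \tfrac{1}{2}\,\mathbb{E}_q[\nabla^2\psi(x)].
%
% Setting both gradients of the KL to zero (using \nabla_\Sigma \log\det\Sigma = \Sigma^{-1}):
\mathbb{E}_q[\nabla\psi(x)] = 0, \qquad
\Sigma^{-1} = \mathbb{E}_q[\nabla^2\psi(x)].
```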

3. Corrections, Exactness, and Cumulant Expansions

Though EP is exact only under certain conditions (e.g., when the true posterior is in the approximating family or each tilted distribution is Gaussian), in general it neglects higher-order cumulants (third and above). This neglect can be systematically corrected via a perturbative expansion:

\log Z = \log Z_{\mathrm{EP}} + R_2 + R_3 + \cdots,

where Z is the normalization constant of p(x), Z_{\mathrm{EP}} is the estimate produced by the EP approximation, and the R_k are corrections involving cumulant differences between the tilted and EP marginals. Notably, the first-order term vanishes, and the leading (second-order) correction is quadratic in site-pair cumulant differences, computable in O(n^2) time for n sites (Opper et al., 2013). If all higher cumulants vanish, EP is exact. These corrections allow the error of the EP approximation to be quantified polynomially in the neglected cumulants.
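As a small numerical illustration (toy numbers of my own, not from the cited work), one can measure the third cumulant of a tilted distribution directly; whenever it is nonzero, Gaussian moment matching is discarding exactly the information that the perturbative series reinstates:

```python
import numpy as np

# Toy check: a standard-normal cavity tilted by a logistic site f(x) = sigmoid(3x).
# Gaussian EP keeps only the first two cumulants of this tilted distribution;
# the third cumulant below is what the perturbative corrections account for.
grid = np.linspace(-10.0, 10.0, 20001)
dx = grid[1] - grid[0]
tilt = np.exp(-0.5 * grid ** 2) / (1.0 + np.exp(-3.0 * grid))  # cavity * site
tilt /= tilt.sum() * dx                                        # normalize

m1 = (grid * tilt).sum() * dx                 # mean (matched by EP)
k2 = ((grid - m1) ** 2 * tilt).sum() * dx     # variance (matched by EP)
k3 = ((grid - m1) ** 3 * tilt).sum() * dx     # third cumulant: NOT matched
```

The positive skew (k3 > 0) of this tilted density is typical of sigmoid-like sites; a purely Gaussian site f(x) would make k3 vanish and the EP projection exact.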

4. Convergence and Stability Properties

Convergence guarantees are available in several settings:

  • For convex, log-concave posteriors, a full parallel EP sweep converges to a unique fixed point, viewed as block-coordinate descent on a concave lower bound (Dehaene, 2016).
  • In the large-data (large-n) and large-cavity (high-precision) limit, both standard EP and averaged-EP (aEP) behave as Newton’s method, performing local quadratic updates on the log-posterior. Near modes of the posterior, EP’s fixed point converges in total variation to the canonical Gaussian approximation as n grows, under mild regularity and identifiability conditions (Dehaene et al., 2015).
  • However, EP can be globally unstable in multimodal or poorly initialized scenarios, mirroring the known potential for Newton’s method to diverge far from a mode. In practice, using damping, Laplace initialization, or contraction diagnostics is essential for robust performance.

Because EP site updates involve Newton-like steps on smoothed objectives, they typically accelerate convergence in directions of gentle curvature and naturally recover the Laplace approximation in the limit of many weak factors (Dehaene, 2016).
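Damping, mentioned above as a stabilizer, is typically applied in natural parameters; a minimal sketch (parameter names are my own, not from the cited papers):

```python
def damped_site_update(b_old, r_old, b_new, r_new, delta=0.5):
    """Damped EP site update: move only a fraction delta of the way from the
    old natural parameters (precision b, shift r) toward the freshly computed
    ones. delta = 1 is undamped EP; smaller delta trades speed for stability."""
    return ((1 - delta) * b_old + delta * b_new,
            (1 - delta) * r_old + delta * r_new)

# Example: halfway between (0, 0) and (2.0, 4.0)
b, r = damped_site_update(0.0, 0.0, 2.0, 4.0, delta=0.5)  # -> (1.0, 2.0)
```

Because the convex combination is taken in natural parameters, a damped update still yields a valid exponential-family site whenever both endpoints do, which is why damping is the standard remedy for the oscillating sweeps described above.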

5. Implementation, Numerical Considerations, and Variants

Efficient implementation of EP, especially in models with Gaussian-approximated posteriors, relies on the following:

  • Each site update requires computing the mean and covariance of the local hybrid (tilted) distribution \hat{p}_i(x). For univariate or low-dimensional likelihoods, these can be computed via quadrature or local MCMC. The complexity per full sweep is O(n d^3) in general for n sites and a d-dimensional x.
  • For very large models, sparse reparameterizations and distributed algorithms allow full EP inference to scale linearly with the number of sites, as well as support distributed memory architectures (see (Zhou et al., 2024) for mixed-effects regression and distributed EP frameworks).
  • Variants such as \alpha-EP (power EP) implement alternative divergences by adjusting the smoothing kernel; for example, \alpha \to 0 recovers reverse-KL variational Bayes and \alpha = 1/2 recovers Hellinger-distance EP (Dehaene, 2016).

6. Applications, Empirical Performance, and Extensions

EP has been applied to a broad range of models including generalized linear models, Gaussian process classification, sparse regression, and feature selection in high-dimensional classifiers. Empirically, EP produces accurate (often state-of-the-art) marginal likelihood and posterior moment estimates, outperforming Laplace, variational Bayes, and certain sampling approaches in terms of both accuracy and computational efficiency, especially in large-scale and non-conjugate settings (Minka, 2013, Dehaene, 2016, Opper et al., 2013).

Additionally, higher-order perturbative corrections can improve log-marginal likelihood accuracy by orders of magnitude, as validated in Gaussian process and Ising model test cases (Opper et al., 2013).

7. Summary and Theoretical Implications

Expectation Propagation provides a unifying framework that connects Newton-type optimization, variational inference, and message-passing. Rigorous analysis shows:

  • EP is precisely Newton’s method applied to a locally averaged energy landscape.
  • Its fixed points are the unique Gaussian minimizers of reverse-KL divergence to the posterior.
  • In high-data or high-cavity-limit regimes, EP is asymptotically exact.
  • Perturbative expansions yield systematic corrections and quantifiable error measures.
  • EP’s convergence and accuracy are explained by these unifying principles, demystifying both its empirical effectiveness and limitations (Dehaene, 2016, Opper et al., 2013, Dehaene et al., 2015).

References:

  • “Expectation Propagation performs a smoothed gradient descent” (Dehaene, 2016)
  • “Perturbative Corrections for Approximate Inference in Gaussian Latent Variable Models” (Opper et al., 2013)
  • “Expectation Propagation in the large-data limit” (Dehaene et al., 2015)
  • “Expectation Propagation for approximate Bayesian inference” (Minka, 2013)
