
Recursive Laplace Approximation

Updated 7 December 2025
  • Recursive Laplace approximation is an iterative method that refines the standard Laplace approach by constructing adaptive mixtures of Gaussian densities to capture multimodality and skewness.
  • It sequentially corrects approximation errors through residual optimization and updates mixture weights using nonnegative least squares to improve accuracy.
  • Empirical studies show that this methodology significantly enhances effective sample size and approximation precision in complex, high-dimensional Bayesian inference tasks.

Recursive Laplace approximation, also termed iterated (or iterLap) Laplace approximation, is a methodology for constructing adaptive mixtures of multivariate normal densities to approximate complex target probability densities $\pi(x)$, particularly in Bayesian inference contexts. By iteratively correcting the limitations of the classical single-Gaussian Laplace approximation, such as its inability to capture multimodality or skewness, recursive Laplace builds a sequence of local approximations to the residual (“unexplained”) structure in the target, ultimately yielding a tractable and highly adaptive Gaussian mixture fit (Bornkamp, 2011).

1. The Standard Laplace Approximation and Problem Motivation

The Laplace approximation seeks an analytic form for integrals involving a complicated, typically unnormalized target density $\pi(x)$, $x \in \mathbb{R}^p$, whose normalizing constant is $Z = \int \pi(x) \, dx$. For a mode $x^* = \operatorname{arg\,max}_x \log \pi(x)$ with negative definite Hessian $H^* = \nabla^2 \log \pi(x^*)$, the local quadratic expansion gives

$$\log \pi(x) \approx \log \pi(x^*) + \frac{1}{2}(x-x^*)^T H^* (x-x^*),$$

so exponentiation results in the Gaussian kernel $\mathcal{N}(x \mid x^*, \Sigma)$, with $\Sigma = -(H^*)^{-1}$. The normalizing constant is approximated by $\hat Z = (2\pi)^{p/2} |\Sigma|^{1/2} \pi(x^*)$. Despite its computational attractiveness, this single Gaussian $q_1(x)$ is inadequate for skewed or multimodal targets, motivating iterative corrections to better fit real-world Bayesian posteriors (Bornkamp, 2011).
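
To make the construction concrete, the following is a minimal numerical sketch of the single-Gaussian Laplace approximation. The toy target `log_pi`, the finite-difference Hessian helper, and all names are illustrative assumptions rather than a reference implementation.

```python
# A minimal sketch of the single-Gaussian Laplace approximation; the toy
# unnormalized target `log_pi` and the finite-difference Hessian are
# illustrative assumptions only.
import numpy as np
from scipy.optimize import minimize

def log_pi(x):
    # mildly skewed, unnormalized 2D target used only for illustration
    return np.log(0.7 * np.exp(-0.5 * np.sum(x**2))
                  + 0.3 * np.exp(-0.5 * np.sum((x - 1.5)**2 / 0.5)))

def num_hess(f, x, h=1e-4):
    # central finite-difference Hessian of a scalar function f at x
    p = x.size
    H, E = np.zeros((p, p)), h * np.eye(p)
    for i in range(p):
        for j in range(p):
            H[i, j] = (f(x + E[i] + E[j]) - f(x + E[i] - E[j])
                       - f(x - E[i] + E[j]) + f(x - E[i] - E[j])) / (4 * h * h)
    return H

p = 2
x_star = minimize(lambda x: -log_pi(x), x0=np.zeros(p)).x   # mode x*
H_star = num_hess(log_pi, x_star)                           # Hessian of log pi at x* (negative definite)
Sigma = -np.linalg.inv(H_star)                              # covariance of the Laplace Gaussian
log_Z_hat = (0.5 * p * np.log(2 * np.pi)                    # log of Z-hat = (2 pi)^{p/2} |Sigma|^{1/2} pi(x*)
             + 0.5 * np.linalg.slogdet(Sigma)[1] + log_pi(x_star))
print(x_star, Sigma, np.exp(log_Z_hat))
```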

2. Iterative Construction of the Recursive Laplace Mixture

Recursive Laplace approximation constructs a mixture of Gaussians via successive correction of the approximation error. At step $k$, the existing mixture

$$q_{k-1}(x) = \sum_{j=1}^{k-1} w_j \varphi_j(x), \qquad \varphi_j(x) = \mathcal{N}(x \mid \mu_j, \Sigma_j)$$

is compared to $\pi(x)$ by the residual

$$r_{k-1}(x) = \pi(x) - q_{k-1}(x),$$

with regions where $r_{k-1}(x) > 0$ prioritized. A new component is formed by maximizing $\log r_{k-1}(x)$ (subject to $r_{k-1}(x) > 0$) to locate a new mode $\mu_k$, computing the Hessian $H_k = \nabla^2 \log r_{k-1}(\mu_k)$ (assumed negative definite), and defining $\Sigma_k = -H_k^{-1}$. The new Gaussian $\varphi_k(x)$ is appended, and all weights $\{w_j\}_{j=1}^k$ are recomputed by nonnegative least squares to minimize the discrete $L_2$ norm

$$\sum_{i} \left[ \pi(x_i) - \sum_{j=1}^k w_j \varphi_j(x_i) \right]^2,$$

over grid points $\{x_i\}$ sampled from the current mixture. This process repeats until no further negative definite residual Hessians can be found, or stopping criteria (either an error tolerance or a small relative change in normalization) are met. The final mixture is

$$q_K(x) = \sum_{k=1}^K w_k \mathcal{N}(x \mid \mu_k, \Sigma_k).$$

(Bornkamp, 2011).
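
A single correction step can be sketched as follows: locate a mode of the log-residual and form the new Gaussian component from its Hessian. The target `pi`, the mixture lists `mus`/`Sigmas`/`weights`, the evaluation grid, and the starting-point heuristic are assumptions for illustration, not the iterLap implementation.

```python
# Sketch of one recursion step: maximize log r_{k-1}(x) and build the new
# component from its Hessian; all names are illustrative stand-ins.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import multivariate_normal

def q_mix(x, mus, Sigmas, weights):
    # current mixture density q_{k-1}(x) at a single point x
    return sum(w * multivariate_normal.pdf(x, mean=m, cov=S)
               for w, m, S in zip(weights, mus, Sigmas))

def num_hess(f, x, h=1e-4):
    # central finite-difference Hessian of a scalar function f at x
    p = x.size
    H, E = np.zeros((p, p)), h * np.eye(p)
    for i in range(p):
        for j in range(p):
            H[i, j] = (f(x + E[i] + E[j]) - f(x + E[i] - E[j])
                       - f(x - E[i] + E[j]) + f(x - E[i] - E[j])) / (4 * h * h)
    return H

def new_component(pi, mus, Sigmas, weights, grid):
    # residual r_{k-1}(x) = pi(x) - q_{k-1}(x)
    residual = lambda x: pi(x) - q_mix(x, mus, Sigmas, weights)
    # start the search where the residual is largest on the current grid
    x0 = grid[np.argmax([residual(x) for x in grid])]
    # maximize log r_{k-1}(x); the clamp guards against r <= 0 during the search
    neg_log_r = lambda x: -np.log(max(residual(x), 1e-300))
    mu_k = minimize(neg_log_r, x0).x
    H_k = num_hess(lambda x: -neg_log_r(x), mu_k)   # should be negative definite
    Sigma_k = -np.linalg.inv(H_k)                   # covariance of the new component
    return mu_k, Sigma_k
```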

3. Algorithmic Structure and Implementation Details

The method is formalized as follows:

  • Initialization: Identify global/local modes of $\pi(x)$ and compute corresponding Hessians and covariance matrices for initial mixture components. Generate a quasi-random grid (e.g., Sobol sequence) from each component and evaluate all densities.
  • Iteration:
    • (A) Compute the residual $r_{k-1}(x)$.
    • (B) Locate promising starting points (e.g., high $\pi/q_{k-1}$ regions), optimize $\log r_{k-1}(x)$ to get the new component location, and ensure the Hessian is negative definite.
    • (C) Compute new covariance matrix and component.
    • (D) Augment the grid with samples from the new component.
    • (E) Recompute all weights via nonnegative least squares (a convex quadratic program).
    • (F) Apply stopping criteria, based on grid error or relative normalization change.
  • Output: The mixture $q_K(x)$ (Bornkamp, 2011).

Recommended parameter settings include a grid size of $n \approx 50 \cdot p^{1.25}$ for efficiency up to $p \approx 10$–$20$; the algorithm leverages nonnegative least squares (solved as a quadratic program), vectorized Hessian calculations, and log-scale evaluations to mitigate underflow. A sketch of the quasi-random grid construction follows.
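
The sketch below draws such a grid from a single Gaussian component using a Sobol sequence; the use of scipy's quasi-Monte Carlo module, the function name, and the defaults are illustrative assumptions rather than the package's own grid routine.

```python
# Sketch of a quasi-random evaluation grid of size n ~ 50 * p^1.25 drawn from
# one Gaussian component N(mu, Sigma) via a Sobol sequence; illustrative only.
import numpy as np
from scipy.stats import norm, qmc

def sobol_grid(mu, Sigma, factor=50.0):
    p = mu.size
    n = int(factor * p ** 1.25)                  # recommended grid size
    u = qmc.Sobol(d=p, scramble=True).random(n)  # quasi-random points in [0, 1)^p
    z = norm.ppf(np.clip(u, 1e-12, 1 - 1e-12))   # map to standard-normal quantiles
    L = np.linalg.cholesky(Sigma)                # shift and correlate: x = mu + L z
    return mu + z @ L.T

grid = sobol_grid(np.zeros(3), np.eye(3))        # 50 * 3^1.25 ≈ 197 grid points
```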

4. Mixture Structure, Weight Fitting, and Interpretative Remarks

The resulting approximation is an explicit sum of weighted normals:

$$q_K(x) = \sum_{k=1}^K w_k \mathcal{N}(x \mid \mu_k, \Sigma_k),$$

with weights determined by nonnegative least squares minimization of $\sum_i [\pi(x_i) - q_K(x_i)]^2$ over the sampling grid, posed as a convex QP. This can be viewed as a “boosting”-like procedure, where each new component corrects the previous residual, and the fitting coefficients ensure that mass is allocated to the most poorly approximated portions of $\pi(x)$ (Bornkamp, 2011).
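
The weight step admits a short sketch, assuming an evaluation grid `xs` of shape (n, p) and illustrative names `pi`, `mus`, and `Sigmas`; scipy's nonnegative least-squares routine stands in for a general quadratic-programming solver.

```python
# Minimal sketch of the weight refit: build the design matrix of component
# densities on the grid and solve the nonnegative least-squares problem;
# `pi`, `mus`, `Sigmas`, and `xs` are illustrative stand-ins.
import numpy as np
from scipy.optimize import nnls
from scipy.stats import multivariate_normal

def refit_weights(pi, mus, Sigmas, xs):
    # Phi[i, j] = N(x_i | mu_j, Sigma_j)
    Phi = np.column_stack([multivariate_normal.pdf(xs, mean=m, cov=S)
                           for m, S in zip(mus, Sigmas)])
    target = np.array([pi(x) for x in xs])
    weights, l2_error = nnls(Phi, target)   # minimize ||target - Phi w||_2 with w >= 0
    return weights, l2_error
```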

5. Theoretical Properties and Comparative Performance

Empirically, the recursive Laplace algorithm converges in a finite number of steps, capturing major modes and skewness in the target. Once the residual is small or nonpositive everywhere, or no negative definite Hessian can be discovered, iteration halts. While no closed-form error bounds exist, each step reduces the empirical $L_2$ discrepancy on the grid.

Relative to alternatives:

  • Single-Gaussian Laplace approximations, used as importance sampling proposals, often yield poor effective sample size (ESS; see the sketch after this list).
  • Adaptive importance sampling or mixture-based MCMC can match performance but typically at substantially higher cost in terms of $\pi$-evaluations.
  • Variational methods minimize a divergence criterion rather than a direct $L_2$ error and may misrepresent tail probability mass (Bornkamp, 2011).
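
To make the ESS comparisons concrete, the following hedged sketch estimates the effective sample size of importance sampling with a Gaussian-mixture proposal; `log_pi` and the proposal parameters are illustrative placeholders, not part of the reference method.

```python
# Sketch of the effective-sample-size (ESS) diagnostic for importance sampling
# with a Gaussian-mixture proposal; all names are illustrative stand-ins.
import numpy as np
from scipy.stats import multivariate_normal

def importance_ess(log_pi, mus, Sigmas, weights, n=10_000, seed=0):
    rng = np.random.default_rng(seed)
    probs = np.asarray(weights, dtype=float)
    probs /= probs.sum()
    # draw n points from the mixture proposal q
    ks = rng.choice(len(probs), size=n, p=probs)
    xs = np.array([rng.multivariate_normal(mus[k], Sigmas[k]) for k in ks])
    log_q = np.log(sum(pk * multivariate_normal.pdf(xs, mean=m, cov=S)
                       for pk, m, S in zip(probs, mus, Sigmas)))
    log_w = np.array([log_pi(x) for x in xs]) - log_q   # log importance weights
    w = np.exp(log_w - log_w.max())                     # stabilized weights
    return (w.sum() ** 2) / (n * np.square(w).sum())    # ESS as a fraction of n
```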

6. Computational Cost, Scalability, and Implementation Advice

Computationally, each iteration incurs costs from grid generation ($O(n \cdot p)$), optimization for new means (typically $O(n \cdot p)$ gradient calls), Hessian computation ($O(p^3)$, potentially $O(p^2)$ when sparsity is exploited), and quadratic programming for weight updates ($O(k^3)$ per iteration). Practical recommendations include:

  • Employing quasi-Monte Carlo (Sobol sequences) for grid sampling.
  • Operating in logarithmic space (see the sketch following this list).
  • Restricting optimization starting points, e.g., via $k$-means clustering on high $\pi/q$ regions.
  • Pre-scanning temperature-tempered residuals when many well-separated modes exist.
  • Imposing diagonal or low-rank covariance structure if inversion is a bottleneck (Bornkamp, 2011).
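
As an illustration of the log-space recommendation above, the following sketch evaluates the mixture log-density with a numerically stable log-sum-exp; all names are illustrative stand-ins for the fitted mixture.

```python
# Sketch of log-scale evaluation of a Gaussian mixture via log-sum-exp, which
# avoids underflow when individual component densities are tiny.
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def log_q_mix(xs, mus, Sigmas, weights):
    # log q(x) = logsumexp_k [ log w_k + log N(x | mu_k, Sigma_k) ]
    log_terms = np.column_stack(
        [np.log(w) + multivariate_normal.logpdf(xs, mean=m, cov=S)
         for w, m, S in zip(weights, mus, Sigmas)])
    return logsumexp(log_terms, axis=1)
```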

7. Applications and Empirical Diagnostics

Empirical studies demonstrate the strengths of recursive Laplace approximation across several test cases:

  • For a bivariate skew-$t$ density ($df = 5$), a single Gaussian achieves ESS $\approx 4\%$, while iterLap with 9 components reaches ESS $\approx 65\%$.
  • For a three-mode Gaussian mixture, iterLap (5 components) yields an ESS of $99\%$ versus $2\%$ for standard Laplace.
  • For a 10-dimensional “banana” distribution, the method attains ESS $\approx 71\%$ (11 components; 19,000 $\pi$-evaluations), compared to $5\%$ for a single Laplace approximation or 100,000 evaluations for an adaptive importance sampling competitor.
  • In a nonlinear regression scenario (NIST ENSO data, 168 observations, 11 parameters), iterLap-IS outperforms both componentwise adaptive Gibbs and optimally tuned Metropolis MCMC in effective sample size, estimation error, and multivariate posterior fit, using roughly half as many $\pi$-evaluations as the MCMC approaches.

Typical runtimes are tractable even for high-dimensional problems, e.g., $2.5$ seconds in R for a 10-dimensional artificial example (Bornkamp, 2011). The method is implemented in the R package iterLap, available on CRAN.

References

  • Bornkamp, B. (2011). “Approximating Probability Densities by Iterated Laplace Approximations.” Journal of Computational and Graphical Statistics, 20(4), 817–837.