Recursive Laplace Approximation

Updated 7 December 2025

Recursive Laplace approximation is an iterative method that refines the standard Laplace approach by constructing adaptive mixtures of Gaussian densities to capture multimodality and skewness.
It sequentially corrects approximation errors through residual optimization and updates mixture weights using nonnegative least squares to improve accuracy.
Empirical studies show that this methodology significantly enhances effective sample size and approximation precision in complex, high-dimensional Bayesian inference tasks.

Recursive Laplace approximation, also termed iterated (or iterLap) Laplace approximation, is a methodology for constructing adaptive mixtures of multivariate normal densities to approximate complex target probability densities $\pi(x)$ , particularly in Bayesian inference contexts. By iteratively correcting the limitations of the classical single-Gaussian Laplace approximation, such as its inability to capture multimodality or skewness, recursive Laplace builds a sequence of local approximations to the residual (“unexplained”) structure in the target, ultimately yielding a tractable and highly adaptive Gaussian mixture fit (Bornkamp, 2011).

1. The Standard Laplace Approximation and Problem Motivation

The Laplace approximation seeks an analytic form for integrals involving complicated target densities $\pi(x) \propto \pi(x)$ , $x \in \mathbb{R}^p$ , typically normalized by $Z = \int \pi(x) \, dx$ . For a mode $x^* = \operatorname{arg\,max}_x \log \pi(x)$ with negative definite Hessian $H^* = \nabla^2 \log \pi(x^*)$ , the local quadratic expansion gives

$\log \pi(x) \approx \log \pi(x^*) + \frac{1}{2}(x-x^*)^T H^* (x-x^*),$

so exponentiation results in the Gaussian kernel $\mathcal{N}(x \mid x^*, \Sigma)$ , with $\Sigma = -(H^*)^{-1}$ . The normalizing constant is approximated by $\hat Z = (2\pi)^{p/2} |\Sigma|^{1/2} \pi(x^*)$ . Despite its computational attractiveness, this single Gaussian $q_1(x)$ is inadequate for skewed or multimodal targets, motivating iterative corrections to better fit real-world Bayesian posteriors (Bornkamp, 2011).

2. Iterative Construction of the Recursive Laplace Mixture

Recursive Laplace approximation constructs a mixture of Gaussians via successive correction of the approximation error. At step $k$ , the existing mixture

$q_{k-1}(x) = \sum_{j=1}^{k-1} w_j \varphi_j(x), \qquad \varphi_j(x) = \mathcal{N}(x \mid \mu_j, \Sigma_j)$

is compared to $\pi(x)$ by the residual

$r_{k-1}(x) = \pi(x) - q_{k-1}(x),$

with regions where $r_{k-1}(x) > 0$ prioritized. A new component is formed by maximizing $\log r_{k-1}(x)$ (subject to $r_{k-1}(x) > 0$ ) to locate a new mode $\mu_k$ , computing the negative Hessian $H_k = \nabla^2 \log r_{k-1}(\mu_k)$ (assumed negative definite), and defining $\Sigma_k = -H_k^{-1}$ . The new Gaussian $\varphi_k(x)$ is appended, and all weights $\{w_j\}_{j=1}^k$ are recomputed by nonnegative least squares to minimize the discrete $L_2$ norm

$\sum_{i} \left[ \pi(x_i) - \sum_{j=1}^k w_j \varphi_j(x_i) \right]^2,$

over grid points $\{x_i\}$ sampled from the current mixture. This process repeats until no further negative definite residual Hessians can be found, or stopping criteria (either error tolerance or small relative change in normalization) are met. The final mixture is

$q_K(x) = \sum_{k=1}^K w_k \mathcal{N}(x \mid \mu_k, \Sigma_k).$

(Bornkamp, 2011).

3. Algorithmic Structure and Implementation Details

The method is formalized as follows:

Initialization: Identify global/local modes of $\pi(x)$ and compute corresponding Hessians and covariance matrices for initial mixture components. Generate a quasi-random grid (e.g., Sobol sequence) from each component and evaluate all densities.
Iteration:
- (A) Compute the residual $r_{k-1}(x)$ .
- (B) Locate promising starting points (e.g., high $\pi/q_{k-1}$ regions), optimize $\log r_{k-1}(x)$ to get new component location, and ensure the Hessian is negative definite.
- (C) Compute new covariance matrix and component.
- (D) Augment the grid with samples from the new component.
- (E) Recompute all weights via nonnegative quadratic programming.
- (F) Apply stopping criteria, based on grid error or relative normalization change.
Output: The mixture $q_K(x)$ (Bornkamp, 2011).

Recommended parameter settings include a grid size $n \approx 50 \cdot p^{1.25}$ for efficiency up to $p \approx 10$ –$20$, and the algorithm leverages least-squares (QP), vectorized Hessian calculations, and log-scale evaluations to mitigate underflow.

4. Mixture Structure, Weight Fitting, and Interpretative Remarks

The resulting approximation is an explicit sum of weighted normals:

$q_K(x) = \sum_{k=1}^K w_k \mathcal{N}(x \mid \mu_k, \Sigma_k),$

with weights determined by nonnegative least squares minimization $\sum_i [\pi(x_i) - q_K(x_i)]^2$ over the sampling grid, posed as a convex QP. This can be viewed as a “boosting”-like procedure, where each new component corrects the previous residual, and the fitting coefficients ensure that mass is allocated to the most poorly approximated portions of $\pi(x)$ (Bornkamp, 2011).

5. Theoretical Properties and Comparative Performance

Empirically, the recursive Laplace algorithm converges in a finite number of steps, capturing major modes and skewness in the target. Once the residual is small or nonpositive everywhere, or no negative definite Hessian can be discovered, iteration halts. While no closed-form error bounds exist, each step reduces the empirical $L_2$ discrepancy on the grid.

Relative to alternatives:

Single Gaussian Laplace approximations often yield poor effective sample size.
Adaptive importance sampling or mixture-based MCMC can match performance but typically at substantially higher cost in terms of $\pi$ -evaluations.
Variational methods minimize a divergence criterion rather than a direct $L_2$ error and may misrepresent tail probability mass (Bornkamp, 2011).

6. Computational Cost, Scalability, and Implementation Advice

Computationally, each iteration incurs costs from grid generation ( $O(n \cdot p)$ ), optimization for new means (typically $O(n \cdot p)$ gradient calls), Hessian computation ( $O(p^3)$ , potentially $O(p^2)$ when sparsity is exploited), and quadratic programming for weight updates ( $O(k^3)$ per iteration). Practical recommendations include:

Employing quasi-Monte Carlo (Sobol sequences) for grid sampling.
Operating in logarithmic space.
Restricting optimization starting points, e.g., via $k$ -means clustering on high $\pi/q$ regions.
Pre-scanning temperature-tempered residuals when many well-separated modes exist.
Imposing diagonal or low-rank covariance structure if inversion is a bottleneck (Bornkamp, 2011).

7. Applications and Empirical Diagnostics

Empirical studies demonstrate the strengths of recursive Laplace approximation across several test cases:

For a bivariate skew- $t$ density ( $df=5$ ), a single Gaussian achieves ESS $\approx 4\%$ , while iterLap with 9 components reaches ESS $\approx 65\%$ .
For a three-mode Gaussian mixture, iterLap (5 components) yields ESS of $99\%$ versus $2\%$ for standard Laplace.
For a 10-dimensional “banana” distribution, the method attains ESS $\approx 71\%$ (11 components; 19,000 $\pi$ -evaluations) compared to 5\% (single Laplace) or 100,000 evaluations for an adaptive IS competitor.
In a nonlinear regression scenario (NIST ENSO data, 168 observations, 11 parameters), iterLap-IS outperforms both componentwise adaptive Gibbs and optimally tuned Metropolis MCMC in effective sample size, estimation error, and multi-variate posterior fit, using roughly half the number of $\pi$ -evaluations as MCMC approaches.

Typical runtimes are tractable even for high-dimensional problems, e.g., $2.5$ seconds in R for a 10D artificial example (Bornkamp, 2011). The method is implemented in the R package iterLap available on CRAN.

References

Bornkamp, B. "Approximating Probability Densities by Iterated Laplace Approximations." Journal of Computational and Graphical Statistics, 20(4), 817–837 (2011) (Bornkamp, 2011).

Markdown Upgrade to Chat

References (1)

Approximating Probability Densities by Iterated Laplace Approximations (2011)

Topic to Video (Beta)

No one has generated a video about this topic yet.

Whiteboard

No one has generated a whiteboard explanation for this topic yet.

Follow Topic

Get notified by email when new papers are published related to Recursive Laplace Approximation.