Quadratic Laplace Approximation

Updated 10 February 2026
  • Quadratic Laplace Approximation (QLA) is a refined technique for approximating complex Bayesian posteriors using local Taylor expansions and dominant curvature information.
  • QLA employs efficient methods like Pearlmutter’s trick and power iteration to compute Hessian-vector products, yielding scalable, rank-one curvature surrogates.
  • Applications of QLA in deep learning, inverse problems, and latent-variable models demonstrate improved uncertainty quantification and computational efficiency.

The Quadratic Laplace Approximation (QLA) is a generalization and refinement of the Laplace approximation, a cornerstone technique in Bayesian inference that approximates complex posterior distributions by Gaussian surrogates. QLA systematically leverages second-order (quadratic) information from the log-posterior or log-likelihood to improve the fidelity and computational efficiency of inference across a range of models, including hierarchical latent-variable frameworks, deep neural networks, and inverse problems. Multiple algorithmic instantiations exist, unified by the use of local Taylor expansions about a mode (often the MAP estimate), efficient Hessian computations (e.g., via Pearlmutter’s trick or power iteration), and provable approximation properties.

1. Foundational Principles and Standard Laplace Approximation

The Laplace approximation constructs a Gaussian surrogate to a posterior or unnormalized target density $p(\theta \mid X) \propto \exp(\ell(\theta))$, centering at the mode $\hat\theta = \arg\max_\theta \ell(\theta)$ and using the negative inverse Hessian for the covariance:

$$p(\theta \mid X) \approx \mathcal{N}(\hat\theta,\, \Sigma), \qquad \Sigma = -[\nabla^2 \ell(\hat\theta)]^{-1}.$$

This requires $\ell(\theta)$ to be twice differentiable near $\hat\theta$, with $-\nabla^2 \ell(\hat\theta)$ positive definite, and the posterior mass to be dominated by the quadratic region. The Laplace approximation offers rapid convergence of integrals as data grows and the posterior concentrates, with Hellinger distance scaling as $O(n^{-1/2})$ for sample size $n$ under regularity conditions (Schillings et al., 2019).
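The two steps — mode-finding, then a curvature evaluation at the mode — can be sketched as follows on a hypothetical 2D target; `neg_log_post` and all other names are illustrative choices, not taken from the cited papers:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical unnormalized negative log-posterior: a Gaussian pull toward
# [1.0, -0.5] plus a weak quartic term (illustrative only).
def neg_log_post(theta):
    return 0.5 * np.sum((theta - np.array([1.0, -0.5]))**2) + 0.05 * np.sum(theta**4)

# Step 1: find the mode theta_hat = argmax log p (here: argmin of -log p).
res = minimize(neg_log_post, x0=np.zeros(2), method="BFGS")
theta_hat = res.x

# Step 2: Hessian of -log p at the mode via central finite differences.
def hessian(f, x, eps=1e-4):
    d = len(x)
    H = np.zeros((d, d))
    I = np.eye(d)
    for i in range(d):
        for j in range(d):
            H[i, j] = (f(x + eps*I[i] + eps*I[j]) - f(x + eps*I[i] - eps*I[j])
                       - f(x - eps*I[i] + eps*I[j]) + f(x - eps*I[i] - eps*I[j])) / (4 * eps**2)
    return H

H = hessian(neg_log_post, theta_hat)

# Laplace surrogate: N(theta_hat, Sigma) with Sigma = H^{-1},
# since H here is already -grad^2 log p evaluated at theta_hat.
Sigma = np.linalg.inv(H)
```

In practice the Hessian would come from autodiff rather than finite differences; the positive-definiteness requirement on $-\nabla^2 \ell(\hat\theta)$ corresponds to `H` having strictly positive eigenvalues.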

2. Algorithmic Implementations: EM, Pearlmutter’s Trick, and Power Iteration

EM and Pearlmutter’s Trick

In models with latent variables and EM structure, let $Q(\theta \mid \theta^{\rm old})$ denote the EM auxiliary:

$$Q(\theta \mid \theta^{\rm old}) = \mathbb{E}_{H \sim p(H \mid X, \theta^{\rm old})}[\log p(X, H \mid \theta)] + \log p(\theta).$$

At a fixed point $\hat\theta$, $\nabla_\theta Q(\hat\theta \mid \hat\theta) = 0$ and the gradient and Hessian coincide with those of the log-posterior. The Pearlmutter trick efficiently computes the Hessian-vector products needed for quadratic approximations:

$$\nabla^2 \ell(\theta) \cdot v = \lim_{\varepsilon \to 0} \frac{\nabla \ell(\theta + \varepsilon v) - \nabla \ell(\theta)}{\varepsilon}.$$

This supports scalable QLA construction without explicitly materializing the Hessian, compatible with modern autodiff frameworks (Brümmer, 2014).
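A minimal sketch of this Hessian-vector product, using a hand-coded gradient for a toy log-posterior and a symmetric difference for accuracy (in practice the gradient would come from autodiff; all names here are illustrative):

```python
import numpy as np

# Toy log-posterior l(theta) = -1/2 theta^T A theta - 1/4 (theta^T theta)^2,
# with a hand-coded gradient.
A = np.array([[3.0, 0.5],
              [0.5, 2.0]])

def grad_ell(theta):
    return -A @ theta - np.sum(theta**2) * theta

def hvp(theta, v, eps=1e-6):
    # Finite-difference Hessian-vector product, a symmetric variant of the
    # limit above: grad^2 l(theta) . v ~ (grad(theta+eps v) - grad(theta-eps v)) / (2 eps).
    return (grad_ell(theta + eps * v) - grad_ell(theta - eps * v)) / (2 * eps)

theta0, v0 = np.array([0.3, -0.7]), np.array([1.0, 2.0])
Hv = hvp(theta0, v0)   # one probe of the Hessian; no d x d matrix is formed
```

Each call costs two gradient evaluations regardless of dimension, which is exactly what makes curvature probes affordable in large models.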

Power Iteration for Dominant Curvature

In high-dimensional regimes, notably for DNNs, QLA uses a low-rank surrogate to the per-data-point Hessian $A_n = -\nabla^2_\theta \log p(y_n \mid f(x_n, \theta))$:

$$A_n \approx \hat z_n \hat z_n^\top, \qquad \hat z_n = \text{dominant eigenvector of } A_n.$$

Power iteration starting from the Jacobian $J_n = \nabla_\theta f(x_n, \hat\theta)$ converges rapidly in practice (typically $K \sim 10$ iterations), yielding a rank-one curvature summary (Jiménez et al., 3 Feb 2026).
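The power-iteration step can be sketched as follows, with an explicit $3 \times 3$ matrix standing in for $A_n$ (in QLA it would only be accessed through Hessian-vector products, and the start vector would come from $J_n$); the rank-one summary is scaled by the Rayleigh quotient so it matches the dominant curvature magnitude:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a per-datum curvature matrix A_n; in QLA this would only be
# touched via Hessian-vector products, never materialized.
A_n = np.array([[4.0, 1.0, 0.0],
                [1.0, 2.0, 0.5],
                [0.0, 0.5, 1.0]])

def dominant_eigvec(matvec, dim, n_iter=10):
    # Plain power iteration: repeat z <- A z / ||A z|| for K ~ 10 steps.
    z = rng.standard_normal(dim)
    z /= np.linalg.norm(z)
    for _ in range(n_iter):
        z = matvec(z)
        z /= np.linalg.norm(z)
    return z

z_hat = dominant_eigvec(lambda v: A_n @ v, dim=3)
lam = z_hat @ A_n @ z_hat                  # Rayleigh quotient ~ top eigenvalue
A_rank1 = lam * np.outer(z_hat, z_hat)     # rank-one curvature summary
```

Convergence speed is governed by the gap between the top two eigenvalues, which is why $K \sim 10$ iterations usually suffice when curvature is dominated by one direction.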

3. Generalizations: Iterated Laplace and Mixture Approaches

The standard Laplace approximation is limited to unimodal, approximately Gaussian posteriors. The iterated Laplace (iterLap) framework iteratively fits a sequence of local quadratic expansions to the residual of the previous approximation, accumulating a mixture of Gaussians:

$$q_m(\theta) = \sum_{i=1}^m w_i\,\mathcal{N}(\mu_i, \Sigma_i).$$

Each new component targets the region where the previous approximation is inadequate, with mode-finding and Hessian computation at each step. Under mild convergence conditions, iterated QLA yields high-fidelity approximations for multimodal or skewed posteriors (Mai et al., 2015).
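A simplified 1D, grid-based sketch of this residual-fitting loop — locate the mode of the current residual, fit a local Gaussian from the log-curvature there, re-solve for non-negative mixture weights, and repeat. The bimodal `target` and all implementation details are illustrative, not the exact algorithm of the cited paper:

```python
import numpy as np

grid = np.linspace(-8.0, 8.0, 2001)
h = grid[1] - grid[0]

def target(x):
    # Unnormalized bimodal density: bumps near -2 and +2.5 (illustrative).
    return 0.6 * np.exp(-0.5 * (x + 2.0)**2) + 0.5 * np.exp(-0.5 * (x - 2.5)**2 / 0.64)

def gauss(x, mu, var):
    return np.exp(-0.5 * (x - mu)**2 / var) / np.sqrt(2 * np.pi * var)

components = []
resid = target(grid)
for _ in range(2):                                   # two local expansions
    i = np.argmax(resid)                             # mode of current residual
    logr = np.log(np.maximum(resid, 1e-300))
    curv = -(logr[i + 1] - 2 * logr[i] + logr[i - 1]) / h**2   # -d^2/dx^2 log r
    components.append((grid[i], 1.0 / max(curv, 1e-6)))        # (mu_i, Sigma_i)
    Phi = np.stack([gauss(grid, m, v) for m, v in components], axis=1)
    w, *_ = np.linalg.lstsq(Phi, target(grid), rcond=None)     # mixture weights
    w = np.maximum(w, 0.0)                           # keep the mixture valid
    resid = np.maximum(target(grid) - Phi @ w, 0.0)  # residual for next round

approx = Phi @ w   # q_m on the grid: sum_i w_i N(mu_i, Sigma_i)
```

After two rounds the mixture captures both modes, which a single Laplace fit at either mode cannot.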

Population-based strategies further improve robustness in sequential inference for state-space models, including SIG-RS and SIG-RSRP, by maintaining several parallel QLA-based filters and resampling based on predictive performance, with tempering of overconfident variances (Mai et al., 2015).

4. Theoretical Guarantees and Approximation Error Quantification

QLA possesses explicit convergence guarantees in strong data or low-noise regimes. For posteriors of the form $\mu_n(dx) \propto \exp(-n\Phi(x))\,\mu_0(dx)$, the Laplace approximation $\mathcal{N}(x_n, n^{-1}H_n^{-1})$ converges in Hellinger distance at rate $O(n^{-1/2})$, provided sufficient regularity (smoothness, strict convexity at the mode, nonvanishing curvature in the tails) (Schillings et al., 2019). Additionally, a deterministic computable upper bound on the $\mathrm{KL}$ divergence between the true posterior and QLA is available for log-concave densities:

$$\mathrm{KL}(g \Vert f) \lessapprox C_d\,\mathbb{E}_{e \sim g}[\Delta_3(e)^2],$$

where $C_d$ is an explicit dimensional constant, $g$ is the Laplace surrogate, and $\Delta_3(e)$ is the third directional derivative of the negative log-target along direction $e$. Empirical evaluation on logistic regression shows this bound is tight to within a factor of order unity in practical dimensions (Dehaene, 2017).
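A quick 1D numerical illustration of the first guarantee: the Hellinger distance between $\mu_n$ and its Laplace approximation shrinks as $n$ grows, consistent with the $O(n^{-1/2})$ bound. The potential $\Phi$ and the prior below are arbitrary illustrative choices satisfying the regularity conditions:

```python
import numpy as np

x = np.linspace(-3.0, 3.0, 200001)
dx = x[1] - x[0]
Phi = 0.5 * x**2 + 0.1 * x**4            # smooth, strictly convex at the mode
log_mu0 = -0.5 * (x - 0.5)**2            # log-density of a Gaussian-type prior

def hellinger(p, q):
    p = p / (p.sum() * dx)               # normalize on the grid
    q = q / (q.sum() * dx)
    return np.sqrt(0.5 * ((np.sqrt(p) - np.sqrt(q))**2).sum() * dx)

dists = []
for n in (10, 40, 160):
    logp = -n * Phi + log_mu0            # log of mu_n up to a constant
    post = np.exp(logp - logp.max())     # stabilized unnormalized posterior
    i = np.argmax(logp)                  # grid mode x_n
    curv = -(logp[i + 1] - 2 * logp[i] + logp[i - 1]) / dx**2   # n * H_n
    lap = np.exp(-0.5 * curv * (x - x[i])**2)                   # Laplace surrogate
    dists.append(hellinger(post, lap))
```

The observed decay can be faster than $n^{-1/2}$ for a given target; the theorem supplies the worst-case rate.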

5. Applications: Deep Learning, Bayesian Inverse Problems, and Latent-Variable Models

QLA has been deployed in the following settings:

  • Deep Neural Networks: QLA improves uncertainty quantification over the Linearized Laplace Approximation (LLA) by introducing per-data rank-one curvature corrections without incurring the cubic computational cost of the full Hessian. Empirical results on UCI regression (e.g., Energy, Boston Housing) show consistent reductions in NLL and marginal improvements in CRPS over LLA (Jiménez et al., 3 Feb 2026).
  • Bayesian Inverse Problems: QLA, when used as a proposal or base measure in importance sampling and quasi-Monte Carlo, eliminates the explosive variance and deteriorating performance of prior-based schemes as the posterior concentrates, enabling robust Monte Carlo integration even as the data volume grows or the noise variance vanishes (Schillings et al., 2019).
  • Latent Gaussian State-Space Models: Sequential QLA (iterLap, SIG) supports recursive Bayesian filtering by local quadratic expansions at each time step, optionally combined with Monte Carlo or EM-like corrective updates, yielding computationally tractable and accurate estimates for both static parameters and latent states (Mai et al., 2015).

6. Methodological Variants: QLA Versus LLA, Rank-One and Mixture Extensions

A comparative summary:

| Method | Curvature Used | Scalability | Application Domain |
| --- | --- | --- | --- |
| Laplace (LA) | Full Hessian | Poor (large $n$) | General Bayesian inference |
| LLA | GGN only (drops quadratic term) | High | DNNs, large-scale models |
| QLA | Dominant local curvature (rank-1 per datum or mixture) | High (with some overhead) | DNNs, mixture models, inverse problems |
| IterLap QLA | Mixture of local quadratics | Moderate | Multimodal/skewed posteriors |

QLA provides a controlled trade-off between fidelity and cost: it corrects the misspecification of LLA in the posterior precision matrix while remaining tractable in parameter-rich regimes.

7. Practical Considerations and Limitations

  • Computation: QLA’s additional cost is typically a small multiple ($\sim 2\times$) of LLA when $K \approx 10$ power iterations are used. Rank-one surrogates per datum can be batched or subsampled in large datasets (Jiménez et al., 3 Feb 2026).
  • Storage: For DNNs, QLA adds a single parameter vector per data case; in practice, storage is manageable and can be aggregated over minibatches.
  • Assumptions: QLA requires twice (preferably thrice) differentiable log-posteriors; modes returned by EM or optimization must be maxima where the negative Hessian is positive definite (Brümmer, 2014).
  • Caveats: In the presence of pronounced non-Gaussianity (multi-modality, heavy tails), mixture or higher-order generalizations (iterLap) are preferable (Mai et al., 2015). Computation of high-order derivatives (e.g., third-order tensors for KL bounds) may be prohibitive beyond moderate dimension (Dehaene, 2017).
  • Uncertainty Quantification: For strongly concentrated or high-dimensional posteriors, QLA yields robust uncertainty estimates and credible sets that shrink at the parametric rate; credence in tail regions or for model selection should be verified against these theoretical caveats (Schillings et al., 2019).
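On the storage and computation points above: once per-datum rank-one vectors are collected, the induced $d \times d$ covariance never requires a full-rank inverse — with $N \ll d$ data points the Woodbury identity reduces it to an $N \times N$ solve. A generic sketch (random vectors stand in for the curvature directions; `alpha` plays the role of an isotropic prior precision — both are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
d, N, alpha = 50, 8, 0.5
Z = rng.standard_normal((d, N))              # columns: per-datum curvature vectors

# Reference: invert the d x d precision (prior + sum of rank-one terms) directly.
Sigma_direct = np.linalg.inv(alpha * np.eye(d) + Z @ Z.T)

# Woodbury: (a I + Z Z^T)^{-1} = I/a - Z (I_N + Z^T Z / a)^{-1} Z^T / a^2,
# costing an N x N inverse instead of a d x d one.
inner = np.linalg.inv(np.eye(N) + (Z.T @ Z) / alpha)
Sigma_woodbury = np.eye(d) / alpha - (Z @ inner @ Z.T) / alpha**2
```

For minibatched settings, `Z` would be assembled (or subsampled) batch by batch before the small solve.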

References

  • (Brümmer, 2014) The EM algorithm and the Laplace Approximation
  • (Jiménez et al., 3 Feb 2026) Improving the Linearized Laplace Approximation via Quadratic Approximations
  • (Schillings et al., 2019) On the Convergence of the Laplace Approximation and Noise-Level-Robustness of Laplace-based Monte Carlo Methods for Bayesian Inverse Problems
  • (Dehaene, 2017) Computing the quality of the Laplace approximation
  • (Mai et al., 2015) Bayesian sequential parameter estimation with a Laplace type approximation
