Laplace-approximated Gaussian Process

Updated 7 March 2026
  • A Laplace-approximated Gaussian Process is an inference technique that uses a second-order Taylor expansion around the posterior mode to approximate the non-Gaussian posterior over latent functions with a Gaussian distribution.
  • It enables fast, scalable Bayesian inference for generalized GP models by providing analytic approximations of moments, predictive distributions, and marginal likelihood.
  • Extensions like Vecchia-Laplace and function-space Laplace facilitate large-scale applications and neural network implementations, offering robust uncertainty quantification and computational savings.

A Laplace-approximated Gaussian Process (GP) refers to a broad class of inference techniques for non-Gaussian likelihoods in GP models, where the analytically intractable posterior over latent functions is approximated as a multivariate normal distribution by a second-order Taylor expansion at the posterior mode. This framework enables fast, general, and scalable Bayesian inference for a range of generalized GP models, including settings with correlated multivariate outputs, exponential-family likelihoods, and large data regimes where scalable variants such as Vecchia-Laplace become crucial. It underpins both classical statistical GP modeling and recent Bayesian deep learning and meta-learning algorithms.

1. Model Structure and Motivation

Let $f(\mathbf x)$ be a $D$-dimensional latent function, $\mathbf x\in\mathbb R^p$, endowed with a zero-mean GP prior:

$f(\mathbf x) = (\eta^{(1)}(\mathbf x),\dots,\eta^{(D)}(\mathbf x)) \sim \mathcal{GP}(\mathbf 0, K(\mathbf x, \mathbf x'))$

where, for a priori independent outputs, $K(\mathbf x,\mathbf x')$ is block-diagonal. At each $\mathbf x_i$ we observe $y_i\in\mathcal Y\subseteq\mathbb R^d$. The likelihood $p(y_i \mid f_i)$ is assumed to be from the multivariate exponential family,

$p(y_i\mid\theta_i,\phi) = h(y_i,\phi)\exp\left\{\frac{1}{a(\phi)}\left[T(y_i)^\top\theta_i-b(\theta_i)\right]\right\}$

where $\theta_i$ is the natural parameter, typically related to $f_i$ through a (possibly non-canonical) link $g$ such that $\theta_i = g^{-1}(f_i)$. The exact posterior

$p(f\mid y) \propto \prod_{i=1}^n p(y_i\mid \theta_i=g^{-1}(f_i))\,\mathcal N(f\mid 0,K)$

is non-Gaussian and lacks a closed form except for Gaussian likelihoods (Chan, 2013).
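
For concreteness, the following minimal sketch (illustrative code, not taken from the cited papers) writes out this unnormalized log-posterior for a univariate Poisson count likelihood with the canonical log link, $\theta_i = f_i$, $b(\theta_i) = e^{\theta_i}$; it is this non-Gaussian objective that the Laplace approximation targets.

```python
import numpy as np
from scipy.special import gammaln

def rbf_kernel(X, lengthscale=1.0, variance=1.0, jitter=1e-6):
    """Squared-exponential covariance matrix for inputs X of shape (n, p)."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq / lengthscale**2) + jitter * np.eye(len(X))

def log_unnormalized_posterior(f, y, K):
    """L(f) = sum_i log p(y_i | f_i) - 0.5 f^T K^{-1} f  (up to a constant).

    Poisson likelihood with log link: log p(y_i | f_i) = y_i f_i - exp(f_i) - log(y_i!).
    """
    loglik = np.sum(y * f - np.exp(f) - gammaln(y + 1.0))
    return loglik - 0.5 * f @ np.linalg.solve(K, f)

# Toy data: counts observed at one-dimensional inputs.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 5.0, size=(20, 1))
y = rng.poisson(np.exp(np.sin(X[:, 0])))
K = rbf_kernel(X)
print(log_unnormalized_posterior(np.zeros(len(y)), y, K))
```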

2. The Laplace Approximation: Derivation and Properties

The Laplace approximation constructs a local Gaussian surrogate for $p(f \mid y)$ by Taylor-expanding the log-posterior at its mode:

$L(f) = \log p(y\mid f) + \log p(f) + \mathrm{const} = \sum_{i=1}^n\log p(y_i\mid\theta_i) - \tfrac12 f^\top K^{-1}f$

Let $f^*$ denote the maximizer (MAP estimate). A second-order expansion gives

$L(f) \approx L(f^*) - \tfrac12 (f-f^*)^\top A (f-f^*)$

where the posterior precision is

$A = -\nabla^2 L(f^*) = K^{-1} + W^*$

with $W^* = \mathrm{blockdiag}(U_1^*,\dots,U_n^*)$ and $U_i^* = -\nabla^2_{f_i}\log p(y_i\mid f_i)$ evaluated at $f_i^*$. The resulting Laplace-approximated posterior is

$q(f\mid y) = \mathcal N(f\mid f^*,\,A^{-1})$

This yields analytic approximations of moments and predictive distributions, and enables a direct marginal-likelihood (“evidence”) approximation obtained by integrating out $f$:

$\log p(y) \approx \sum_{i=1}^n \log p(y_i\mid f^*_i) - \tfrac12 {f^*}^\top K^{-1} f^* - \tfrac12\log |I + K W^*| + \mathrm{const}$

The posterior mode $f^*$ is found with Newton–Raphson iterations (Chan, 2013).
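
Once $f^*$ and $W^*$ are available, the evidence approximation can be evaluated stably through a Cholesky factor of $B = I + {W^*}^{1/2} K {W^*}^{1/2}$, using $|I + K W^*| = |B|$. The sketch below assumes scalar outputs and a diagonal $W^*$; it is an illustration, not code from the cited papers.

```python
import numpy as np

def laplace_log_evidence(loglik_at_mode, f_star, K, W_diag):
    """Laplace evidence: log p(y|f*) - 0.5 f*^T K^{-1} f* - 0.5 log|I + K W*| (+ const).

    W_diag holds the diagonal of W* = -d^2/df^2 log p(y|f) evaluated at the mode f*.
    """
    n = len(f_star)
    sqrtW = np.sqrt(W_diag)
    B = np.eye(n) + sqrtW[:, None] * K * sqrtW[None, :]   # I + W^{1/2} K W^{1/2}
    L = np.linalg.cholesky(B)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))              # log|I + K W*| = log|B|
    quad = f_star @ np.linalg.solve(K, f_star)             # f*^T K^{-1} f*
    return loglik_at_mode - 0.5 * quad - 0.5 * logdet
```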

3. Algorithmic Realizations and Computational Aspects

The general Laplace-approximated GP workflow is as follows:

  1. Compute gradient and Hessian of the log-likelihood for each datum.
  2. Iterate Newton updates:

$f^{\text{new}} = \left[K^{-1} + W(f^{\text{old}})\right]^{-1}\left(W(f^{\text{old}})\,f^{\text{old}} + u(f^{\text{old}})\right)$

where $u_i(f_i) = \nabla_{f_i} \log p(y_i\mid f_i)$ and $W_i(f_i) = -\nabla^2_{f_i} \log p(y_i\mid f_i)$.

  3. Form $A = K^{-1} + W^*$ and set $q(f\mid y) = \mathcal N(f^*, A^{-1})$.
  4. Compute the approximate evidence and use it for hyperparameter learning (a code sketch of the full workflow follows below).
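
The sketch below illustrates steps 1–4 for the common special case of a Bernoulli-logit (GP classification) likelihood, where $u_i = y_i - \sigma(f_i)$ and $W_i = \sigma(f_i)(1-\sigma(f_i))$. The Newton solve is rewritten as $(I + KW)f^{\text{new}} = K(Wf + u)$ so that $K^{-1}$ is never formed; function names and setup are illustrative, not from the cited papers.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def laplace_mode_newton(K, y, max_iter=100, tol=1e-8):
    """Newton iteration for the Laplace mode f* with a Bernoulli-logit likelihood.

    Implements f_new = (K^{-1} + W)^{-1} (W f + u), solved as
    (I + K W) f_new = K (W f + u) to avoid inverting K explicitly.
    """
    n = len(y)
    f = np.zeros(n)
    for _ in range(max_iter):
        p = sigmoid(f)
        u = y - p                       # gradient of the log-likelihood
        W = p * (1.0 - p)               # negative Hessian (diagonal entries)
        b = W * f + u
        f_new = np.linalg.solve(np.eye(n) + K * W[None, :], K @ b)
        if np.max(np.abs(f_new - f)) < tol:
            return f_new, W
        f = f_new
    return f, W

# With (f*, W*) in hand, A = K^{-1} + diag(W*) and q(f|y) = N(f*, A^{-1});
# the evidence can then be evaluated as in the sketch of Section 2.
```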

For multivariate ($D>1$) and correlated outputs, the matrices become block-diagonal, and $A$ is sparse if $K$ and $W$ are. For low-rank $U_i$ structure, Woodbury and block-determinant identities reduce the asymptotic cost from the naïve $O((nD)^3)$ to $O(D n^3)$ (Chan, 2013).

4. Extensions, Scalability, and Specialized Frameworks

Vecchia-Laplace Approximation for Large-Scale Data

The standard Laplace method is $O(n^3)$ due to dense kernel inverses. The Vecchia-Laplace method replaces $K^{-1}$ with a sparse precision $Q$ derived from an $m$-neighbor conditional independence structure:

$K^{-1} \approx Q = (I-A)^\top D^{-1} (I-A)$

Here $A$ is strictly lower triangular with at most $m$ nonzero entries per row (the conditional regression coefficients) and $D$ is diagonal. This enables Newton and linear-algebra updates at $O(n\,m^3)$ cost and $O(n\,m)$ memory, targeting datasets with $n\gtrsim 10^4$. Predictive means, variances, and the Laplace evidence are available in similar computational regimes (Zilber et al., 2019, Kündig et al., 2023).
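
The sketch below shows how the sparse factors can be assembled from $m$-nearest-neighbor conditioning sets under a simple left-to-right ordering; it is a toy illustration (dense matrices, Euclidean neighbors, hypothetical function names), whereas the implementations of Zilber et al. (2019) and Kündig et al. (2023) use careful orderings and sparse data structures.

```python
import numpy as np

def vecchia_precision(X, kernel, m=10):
    """Vecchia approximation Q = (I - A)^T D^{-1} (I - A) of K^{-1}.

    A is strictly lower triangular with at most m nonzeros per row
    (regression coefficients on the m nearest previously-ordered points);
    D is diagonal (conditional variances). kernel(X1, X2) is assumed to
    return the cross-covariance matrix between two input sets.
    """
    n = len(X)
    A = np.zeros((n, n))
    D = np.zeros(n)
    D[0] = kernel(X[:1], X[:1])[0, 0]
    for i in range(1, n):
        dists = np.linalg.norm(X[:i] - X[i], axis=-1)
        nbrs = np.argsort(dists)[:m]                      # conditioning set
        K_nn = kernel(X[nbrs], X[nbrs])
        k_ni = kernel(X[nbrs], X[i:i + 1])[:, 0]
        coef = np.linalg.solve(K_nn, k_ni)                # conditional regression weights
        A[i, nbrs] = coef
        D[i] = kernel(X[i:i + 1], X[i:i + 1])[0, 0] - k_ni @ coef
    I_minus_A = np.eye(n) - A
    return I_minus_A.T @ np.diag(1.0 / D) @ I_minus_A
```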

Function-Space Laplace in Neural Networks

The FSP-Laplace approach constructs the Laplace approximation in GP function space, identifying the “weak mode” of the posterior measure under a GP prior restricted to the neural network function class. After linearization at the weak mode, the Hessian over network parameters is formed and used for uncertainty estimation and prediction; scalable matrix-free routines handle high-dimensional parameter spaces (Cinquin et al., 2024).

Laplace Matching and Closed-Form Gaussianization

Laplace Matching “Gaussianizes” exponential-family likelihoods via a closed-form, basis-transformed Laplace expansion, producing Gaussian pseudo-likelihoods for each datum. This transforms generic non-Gaussian GP models into a form with analytic posterior and predictive formulas, sidestepping iterative optimization (Hobbhahn et al., 2021).
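
As a toy instance of the idea (not the general basis-transformation machinery of Hobbhahn et al., 2021), a Gamma factor over a positive quantity admits a closed-form Gaussian match in the log basis: with $z=\log x$, the transformed density is proportional to $e^{\alpha z - \beta e^{z}}$, with mode $\log(\alpha/\beta)$ and curvature $\alpha$, giving $\mathcal N(\log(\alpha/\beta),\,1/\alpha)$.

```python
import numpy as np

def laplace_match_gamma(alpha, beta):
    """Closed-form Gaussian approximation of Gamma(alpha, beta) in log space.

    With z = log x the density is proportional to exp(alpha*z - beta*exp(z)),
    whose mode is z* = log(alpha/beta) with negative log-curvature alpha,
    so the Laplace match is N(log(alpha/beta), 1/alpha).
    """
    return np.log(alpha / beta), 1.0 / alpha

# A Gamma(50, 10) factor becomes a Gaussian pseudo-observation of the log-rate:
print(laplace_match_gamma(50.0, 10.0))   # approximately (1.609, 0.02)
```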

Special Cases: State-Space GPs and Custom Likelihoods

In time series or 1D-structured problems, state-space formulations enable $O(n)$ Laplace-approximate inference, exploiting tridiagonal precision matrices via Kalman filtering and smoothing (Nickisch et al., 2018). For Student-$t$, Dirichlet, von Mises, and other custom likelihoods, the approach is extended by re-deriving gradients/Hessians and possibly using Fisher-information approximations for stability (Chan, 2013, Hartmann et al., 2017).

5. Applications and Empirical Performance

Laplace-approximated GPs have been applied to:

  • Multivariate regression/classification with correlated outputs, including von Mises and Dirichlet likelihoods (Chan, 2013)
  • Flexible density estimation within the logistic GP model, exceeding MCMC methods in efficiency while maintaining accuracy (Riihimäki et al., 2012)
  • Scalable spatial modeling (e.g., MODIS water vapor data, $n\sim 250{,}000$), nonparametric count and binary regression (Zilber et al., 2019, Kündig et al., 2023)
  • Bayesian meta-learning and few-shot classification with LDA plugin surrogates for the MAP, resulting in significant computational savings and accuracy retention (Kim et al., 2021)
  • Model-based latent class discrete choice, leveraging Laplace-approximated posteriors in EM (Sfeir et al., 2021)
  • Robust regression with non-log-concave Student-$t$ likelihoods (standard and Fisher Laplace) (Hartmann et al., 2017)
  • Bayesian deep learning function-space uncertainty calibration (Cinquin et al., 2024)

Empirical studies consistently find that the Laplace approximation achieves accuracy and calibration comparable to Markov Chain Monte Carlo or specialized variational algorithms, typically with one to two orders of magnitude computational advantage for moderate to large datasets (Zilber et al., 2019, Riihimäki et al., 2012).

6. Practical Considerations, Limitations, and Recommendations

Advantages:

  • Model-agnostic: Applies to any (multivariate) exponential-family likelihood and general link.
  • Scalable: Sparse/Vecchia variants and specializations (state-space, low-rank, matrix-free) allow extension to $n\gg 10^4$.
  • Fast parameter learning: Marginal likelihood approximations enable efficient type-II MAP estimation.
  • Robust: For classification and many regression tasks, Laplace uncertainty quantification is reliable.

Limitations:

  • The approximation underestimates posterior variance, especially in extreme few-shot or highly non-Gaussian regimes (Kim et al., 2021, Riihimäki et al., 2012).
  • No closed-form for highly nonlinear kernels in certain LDA-plugin or weight-space variants (Kim et al., 2021).
  • Block-diagonal Hessian approximations ignore posterior correlations in some fast surrogates.
  • Error bounds depend on the concavity and structure of the likelihood; theoretical control for some plugin surrogates is lacking.

Recommendations:

  • For large non-Gaussian spatial data or structured outputs, combine Laplace inference with Vecchia sparsity and use iterative solvers with the VADU preconditioner (Kündig et al., 2023).
  • For medium-scale classification or density estimation, employ standard Laplace with Newton updates and low-rank spectrum acceleration.
  • In meta-learning or differentiable pipelines, plugin surrogates (e.g., LDA) with prior-norm adjustments yield competitive results (Kim et al., 2021).
  • Use simulation-based unbiased variance estimates for accuracy-critical applications, especially for predictive uncertainty (Kündig et al., 2023).

The Laplace-approximated GP framework thus unifies a wide range of approximate Bayesian inference methodologies for generalized nonlinear GP models, offering both generality and practical computational tractability across modern probabilistic machine learning.
