Laplace-approximated Gaussian Process
- Laplace-approximated Gaussian Process is an inference technique that uses a second-order Taylor expansion around the mode to approximate non-Gaussian likelihoods with a Gaussian distribution.
- It enables fast, scalable Bayesian inference for generalized GP models by providing analytic approximations of moments, predictive distributions, and marginal likelihood.
- Extensions like Vecchia-Laplace and function-space Laplace facilitate large-scale applications and neural network implementations, offering robust uncertainty quantification and computational savings.
A Laplace-approximated Gaussian Process (GP) refers to a broad class of inference techniques for non-Gaussian likelihoods in GP models, where the analytically intractable posterior over latent functions is approximated as a multivariate normal distribution by a second-order Taylor expansion at the posterior mode. This framework enables fast, general, and scalable Bayesian inference for a range of generalized GP models, including settings with correlated multivariate outputs, exponential-family likelihoods, and large data regimes where scalable variants such as Vecchia-Laplace become crucial. It underpins both classical statistical GP modeling and recent Bayesian deep learning and meta-learning algorithms.
1. Model Structure and Motivation
Let $f$ be a $D$-dimensional latent function on inputs $\mathbf x \in \mathcal X$, endowed with a zero-mean GP prior:
$f(\mathbf x) = (\eta^{(1)}(\mathbf x),\dots,\eta^{(D)}(\mathbf x)) \sim \mathcal{GP}(\mathbf 0, K(\mathbf x, \mathbf x'))$
where, for a priori independent outputs, $K$ is block-diagonal. At each input $\mathbf x_i$ we observe a response $\mathbf y_i$. The likelihood is assumed to be from the multivariate exponential family,
$p(\mathbf y_i \mid \boldsymbol\theta_i) = h(\mathbf y_i)\exp\bigl(\boldsymbol\theta_i^{\top} T(\mathbf y_i) - A(\boldsymbol\theta_i)\bigr),$
where $\boldsymbol\theta_i$ is the natural parameter, typically related to $f(\mathbf x_i)$ through a (possibly non-canonical) link $g$ such that $\boldsymbol\theta_i = g(f(\mathbf x_i))$. The exact posterior
$p(f \mid \mathbf y) \propto p(f)\prod_{i=1}^{N} p(\mathbf y_i \mid f(\mathbf x_i))$
is non-Gaussian and lacks a closed form except for Gaussian likelihoods (Chan, 2013).
2. The Laplace Approximation: Derivation and Properties
The Laplace approximation constructs a local Gaussian surrogate for $p(f \mid \mathbf y)$ by Taylor-expanding the log-posterior at its mode. Let $\hat f$ denote the maximizer (MAP). A second-order expansion gives
$\log p(f \mid \mathbf y) \approx \log p(\hat f \mid \mathbf y) - \tfrac{1}{2}(f - \hat f)^{\top}\Lambda\,(f - \hat f),$
where the posterior precision is
$\Lambda = K^{-1} + W,$
with $W = -\nabla_f^2 \log p(\mathbf y \mid f)$ evaluated at $\hat f$. The resulting Laplace-approximated posterior is
$q(f \mid \mathbf y) = \mathcal N\bigl(\hat f,\ (K^{-1} + W)^{-1}\bigr).$
This yields analytic approximations of moments and predictive distributions, and enables direct marginal likelihood ("evidence") approximations by integrating out $f$:
$\log p(\mathbf y) \approx \log p(\mathbf y \mid \hat f) - \tfrac{1}{2}\hat f^{\top} K^{-1} \hat f - \tfrac{1}{2}\log\bigl|I + K W\bigr|.$
The posterior mode $\hat f$ is found with Newton–Raphson iterations (Chan, 2013).
3. Algorithmic Realizations and Computational Aspects
The general Laplace-approximated GP workflow is as follows:
- Compute the gradient and Hessian of the log-likelihood for each datum.
- Iterate Newton updates
$f^{\text{new}} = (K^{-1} + W)^{-1}\bigl(W f + \nabla_f \log p(\mathbf y \mid f)\bigr),$
where $W = -\nabla_f^2 \log p(\mathbf y \mid f)$, until convergence to the mode $\hat f$.
- Form $\Lambda = K^{-1} + W$ at $\hat f$, and set $q(f \mid \mathbf y) = \mathcal N(\hat f, \Lambda^{-1})$.
- Compute the approximate evidence and employ it for hyperparameter learning.
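As a concrete illustration, the workflow above can be sketched for a single-output GP with Poisson likelihood and log link. This is a minimal NumPy sketch, not an implementation from the cited works: the kernel choice, `laplace_poisson_gp` helper, and all hyperparameter values are illustrative assumptions.

```python
import numpy as np
from math import lgamma

def rbf_kernel(X, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel on 1-D inputs.
    d2 = (X[:, None] - X[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def laplace_poisson_gp(X, y, n_iter=100):
    """Laplace approximation for a GP with Poisson likelihood and log link."""
    n = len(X)
    K = rbf_kernel(X) + 1e-8 * np.eye(n)  # jitter for numerical stability
    f = np.zeros(n)                       # latent log-rate, start at prior mean
    for _ in range(n_iter):
        mu = np.exp(f)                    # E[y_i | f_i] under the log link
        grad = y - mu                     # gradient of log p(y|f)
        W = mu                            # negative Hessian of log p(y|f) (diagonal)
        # Newton update: f <- (K^-1 + W)^-1 (W f + grad) = K (I + W K)^-1 (W f + grad)
        f = K @ np.linalg.solve(np.eye(n) + W[:, None] * K, W * f + grad)
    mu = np.exp(f)
    sW = np.sqrt(mu)
    B = np.eye(n) + sW[:, None] * K * sW[None, :]  # I + W^(1/2) K W^(1/2)
    # Laplace posterior covariance (K^-1 + W)^-1 in a numerically stable form
    cov = K - (K * sW[None, :]) @ np.linalg.solve(B, sW[:, None] * K)
    # Evidence: log p(y|f_hat) - 0.5 f_hat' K^-1 f_hat - 0.5 log|I + K W|,
    # using the stationarity condition K^-1 f_hat = y - mu at the mode.
    loglik = np.sum(y * f - mu) - sum(lgamma(int(k) + 1) for k in y)
    evidence = loglik - 0.5 * f @ (y - mu) - 0.5 * np.linalg.slogdet(B)[1]
    return f, cov, evidence
```

Because the Poisson likelihood is log-concave, the Newton iteration converges reliably; the `B`-matrix formulation avoids explicitly inverting the kernel matrix.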
For multivariate ($D > 1$) and correlated outputs, the relevant matrices become block-diagonal, and $\Lambda$ is sparse if $K^{-1}$ and $W$ are so. For low-rank structure, Woodbury and block determinant identities reduce the asymptotic cost well below the naïve $O((ND)^3)$ (Chan, 2013).
4. Extensions, Scalability, and Specialized Frameworks
Vecchia-Laplace Approximation for Large-Scale Data
The standard Laplace method scales cubically in $n$ due to dense kernel inverses. The Vecchia-Laplace method replaces the dense prior precision $K^{-1}$ with a sparse precision derived from an $m$-neighbor conditional independence structure:
$p(f) \approx \prod_{i=1}^{n} p\bigl(f_i \mid f_{c(i)}\bigr), \qquad |c(i)| \le m,$
where $c(i)$ indexes a small conditioning set of previously ordered neighbors. This enables Newton and linear-algebra updates at near-linear complexity and memory in $n$, targeting datasets with $n$ in the hundreds of thousands or more. Predictive means, variances, and the Laplace evidence are available at similar cost (Zilber et al., 2019, Kündig et al., 2023).
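The Vecchia construction can be sketched in a few lines of NumPy. This is a hedged toy version: the conditioning set is simply the $m$ preceding indices and the result is returned dense, whereas real implementations pick nearest neighbors and store everything sparsely.

```python
import numpy as np

def vecchia_precision(K, m=3):
    """Sparse-structured precision Q approximating K^-1 via an m-neighbor
    Vecchia factorization p(f) ~ prod_i p(f_i | f_{c(i)}).

    Points are assumed pre-ordered; c(i) is taken as the m preceding
    indices for simplicity (illustrative, not an optimized implementation).
    """
    n = K.shape[0]
    A = np.eye(n)    # row i encodes the residual f_i - b_i' f_{c(i)}
    d = np.empty(n)  # conditional variances
    for i in range(n):
        c = list(range(max(0, i - m), i))
        if c:
            b = np.linalg.solve(K[np.ix_(c, c)], K[c, i])  # regression weights
            A[i, c] = -b
            d[i] = K[i, i] - K[c, i] @ b
        else:
            d[i] = K[i, i]
    return A.T @ np.diag(1.0 / d) @ A  # Q = A' D^-1 A, sparse when m << n
```

For a Markov kernel such as the exponential (Matérn-1/2) on ordered 1-D points, $m = 1$ already recovers $K^{-1}$ exactly; in general the approximation quality improves monotonically with $m$.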
Function-Space Laplace in Neural Networks
The FSP-Laplace approach constructs the Laplace approximation in GP function space, identifying the “weak mode” of the posterior measure under a GP prior restricted to the neural network function class. After linearization at the weak mode, the Hessian over network parameters is formed and used for uncertainty estimation and prediction; scalable matrix-free routines handle high-dimensional parameter spaces (Cinquin et al., 2024).
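A close weight-space relative of this idea is the linearized (generalized Gauss-Newton) Laplace approximation, which FSP-Laplace refines by working in function space. The sketch below is an assumption-laden illustration, not the FSP-Laplace algorithm: `linearized_laplace_predict` is a hypothetical helper, and the Jacobian is computed by finite differences purely for self-containedness.

```python
import numpy as np

def linearized_laplace_predict(model, theta_map, X_train, X_test,
                               noise_var=0.1, prior_prec=1.0, eps=1e-6):
    """Predictive mean/variance from a linearized (GGN) Laplace approximation.

    `model(theta, X)` is any differentiable scalar-output regression model,
    linearized around its MAP parameters theta_map.
    """
    def jac(X):
        # Finite-difference Jacobian (illustration only; use autodiff at scale)
        J = np.empty((len(X), len(theta_map)))
        for j in range(len(theta_map)):
            tp, tm = theta_map.copy(), theta_map.copy()
            tp[j] += eps
            tm[j] -= eps
            J[:, j] = (model(tp, X) - model(tm, X)) / (2 * eps)
        return J
    # GGN Hessian of the loss plus isotropic Gaussian prior precision
    Jtr = jac(X_train)
    H = Jtr.T @ Jtr / noise_var + prior_prec * np.eye(len(theta_map))
    # Predictive variance of the linearized model: j(x)' H^-1 j(x)
    Jte = jac(X_test)
    var = np.einsum('ij,jk,ik->i', Jte, np.linalg.inv(H), Jte)
    return model(theta_map, X_test), var
```

For a model that is linear in its parameters, this reduces exactly to Bayesian linear regression, which makes the linearization step easy to sanity-check.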
Laplace Matching and Closed-Form Gaussianization
Laplace Matching “Gaussianizes” exponential-family likelihoods via a closed-form, basis-transformed Laplace expansion, producing Gaussian pseudo-likelihoods for each datum. This transforms generic non-Gaussian GP models into a form with analytic posterior and predictive formulas, sidestepping iterative optimization (Hobbhahn et al., 2021).
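A small worked instance of this idea (sometimes called the "Laplace bridge") is Gaussianizing a Beta factor in the logit basis, where the Laplace expansion is available in closed form. The helper name below is illustrative.

```python
import numpy as np
from math import log

def laplace_match_beta(alpha, beta):
    """Closed-form Gaussian for a Beta(alpha, beta) density in the logit basis.

    With z = logit(x), the transformed log-density is (up to constants)
    alpha*z - (alpha+beta)*log(1 + e^z); its mode is log(alpha/beta) and the
    negative inverse curvature there is 1/alpha + 1/beta, giving the Gaussian
    N(log(alpha/beta), 1/alpha + 1/beta) with no iterative optimization.
    """
    return log(alpha / beta), 1.0 / alpha + 1.0 / beta
```

Because the mode and curvature are analytic in the transformed basis, each non-Gaussian factor becomes a Gaussian pseudo-observation, and standard conjugate GP formulas then apply.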
Special Cases: State-Space GPs and Custom Likelihoods
In time series or 1D-structured problems, state-space formulations enable Laplace-approximate inference, exploiting tridiagonal precision structure via Kalman filtering and smoothing (Nickisch et al., 2018). For Student-$t$, Dirichlet, von Mises, and other custom likelihoods, the approach extends by re-deriving gradients/Hessians and, where needed for stability, substituting Fisher information approximations (Chan, 2013, Hartmann et al., 2017).
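The linear-time building block underlying these state-space methods is the Kalman recursion; with a non-Gaussian likelihood, the Gaussian update step is replaced by a Laplace (mode/curvature) site update in the spirit of Nickisch et al. (2018). The sketch below, with illustrative names and hyperparameters, handles the Gaussian base case for the Matérn-1/2 (Ornstein–Uhlenbeck) kernel, whose exact 1-D state-space form is $f_{k+1} = a f_k + \varepsilon_k$ with $a = e^{-\Delta t/\ell}$.

```python
import numpy as np

def ou_kalman_loglik(t, y, noise_var=0.1, ell=1.0, var=1.0):
    """O(n) GP marginal log-likelihood for the OU kernel var*exp(-|dt|/ell),
    computed by Kalman filtering instead of an O(n^3) dense solve."""
    m, P, ll, prev = 0.0, var, 0.0, None  # start from the stationary state
    for tk, yk in zip(t, y):
        if prev is not None:
            a = np.exp(-(tk - prev) / ell)                 # exact discretization
            m, P = a * m, a * a * P + var * (1.0 - a * a)  # predict step
        S = P + noise_var                                  # innovation variance
        ll += -0.5 * (np.log(2.0 * np.pi * S) + (yk - m) ** 2 / S)
        g = P / S                                          # Kalman gain
        m, P = m + g * (yk - m), (1.0 - g) * P             # update step
        prev = tk
    return ll
```

For sorted inputs this reproduces the dense GP marginal likelihood exactly, while touching each observation only once.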
5. Applications and Empirical Performance
Laplace-approximated GPs have been applied to:
- Multivariate regression/classification with correlated outputs, including von Mises and Dirichlet likelihoods (Chan, 2013)
- Flexible density estimation within the logistic GP model, exceeding MCMC methods in efficiency while maintaining accuracy (Riihimäki et al., 2012)
- Scalable spatial modeling (e.g., MODIS water vapor data), nonparametric count and binary regression (Zilber et al., 2019, Kündig et al., 2023)
- Bayesian meta-learning and few-shot classification with LDA plugin surrogates for the MAP, resulting in significant computational savings and accuracy retention (Kim et al., 2021)
- Model-based latent class discrete choice, leveraging Laplace-approximated posteriors in EM (Sfeir et al., 2021)
- Robust regression with non-log-concave Student-$t$ likelihoods (standard and Fisher Laplace) (Hartmann et al., 2017)
- Bayesian deep learning function-space uncertainty calibration (Cinquin et al., 2024)
Empirical studies consistently find that the Laplace approximation achieves accuracy and calibration comparable to Markov Chain Monte Carlo or specialized variational algorithms, typically with one to two orders of magnitude computational advantage for moderate to large datasets (Zilber et al., 2019, Riihimäki et al., 2012).
6. Practical Considerations, Limitations, and Recommendations
Advantages:
- Model-agnostic: Applies to any (multivariate) exponential-family likelihood and general link.
- Scalable: Sparse/Vecchia variants and specializations (state-space, low-rank, matrix-free) allow extension to very large $n$.
- Fast parameter learning: Marginal likelihood approximations enable efficient type-II MAP estimation.
- Robust: For classification and many regression tasks, Laplace uncertainty quantification is reliable.
Limitations:
- The approximation underestimates posterior variance, especially in extreme few-shot or highly non-Gaussian regimes (Kim et al., 2021, Riihimäki et al., 2012).
- No closed-form for highly nonlinear kernels in certain LDA-plugin or weight-space variants (Kim et al., 2021).
- Block-diagonal Hessian approximations ignore posterior correlations in some fast surrogates.
- Error bounds depend on the concavity and structure of the likelihood; theoretical control for some plugin surrogates is lacking.
Recommendations:
- For large non-Gaussian spatial data or structured outputs, combine Laplace inference with Vecchia sparsity and use iterative solvers with the VADU preconditioner (Kündig et al., 2023).
- For medium-scale classification or density estimation, employ standard Laplace with Newton updates and low-rank spectrum acceleration.
- In meta-learning or differentiable pipelines, plugin surrogates (e.g., LDA) with prior-norm adjustments yield competitive results (Kim et al., 2021).
- Use simulation-based unbiased variance estimates for accuracy-critical applications, especially for predictive uncertainty (Kündig et al., 2023).
7. References and Related Literature
- Multivariate generalized GP/Laplace: (Chan, 2013)
- Logistic GP density/regression: (Riihimäki et al., 2012)
- Vecchia-Laplace, spatial scaling: (Zilber et al., 2019, Kündig et al., 2023)
- State-space GPs with Laplace: (Nickisch et al., 2018)
- Meta few-shot with Laplace-LDA: (Kim et al., 2021)
- Heteroscedastic Student-$t$ Laplace: (Hartmann et al., 2017)
- Laplace Matching and closed-form Gaussianization: (Hobbhahn et al., 2021)
- Neural networks/function-space Laplace: (Cinquin et al., 2024)
- GP latent class choice models: (Sfeir et al., 2021)
The Laplace-approximated GP framework thus unifies a wide range of approximate Bayesian inference methodologies for generalized nonlinear GP models, offering both generality and practical computational tractability across modern probabilistic machine learning.