Gaussian Process Quadrature (GPQ)

Updated 26 February 2026

Gaussian Process Quadrature (GPQ) is a probabilistic numerical integration method that models integrals via Gaussian processes, yielding both point estimates and calibrated uncertainty.
It employs various node selection strategies—random, deterministic, and active learning—to enhance accuracy and efficiency in estimating integrals.
GPQ generalizes classical quadrature rules, offering rigorous theoretical convergence rates and practical guidelines for kernel selection and scalable computational methods.

Gaussian Process Quadrature (GPQ) is a probabilistic framework for numerical integration that models the unknown integrand as a sample from a Gaussian process (GP). By leveraging the GP posterior conditional on observed function evaluations, GPQ delivers both point estimates and quantified uncertainty for integrals, which are crucial in Bayesian inference, filtering, uncertainty propagation, and other scientific computing domains. This approach subsumes classical quadrature rules under specific kernel and node choices and introduces a systematic methodology for model-based integration with rigorous theoretical and practical guarantees (Mahsereci et al., 18 Feb 2026).

1. Mathematical Model and Core Formulation

GPQ addresses the estimation of integrals of the form

$I = \int_D f(x) p(x) \, dx,$

where $D \subseteq \mathbb{R}^d$ , $p(x)$ is a known density, and $f$ is unknown but evaluated at a finite set of points. GPQ imposes a Gaussian process prior on $f$ ,

$f \sim \mathcal{GP}(m(\cdot), k(\cdot,\cdot)),$

with mean function $m: D \to \mathbb{R}$ and positive-definite covariance kernel $k: D \times D \to \mathbb{R}$ . Conditioned on $N$ noiseless evaluations $y_i = f(x_i)$ at nodes $X = \{x_1, \ldots, x_N\}$ , the posterior is again a GP: $\begin{aligned} \mu_N(x) &= m(x) + k_{xX} K^{-1}(y - m_X), \ \Sigma_N(x,x') &= k(x,x') - k_{xX} K^{-1} k_{Xx'}, \end{aligned}$ where $K$ is the kernel matrix on $X$ , $k_{xX} = (k(x,x_1),\ldots,k(x,x_N))$ , and $m_X$ is the vector of prior means at $X$ .

The posterior distribution for $I$ is Gaussian, with explicit expressions: $\begin{aligned} k_{P,X} &= \left(\int k(x_1,x) p(x) dx, \ldots, \int k(x_N,x) p(x) dx\right)^T, \ m_P &= \int m(x) p(x) dx, \ k_{PP} &= \iint k(x,x') p(x) p(x') dx dx', \end{aligned}$ so that

$\begin{aligned} \mathbb{E}[I \mid y] &= m_P + k_{P,X}^T K^{-1}(y - m_X), \ \mathbb{V}[I \mid y] &= k_{PP} - k_{P,X}^T K^{-1} k_{P,X}. \end{aligned}$

This provides both an estimator and a calibrated uncertainty measure for $I$ (Mahsereci et al., 18 Feb 2026, Särkkä et al., 2015, Prüher et al., 2017).

2. Node Selection Strategies and Sampling Paradigms

Node selection in GPQ directly controls the efficiency and accuracy of the estimator. Strategies include:

Random (Bayesian Monte Carlo): Nodes $x_i$ sampled i.i.d. from $p(x)$ . Variants employ importance sampling or determinantal point processes to improve coverage or diversity.
Space-filling deterministic designs: Regular grids and classical Gaussian quadrature rules (e.g., tensorized Gauss–Hermite), or quasi-Monte Carlo sequences (Sobol, lattice) are used for improved uniformity and optimal convergence for smooth integrands.
Active learning (Myopic or Sequential Design): Nodes are selected adaptively by maximizing an acquisition function, such as negative posterior variance, mutual information, or predictive uncertainty:

$x_{n+1} = \arg\max_x \mathcal{U}(x),$

where $\mathcal{U}(x)$ quantifies value according to the chosen utility criterion. This sequential approach often yields lower posterior variance for a fixed budget (Mahsereci et al., 18 Feb 2026).

3. Taxonomy and Extensions of GPQ

The GPQ methodology can be systematically classified along three axes (Mahsereci et al., 18 Feb 2026):

A. Modelling:

Conjugate GPQ: Direct GP prior on $f$ with Gaussian integral posterior (exact).
Warped or Non-conjugate GPQ: Transformations such as $f = \phi(g)$ , $g \sim \mathcal{GP}$ (e.g. exponentiation for nonnegative integrands), which necessitate approximate inference techniques.

B. Inference:

Exact/Analytic: Closed-form expressions for posterior mean and variance of the integral.
Approximate: Required when kernel means or double integrals are intractable; approaches include Laplace approximation, variational Bayes, or kernel quadrature of kernel means.
Scalable: Sparse-GP, random Fourier features, or kernel interpolation methods reduce cubic scaling to increase tractability for large $N$ .

C. Sampling:

Static/Model-independent: Node sets chosen without reference to current surrogate model, as in standard grids.
Sequential/Active: Nodes chosen adaptively by maximizing acquisition functions, targeting maximal reduction in integration uncertainty.

Non-conjugate strategies and scalable inference are especially relevant for high-dimensional or computationally expensive integrations as in modern Bayesian inference pipelines (Li et al., 2023, Hamid et al., 2021).

4. Theoretical Guarantees and Convergence Rates

Let $H(k)$ denote the reproducing kernel Hilbert space (RKHS) of the kernel $k$ . Define the worst-case error

$e_k(w;X) = \sup_{\|g\|_{\mathcal{H}} \leq 1} \left|\int_D g(x) p(x) dx - \sum_{i=1}^N w_i g(x_i)\right|.$

The GPQ weights $w = K^{-1} k_{P,X}$ minimize $e_k(w;X)$ , and the posterior variance $\sqrt{\Sigma}$ equals this minimal RKHS error (Mahsereci et al., 18 Feb 2026).

For $f \in H^\beta(D)$ with a Matérn kernel of smoothness $\nu$ (so that $H(k) \approx H^\alpha$ , $\alpha = \nu + d/2$ ), when nodes are quasi-uniform with fill distance $h_N \approx N^{-1/d}$ and $\beta \leq \alpha$ ,

$|I - \mu_N| \lesssim N^{-\beta/d}\|f\|_{H^\beta(D)},$

attaining the optimal Sobolev rate. For Bayesian Monte Carlo (random nodes),

$\mathbb{E}[|I-\mu_N|] \lesssim N^{-1/2}$

for all $f \in \mathcal{H}$ , matching classical Monte Carlo but often with smaller constants (Mahsereci et al., 18 Feb 2026).

5. Computational Complexity and Scalability

Classical GPQ has cubic computational complexity in the number of nodes: forming and inverting the kernel matrix $K$ costs $O(N^3)$ flops and $O(N^2)$ memory. Kernel means and double integrals ( $k_{P,X}$ , $k_{PP}$ ) require either closed form or secondary quadrature techniques. Numerical stability is enhanced by adding a small diagonal nugget $\lambda I$ to $K$ (Mahsereci et al., 18 Feb 2026).

Scalable variants involve:

Sparse-GP surrogates (inducing points) with cost $O(M^3)$ , $M \ll N$ (Li et al., 2023).
Random Fourier features and kernel interpolation for efficient kernel approximation (Mahsereci et al., 18 Feb 2026).
Quantum acceleration: Quantum Gaussian Process Quadrature combines Hilbert-space kernel approximation with quantum algorithms (phase estimation, qPCA, Hadamard/SWAP tests), yielding gate complexity polylogarithmic in problem size and a polynomial speedup over classical methods for large-scale settings (Galvis-Florez et al., 20 Feb 2025).

6. Practical Guidelines, Applications, and Limitations

Practical Guidance

Kernel selection: Choice of kernel (e.g., Matérn with $\nu \approx \beta-d/2$ ) should match prior smoothness assumptions on $f$ .
Prior mean: Zero mean is generally sufficient; trends can be handled via explicit basis functions.
Hyperparameters: Set by marginal likelihood or cross-validation; beware of overfitting, especially in adaptive node schemes. Use an initial space-filling design to mitigate this.
Numerical Stability: Add minimal necessary jitter/nugget to $K$ .
Software: Implementations include EmuKit and ProbNum (Python), GAIL (MATLAB), and standard GP libraries (GPy, GPflow, GPyTorch) for custom code.

Applications

GPQ is widely applied for:

Bayesian numerical integration (e.g., marginal likelihoods, model evidence) (Mahsereci et al., 18 Feb 2026, Hamid et al., 2021).
Bayesian inference with black-box likelihoods, particularly using post-process or variational sparse GPQ (Li et al., 2023).
Moment transforms in nonlinear filtering and smoothing, as a principled replacement for unscented or Gauss–Hermite transforms (Prüher et al., 2017, Särkkä et al., 2015).
Uncertainty quantification and computational statistics, such as kernel-marginalisation for hypermodel selection (Hamid et al., 2021).

Limitations

Kernel mean requirement: Closed-form kernel means are available only for particular kernels and measures (notably, Gaussian densities with RBF or polynomial kernels), restricting general applicability.
Cubic scaling: For large $N$ , classical GPQ becomes impractical unless approximate methods are used.
Active selection caveats: Overfitting can degrade uncertainty quantification, emphasizing the need for regularization and robust hyperparameter settings.

7. Connections to Classical Quadrature and Sigma-Point Methods

GPQ generalizes and subsumes all classical sigma-point quadrature methods:

Classical Rule	GP Kernel	Node Set	GPQ Specialization
Unscented Transform (UT)	Polynomial, deg. 3	$2n+1$ sym. sigmapoints	Exact for degree-3 polynomials
Gauss–Hermite (GH)	Tensor-poly, deg. $2P-1$	Product of 1D GH roots	Exact for degree $2P-1$ polynomials
Cubature (CKF etc.)	Poly, deg. 3	Axis-aligned sym. points	Relation to high-order, symmetric rules

For suitable kernels and nodes, GPQ recovers classical weights and achieves zero posterior variance when the integrand is in the span of the model. Employing smooth (e.g., squared-exponential) kernels and Bayes-optimal or quasi-random nodes, GPQ often attains lower RMSE and superior uncertainty quantification relative to traditional quadratures in moderate-dimensional, smooth-integration regimes (Särkkä et al., 2015, Prüher et al., 2017). GPQ enables principled construction of new, low-variance, and adaptive quadrature schemes for advanced estimation and filtering applications.