Sparse Variational GP-GAMs

Updated 16 March 2026
  • Sparse Variational GP-GAMs are a scalable Bayesian framework that combines Gaussian Processes with Generalized Additive Models to enable flexible, interpretable function decomposition.
  • They employ inducing variables and sparse variational inference to significantly reduce computational complexity while accurately quantifying uncertainty.
  • The framework uses stochastic optimization and structured posterior coupling to efficiently handle diverse likelihoods in regression, classification, and count data applications.

Sparse Variational Gaussian Process Generalized Additive Models (Sparse Variational GP-GAMs) provide a scalable Bayesian framework for learning flexible, interpretable additive function decompositions while rigorously quantifying uncertainty. These models merge the representational power of Gaussian Processes (GPs) with Generalized Additive Models (GAMs) and employ advanced variational inference to make computation tractable for large datasets through sparsity and structured posteriors (Adam, 2017, Adam et al., 2018).

1. Model Formulation and Additive Structure

A core attribute of Sparse Variational GP-GAMs is their additive structure. For inputs x=(x_1,\ldots,x_D) and outputs y, the latent predictor is

\eta(x)=\sum_{c=1}^C f_c(x_c)

where each f_c is a real-valued function with an independent GP prior, f_c\sim\mathcal{GP}(0, k_c(\cdot,\cdot)) (Adam et al., 2018).

Observed responses may be modeled via a (possibly non-Gaussian) factorizing likelihood:

p(y1:Nf1:C)=n=1Np(ynη(xn)).p(y_{1:N}\mid f_{1:C}) = \prod_{n=1}^N p(y_n\mid\eta(x_n)).

For generalized additive modeling, a link function g (identity, logit, log, etc.) is incorporated: g(\mathbb{E}[y_n]) = \sum_{c=1}^C f_c(x_{n,c}). This formulation accommodates regression and a broad family of exponential-family likelihoods (Adam et al., 2018).
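The additive prior and link construction can be sketched numerically. The following is a minimal illustration (not code from the cited papers), assuming unit-variance RBF kernels for each component and a logit link for Bernoulli responses:

```python
import numpy as np

def rbf_kernel(x, z, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel k(x, z) for 1-D inputs.
    d = x[:, None] - z[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def sample_additive_predictor(X, rng, jitter=1e-6):
    # Draw one prior sample of eta(x) = sum_c f_c(x_c),
    # with an independent GP prior on each component f_c.
    N, C = X.shape
    eta = np.zeros(N)
    for c in range(C):
        K = rbf_kernel(X[:, c], X[:, c]) + jitter * np.eye(N)
        L = np.linalg.cholesky(K)
        eta += L @ rng.standard_normal(N)
    return eta

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 3))        # N = 50 points, C = 3 components
eta = sample_additive_predictor(X, rng)
# Logit link: g(E[y]) = eta  =>  E[y] = sigmoid(eta), always in (0, 1).
p = 1.0 / (1.0 + np.exp(-eta))
```

Swapping the sigmoid for the identity or `np.exp` recovers the Gaussian and Poisson cases, respectively.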

2. Inducing Variable Framework and Sparse Approximation

To reduce the O(N^3) complexity of GP inference, Sparse Variational GP-GAMs introduce M inducing points per component:

Zc={zc(1),...,zc(M)},uc=[fc(zc(1)),...,fc(zc(M))]RM.Z_c = \{z_c^{(1)}, ..., z_c^{(M)}\},\quad u_c = [f_c(z_c^{(1)}), ..., f_c(z_c^{(M)})]^\top\in\mathbb{R}^M.

Stacking all components yields

U=[u1,...,uC]RMC.U = [u_1^\top,...,u_C^\top]^\top\in\mathbb{R}^{MC}.

The prior on U is Gaussian with block-diagonal covariance, and the conditional process for all f_c given U is analytic (Adam et al., 2018, Adam, 2017).

This sparse approach enables scalable computation, as M\ll N, providing near-linear scaling in N and cubic scaling in M per component (Adam, 2017).

3. Variational Posterior Parameterization and Posterior Coupling

Variational inference approximates the true posterior p(f,U\mid y) via:

q(f,U) = p(f\mid U)\,q(U),\quad q(U) = \mathcal{N}(U\mid m, S).

Structured covariance S can capture posterior dependencies among components, exceeding the expressive power of mean-field approximations. Marginalizing q(f,U) yields predictive means and variances that respect both inter- and intra-component dependencies, governed by the free variational parameters \mu_U, \Sigma_U (Adam, 2017).

The use of a single multivariate Gaussian for q(U) enables the representation of cross-component posterior coupling—critical for calibrated posterior variances and uncertainty quantification, especially when the posterior is not well-approximated by independent marginals (Adam, 2017, Adam et al., 2018). Imposing block-low-rank or diagonal structure in S can reduce computational costs.
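A toy numerical illustration of why the off-diagonal blocks of S matter (C = 2 components with one inducing variable each; the numbers are invented for illustration): when components are negatively correlated a posteriori, a mean-field approximation overstates the variance of their sum.

```python
import numpy as np

# Coupled posterior covariance over U = (u_1, u_2), with a negative
# cross-component correlation of -0.6 (invented for illustration).
S_full = np.array([[1.0, -0.6],
                   [-0.6, 1.0]])
S_mf = np.diag(np.diag(S_full))   # mean-field: drop cross-component terms

# With eta = f_1 + f_2 at these points, Var[eta] = w^T S w for w = (1, 1).
w = np.ones(2)
var_coupled = w @ S_full @ w      # 1 + 1 - 2*0.6 = 0.8
var_meanfield = w @ S_mf @ w      # 1 + 1 = 2.0
```

Here the mean-field variance (2.0) is 2.5x the coupled one (0.8): ignoring the coupling inflates the predictor's credible intervals even though each marginal is unchanged.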

4. Evidence Lower Bound (ELBO) and Computational Complexity

The central variational objective is the Evidence Lower Bound (ELBO):

\mathcal{L} = \mathbb{E}_{q(f,U)}[\log p(y\mid f)] - \mathrm{KL}[q(U)\,\|\,p(U)]

with terms that factorize efficiently owing to the additive model structure. The expectation is typically estimated by Monte Carlo with the reparameterization U = m + L\epsilon, \epsilon \sim \mathcal{N}(0,I), where L is a Cholesky factor of S.
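The two ELBO ingredients can be sketched directly: a reparameterized sample of U for the expectation term, and the standard closed-form KL between Gaussians (a generic sketch with invented dimensions, not the papers' code):

```python
import numpy as np

def gauss_kl(m, S, K):
    # Closed-form KL[ N(m, S) || N(0, K) ] between the variational
    # posterior q(U) and the zero-mean GP prior p(U).
    d = len(m)
    Kinv_S = np.linalg.solve(K, S)
    mah = m @ np.linalg.solve(K, m)          # Mahalanobis term m^T K^{-1} m
    _, logdet_K = np.linalg.slogdet(K)
    _, logdet_S = np.linalg.slogdet(S)
    return 0.5 * (np.trace(Kinv_S) + mah - d + logdet_K - logdet_S)

rng = np.random.default_rng(2)
d = 4
m = rng.standard_normal(d)
A = rng.standard_normal((d, d))
S = A @ A.T + np.eye(d)                      # a valid covariance for q(U)
K = np.eye(d)                                # toy prior covariance

# Reparameterized sample for the expected log-likelihood term:
L = np.linalg.cholesky(S)
eps = rng.standard_normal(d)
U = m + L @ eps                              # U ~ N(m, S)

kl = gauss_kl(m, S, K)                       # >= 0, and 0 iff q(U) = p(U)
```

Because both q(U) and p(U) are Gaussian, the KL term never needs sampling; only the likelihood expectation does.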

The cost for an ELBO evaluation depends on the structure of S:

  • Mean-field S: O(CM^3 + NCM^2) per iteration.
  • Full posterior coupling (full S): O((CM)^3 + NC^2M^2) per iteration (Adam et al., 2018).
  • KL calculations and conditional predictions further benefit from the analytic tractability of Gaussian–Gaussian computations (Adam, 2017).

Complexity for prediction and sampling at batch size B is O(BCM^2), with further improvements possible by exploiting low-rank or block structures.

5. Inference Algorithms and Optimization Procedures

Stochastic variational inference is used to optimize the variational parameters (m, S), inducing inputs \{Z_c\}, and kernel hyperparameters (\theta_c, \sigma^2). The canonical procedure consists of:

  1. Sampling mini-batches of data (x_i, y_i).
  2. Sampling \epsilon \sim \mathcal{N}(0,I) and setting U = m + L\epsilon.
  3. Computing predictive means/variances of all f_c(x_{i,c}) for the batch.
  4. Monte-Carlo or analytic evaluation of the expected log-likelihood.
  5. Closed-form evaluation of the KL divergence.
  6. Gradient estimation via automatic differentiation and parameter update (e.g., Adam, natural gradient) (Adam, 2017, Adam et al., 2018).
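A minimal runnable sketch of this loop for a single component with a Gaussian likelihood, where (as a simplifying assumption, for brevity) only the variational mean m is optimized by plain SGD while S, the inducing inputs Z, and the kernel hyperparameters are held fixed; the sin(3x) data and all constants are invented for illustration:

```python
import numpy as np

def rbf(a, b, ls=0.3):
    # Unit-variance squared-exponential kernel for 1-D inputs.
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

rng = np.random.default_rng(3)
N, B, M, sigma2, lr = 400, 64, 10, 0.1, 0.05
x = np.sort(rng.uniform(-1, 1, N))
y = np.sin(3 * x) + np.sqrt(sigma2) * rng.standard_normal(N)

Z = np.linspace(-1, 1, M)                   # inducing inputs (held fixed)
Kzz = rbf(Z, Z) + 1e-2 * np.eye(M)          # generous jitter for stability
m = np.zeros(M)                             # variational mean of q(U)

for step in range(2000):
    idx = rng.choice(N, size=B, replace=False)        # 1. mini-batch
    A = np.linalg.solve(Kzz, rbf(x[idx], Z).T).T      # K_xZ K_ZZ^{-1}
    resid = y[idx] - A @ m                            # 3. predictive mean
    # 6. gradient of the mini-batch ELBO wrt m: rescaled expected
    #    log-likelihood contribution minus the KL term's contribution.
    grad = (N / B) * A.T @ resid / sigma2 - np.linalg.solve(Kzz, m)
    m += (lr / N) * grad                              # SGD update

pred = np.linalg.solve(Kzz, rbf(x, Z).T).T @ m        # fitted predictor mean
```

A full implementation would also update S (e.g. via its Cholesky factor), the inducing inputs, and hyperparameters with automatic differentiation, as in step 6 above.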

The following table summarizes major computational steps:

| Step | Complexity (mean-field) | Complexity (full coupling) |
| --- | --- | --- |
| Covariance factorization | O(CM^3) | O((CM)^3) |
| Predictive means/variances | O(NCM^2) | O(NC^2M^2) |
| KL divergence | O(CM^3) | O((CM)^3) |

6. Practical Implementation and Calibration

Efficient implementation recommendations include using automatic-differentiation frameworks (e.g., TensorFlow with GPflow, PyTorch with GPyTorch), placing inducing points Z_c via domain-covering heuristics (grids or k-means), and monitoring the ELBO and held-out log-likelihood for convergence (Adam et al., 2018). Under a mean-field posterior, the sum of marginal variances across additive components provides the variance of \eta(x) (with full coupling, cross-component covariances contribute as well), forming credible intervals for the predictor and facilitating Bayesian model calibration.
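Under a mean-field posterior, interval construction reduces to summing per-component marginal variances; a minimal sketch with invented numbers (two points, two components):

```python
import numpy as np

# Per-component posterior summaries at N = 2 points for C = 2 components
# (the values are invented for illustration).
mean_c = np.array([[0.3, -0.1],
                   [1.0,  0.2]])         # shape (N, C): E[f_c(x_c)]
var_c = np.array([[0.04, 0.01],
                  [0.09, 0.04]])         # shape (N, C): Var[f_c(x_c)]

eta_mean = mean_c.sum(axis=1)            # E[eta(x)] = sum_c E[f_c(x_c)]
eta_var = var_c.sum(axis=1)              # Var[eta(x)] under mean-field q
lo = eta_mean - 1.96 * np.sqrt(eta_var)  # pointwise 95% credible interval
hi = eta_mean + 1.96 * np.sqrt(eta_var)
```

With a coupled q(U), the variance line would instead sum the full per-point covariance matrix over components, including the off-diagonal terms.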

Posterior variance and credible intervals are derived from the GP-marginal perspective, yielding well-calibrated uncertainty quantification that accounts for both model and approximation error, as the posterior approximation q(f,U) remains a valid Gaussian process (Adam et al., 2018).

7. Applications and Extensions

The sparse variational GP-GAM framework generalizes across regression (Gaussian noise) and non-Gaussian likelihoods (Binomial, Poisson, etc.), with appropriate link function integration:

  • For regression: identity link.
  • For count data: log link.
  • For binary (Bernoulli) data: logit link.

Each f_c operates on the subspace x_c, so the kernel k_c and inducing inputs Z_c are defined over \mathbb{R}^{\dim(x_c)}. The same inference procedure extends directly to these tasks by changing the pointwise log-likelihood in the ELBO (Adam, 2017, Adam et al., 2018).
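The task-specific ingredient is thus just the pointwise log-likelihood entering the ELBO. A sketch of the three standard cases (generic textbook densities, not code from the cited papers):

```python
import math

def loglik(y, eta, family, sigma2=1.0):
    # Pointwise log p(y | eta) for the three standard link/likelihood pairs;
    # swapping this function switches the ELBO between tasks.
    if family == "gaussian":     # identity link: mean = eta
        return -0.5 * (math.log(2 * math.pi * sigma2) + (y - eta) ** 2 / sigma2)
    if family == "poisson":      # log link: rate = exp(eta)
        return y * eta - math.exp(eta) - math.lgamma(y + 1)
    if family == "bernoulli":    # logit link: p = sigmoid(eta)
        return y * eta - math.log1p(math.exp(eta))
    raise ValueError(family)
```

For example, `loglik(1, 0.0, "bernoulli")` equals log(1/2), since a zero predictor maps to p = 0.5 under the logit link.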

This framework is applicable in any scenario requiring interpretable, uncertainty-aware additive modeling for large datasets, including biostatistics, spatio-temporal modeling, and automatic feature-effect discovery.


References:

  • (Adam, 2017) "Structured Variational Inference for Coupled Gaussian Processes"
  • (Adam et al., 2018) "Scalable GAM using sparse variational Gaussian processes"