Inducing Variables in Gaussian Processes

Updated 27 May 2026
  • Inducing variables are latent constructs in Gaussian processes that summarize full GP posteriors using a lower-dimensional set of pseudo-points.
  • They enable scalable inference by reducing computational complexity from O(N³) to O(NM²), with N data points and M ≪ N inducing variables, while largely preserving predictive accuracy.
  • Their selection, parametrization, and design—ranging from point evaluations to inter-domain projections—directly impact performance in both standard and deep GP models.

Inducing variables are latent constructs in Gaussian process (GP) modeling introduced to enable scalable inference by summarizing the information present in the full GP with a lower-dimensional set of “pseudo-points.” In standard and deep Gaussian processes, inducing variables can take the form of point-evaluations, inter-domain projections, or more general functionals, and are central to state-of-the-art scalable variational and fully Bayesian inference frameworks. Their selection, parametrization, and inference directly impact both predictive performance and computational complexity.

1. Formulation and Role of Inducing Variables in Sparse GPs

In the standard sparse variational GP (SVGP) approach, a collection of $M$ inducing inputs $Z = \{ z_m\}_{m=1}^M$ is chosen in the input space, and their corresponding function values $u = \{ f(z_m)\}_{m=1}^M$ define the inducing variables. The joint prior over training function values $f$ and $u$ in a GP with kernel $k$ is given by

$$p(f, u) = p(f \mid u) \, p(u),$$

where

$$p(u) = \mathcal{N}(u \mid 0, K_{ZZ}), \qquad p(f \mid u) = \mathcal{N}\big(f \mid K_{XZ} K_{ZZ}^{-1} u,\; K_{XX} - K_{XZ} K_{ZZ}^{-1} K_{ZX}\big),$$

with $K_{AB}$ denoting the matrix $[k(a, b)]_{a \in A,\, b \in B}$ over input sets $A, B \in \{X, Z\}$ (Xu et al., 2024, Uhrenholt et al., 2020, Tiao et al., 2023, Tsitsvero et al., 2022, Panos et al., 2018, Rossi et al., 2020).

The inducing variables act as a low-rank summary of the GP posterior, reducing the required computational effort from $\mathcal{O}(N^3)$ to $\mathcal{O}(NM^2)$, with $N$ the data size and $M \ll N$.
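To make the cost argument concrete, the following NumPy sketch evaluates the mean and the marginal variances of $p(f \mid u)$ without ever forming an $N \times N$ matrix; the dominant work is an $M \times M$ Cholesky factorization and a solve against $K_{ZX}$, giving $\mathcal{O}(NM^2)$ when $M \ll N$. The squared-exponential kernel with unit signal variance, the jitter value, and the names `rbf` and `sparse_conditional` are illustrative assumptions, not part of any cited method.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0):
    """Unit-variance squared-exponential kernel matrix [k(a, b)] between rows of A and B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def sparse_conditional(X, Z, u, jitter=1e-8):
    """Mean and marginal variances of p(f | u) at inputs X, using O(N M^2) work.

    Only the diagonal of K_XX - K_XZ K_ZZ^{-1} K_ZX is computed, so no N x N
    matrix is ever formed.
    """
    M = Z.shape[0]
    Kzz = rbf(Z, Z) + jitter * np.eye(M)    # K_ZZ plus jitter for numerical stability
    Kzx = rbf(Z, X)                          # K_ZX, shape (M, N)
    L = np.linalg.cholesky(Kzz)              # K_ZZ = L L^T, O(M^3)
    A = np.linalg.solve(L, Kzx)              # L^{-1} K_ZX, O(N M^2) for the N columns
    mean = A.T @ np.linalg.solve(L, u)       # K_XZ K_ZZ^{-1} u
    # diag(K_XX) is all ones for this unit-variance kernel
    var = 1.0 - np.sum(A**2, axis=0)         # diag(K_XX - K_XZ K_ZZ^{-1} K_ZX)
    return mean, var
```

A triangular solver such as `scipy.linalg.solve_triangular` would exploit the structure of `L`; plain `np.linalg.solve` keeps the sketch dependency-free.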

2. Variational and Bayesian Inference with Inducing Variables

2.1 Stochastic Variational Inference

The variational framework introduces an approximate posterior over the inducing variables, typically Gaussian,

$$q(u) = \mathcal{N}(u \mid m, S),$$

and seeks to maximize the evidence lower bound (ELBO):

$$\mathcal{L}(Z, m, S) = \sum_{n=1}^{N} \mathbb{E}_{q(f_n)}\big[\log p(y_n \mid f_n)\big] - \mathrm{KL}\big(q(u) \,\|\, p(u)\big), \qquad q(f_n) = \int p(f_n \mid u)\, q(u)\, du.$$

Both the inducing locations $Z$ and the variational parameters $(m, S)$ are optimized jointly (Tsitsvero et al., 2022, Tiao et al., 2023, Panos et al., 2018).
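For a Gaussian likelihood $p(y_n \mid f_n) = \mathcal{N}(y_n \mid f_n, \sigma^2)$, the expected log-likelihood has a closed form: $\mathbb{E}_{q(f_n)}[\log p(y_n \mid f_n)] = \log \mathcal{N}(y_n \mid \mu_n, \sigma^2) - v_n / (2\sigma^2)$, where $\mu_n$ and $v_n$ are the mean and variance of the marginal $q(f_n) = \int p(f_n \mid u)\, q(u)\, du$. The sketch below assumes that likelihood, a unit-variance squared-exponential kernel, and a full-covariance $S$; all function names are hypothetical.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0):
    """Unit-variance squared-exponential kernel matrix between rows of A and B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def gauss_kl(m, S, K):
    """KL(N(m, S) || N(0, K)) for M-dimensional Gaussians."""
    M = m.shape[0]
    logdet_K = 2.0 * np.sum(np.log(np.diag(np.linalg.cholesky(K))))
    logdet_S = 2.0 * np.sum(np.log(np.diag(np.linalg.cholesky(S))))
    trace_term = np.trace(np.linalg.solve(K, S))
    maha = m @ np.linalg.solve(K, m)
    return 0.5 * (trace_term + maha - M + logdet_K - logdet_S)

def svgp_elbo(X, y, Z, m, S, noise_var=0.1, jitter=1e-6):
    """ELBO = sum_n E_q(f_n)[log N(y_n | f_n, noise_var)] - KL(q(u) || p(u))."""
    Kzz = rbf(Z, Z) + jitter * np.eye(Z.shape[0])
    Kzx = rbf(Z, X)
    A = np.linalg.solve(Kzz, Kzx)                    # column n is K_ZZ^{-1} k_Z(x_n)
    mu = A.T @ m                                     # mean of q(f_n)
    # variance of q(f_n): k(x_n, x_n) - k_n^T K_ZZ^{-1} k_n + a_n^T S a_n, with k(x, x) = 1
    var = 1.0 - np.sum(A * Kzx, axis=0) + np.sum(A * (S @ A), axis=0)
    exp_loglik = (-0.5 * np.log(2 * np.pi * noise_var)
                  - 0.5 * ((y - mu) ** 2 + var) / noise_var)
    return np.sum(exp_loglik) - gauss_kl(m, S, Kzz)
```

In practice $S$ is usually parametrized through a Cholesky factor so that gradient-based optimization of $(Z, m, S)$ keeps it positive definite.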

2.2 Bayesian Treatments

A fully Bayesian approach places priors on the inducing inputs $Z$ and the kernel hyperparameters, treating them as random variables. Stochastic gradient Hamiltonian Monte Carlo (SGHMC) is used to sample jointly from their posterior, improving uncertainty quantification and predictive accuracy over point-estimate or purely variational approaches (Rossi et al., 2020). A related probabilistic treatment of the inducing points also addresses the sensitivity to their placement (Uhrenholt et al., 2020).
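As a rough illustration of the sampler named above, the sketch implements the basic SGHMC update of Chen et al. (2014) in its learning-rate/friction parametrization, with the gradient-noise correction term set to zero. It is demonstrated on a toy one-dimensional Gaussian target rather than on a GP posterior over inducing inputs and hyperparameters; `grad_U`, `eta`, `alpha`, and their default values are illustrative choices, not the settings of Rossi et al. (2020).

```python
import numpy as np

def sghmc(grad_U, theta0, n_steps, eta=1e-2, alpha=0.1, seed=0):
    """Stochastic gradient HMC (Chen et al., 2014), friction parametrization.

    grad_U : callable returning a (possibly noisy, e.g. minibatch) estimate of
             the gradient of the negative log-posterior U(theta).
    eta    : learning rate; alpha : friction. The noise-estimate term is taken as 0.
    Returns an array with one row per step holding the visited theta values.
    """
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    v = np.zeros_like(theta)                       # momentum-like variable
    samples = np.empty((n_steps,) + theta.shape)
    for t in range(n_steps):
        theta = theta + v                          # position update
        noise = rng.normal(scale=np.sqrt(2.0 * alpha * eta), size=theta.shape)
        v = v - eta * grad_U(theta) - alpha * v + noise   # gradient, friction, injected noise
        samples[t] = theta
    return samples
```

With a standard normal target (`grad_U = lambda th: th`, optionally plus noise to mimic minibatching), the draws retained after burn-in should have mean near 0 and variance near 1; a smaller `eta` reduces discretization bias at the cost of slower mixing.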

2.3 Extensions: Point-Process Priors and Variable-Size Sets

To make the number and selection of inducing points part of the model, point-process priors over the set of inducing inputs are used, with the inclusion of each candidate location governed by its own inclusion probability. The variational posterior $q(Z)$ over this set is then fitted by maximizing the ELBO, allowing the model to determine both the number and the placement of inducing points (Uhrenholt et al., 2020).
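The per-candidate inclusion mechanism can be illustrated with a factorized Bernoulli distribution over a fixed set of candidate locations. This is an illustrative assumption, not necessarily the exact prior or variational family of Uhrenholt et al. (2020): each candidate $z_c$ is kept with probability $\pi_c$, so the expected number of inducing points is $\sum_c \pi_c$. Because the inclusion mask is discrete, ELBO gradients with respect to $\pi$ would typically be estimated with a score-function estimator, which needs `log_q`. All function names are hypothetical.

```python
import numpy as np

def sample_inducing_subset(candidates, probs, rng):
    """Draw one subset of inducing inputs from q = prod_c Bernoulli(pi_c).

    candidates : (C, D) candidate locations; probs : (C,) inclusion probabilities.
    Returns the selected rows and the boolean inclusion mask.
    """
    mask = rng.random(probs.shape[0]) < probs
    return candidates[mask], mask

def log_q(mask, probs):
    """Log-probability of an inclusion mask under the factorized Bernoulli q."""
    return float(np.sum(np.where(mask, np.log(probs), np.log1p(-probs))))

def expected_count(probs):
    """Expected number of selected inducing points, E_q[|Z|] = sum_c pi_c."""
    return float(np.sum(probs))
```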

3. Generalizations: Inter-Domain and Orthogonal Inducing Variables

Inducing variables are not restricted to point evaluations. Inter-domain inducing variables are defined as linear functionals of $f$, for example

$$u_m = \langle f, \phi_m \rangle_{\mathcal{H}}, \qquad m = 1, \dots, M,$$

where $\phi_m$ are chosen basis functions in the RKHS $\mathcal{H}$ of $k$. These generalize classical inducing points and can be constructed from, for example, spherical harmonics (zonal functions) or neural-style features (Tiao et al., 2023).
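A simple way to see how an inter-domain variable changes the covariances is a classical Gaussian-window construction, used here instead of spherical features and written as an $L^2$ integral rather than an RKHS inner product: $u_m = \int f(t)\, \mathcal{N}(t; z_m, s^2)\, dt$. For a unit-variance squared-exponential kernel with lengthscale $\ell$, both covariances are Gaussian integrals with closed forms, $\mathrm{cov}(u_m, f(x)) = \frac{\ell}{\sqrt{\ell^2 + s^2}} \exp\!\big(-\frac{(x - z_m)^2}{2(\ell^2 + s^2)}\big)$ and $\mathrm{cov}(u_m, u_{m'}) = \frac{\ell}{\sqrt{\ell^2 + 2s^2}} \exp\!\big(-\frac{(z_m - z_{m'})^2}{2(\ell^2 + 2s^2)}\big)$. The 1-D setting and the function name are assumptions for illustration.

```python
import numpy as np

def window_features(x, z, lengthscale=1.0, width=0.5):
    """Covariances for 1-D inter-domain inducing variables
    u_m = integral of f(t) N(t; z_m, width^2) dt, under a unit-variance RBF kernel.

    x : (N,) inputs; z : (M,) window centres.
    Returns K_uf with entries cov(u_m, f(x_n)), shape (M, N), and K_uu, shape (M, M).
    """
    l2, s2 = lengthscale**2, width**2
    d_uf = z[:, None] - x[None, :]
    # kernel smoothed by one Gaussian window
    K_uf = np.sqrt(l2 / (l2 + s2)) * np.exp(-0.5 * d_uf**2 / (l2 + s2))
    d_uu = z[:, None] - z[None, :]
    # kernel smoothed by a window on both sides, so the width enters twice
    K_uu = np.sqrt(l2 / (l2 + 2 * s2)) * np.exp(-0.5 * d_uu**2 / (l2 + 2 * s2))
    return K_uf, K_uu
```

As `width` goes to 0 the window becomes a point evaluation and `K_uf` reduces to the ordinary kernel matrix, recovering classical inducing points.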

Orthogonally-decoupled GP frameworks decompose the kernel as $k(x, x') = \tilde{k}(x, x') + k_\perp(x, x')$, with $\tilde{k}$ spanned by the inducing feature map and $k_\perp$ covering the orthogonal complement. Two families of inducing variables, $u$ (principal) and $v$ (orthogonal), are introduced, enabling the mean and variance approximations to be enriched simultaneously and independently at reduced computational cost (Tiao et al., 2023).
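The kernel split can be checked numerically with ordinary point inducing inputs standing in for the inducing feature map, a simplifying assumption since Tiao et al. (2023) use spherical features. Writing $\tilde{k}(x, x') = k_{xZ} K_{ZZ}^{-1} k_{Zx'}$, the remainder $k_\perp = k - \tilde{k}$ is itself positive semi-definite and vanishes whenever one argument is an inducing input, which is the sense in which it covers the orthogonal complement. The kernel and function names are hypothetical.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0):
    """Unit-variance squared-exponential kernel matrix between rows of A and B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def decoupled_kernels(A, B, Z, jitter=1e-10):
    """Split k(A, B) into k_tilde = K_AZ K_ZZ^{-1} K_ZB (the part spanned by the
    inducing features at Z) and the orthogonal remainder k_perp = k - k_tilde."""
    Kzz = rbf(Z, Z) + jitter * np.eye(Z.shape[0])
    k_tilde = rbf(A, Z) @ np.linalg.solve(Kzz, rbf(Z, B))
    k_perp = rbf(A, B) - k_tilde
    return k_tilde, k_perp
```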

4. Scalable Approximations and Subspace Inducing Inputs

For high-dimensional data, the cost of kernel computations involving full-dimensional inducing inputs $Z \in \mathbb{R}^{M \times D}$ becomes prohibitive. A scalable alternative is to constrain the inducing inputs to a low-rank subspace, $Z = W B^\top$, with $W \in \mathbb{R}^{M \times K}$ ($K \ll D$) and $B \in \mathbb{R}^{D \times K}$ a data-driven orthonormal basis (e.g., the top right singular vectors of $X$). This reduces the cost of kernel evaluations from $\mathcal{O}(NMD)$ to $\mathcal{O}(NMK)$ per iteration while retaining flexibility (Panos et al., 2018). Numerically stable "kernel-preconditioned" parametrizations of the variational parameters, in which a diagonal factor is combined with kernel matrices, avoid instabilities and reduce the need for explicit regularization.
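The saving comes from precomputing the projected data once. The sketch below assumes the parametrization $Z = W B^\top$ with $B$ taken from a truncated SVD of $X$ (one reading of the construction, not code from Panos et al.); then $x_n^\top z_m = (XB)_n \cdot w_m$ and $\lVert z_m \rVert^2 = \lVert w_m \rVert^2$, so the squared distances needed by a stationary kernel cost $\mathcal{O}(NMK)$ per evaluation instead of $\mathcal{O}(NMD)$. Function names are hypothetical.

```python
import numpy as np

def subspace_basis(X, K):
    """Top-K right singular vectors of X, returned as a (D, K) orthonormal basis B."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:K].T

def sq_dists_subspace(XB, x_sqnorm, W):
    """Squared distances ||x_n - z_m||^2 for inducing inputs z_m = B w_m.

    XB = X @ B, shape (N, K), and x_sqnorm = ||x_n||^2 are precomputed once, so
    each call costs O(N M K). Because B has orthonormal columns,
    ||z_m||^2 = ||w_m||^2 and x_n . z_m = (X B)_n . w_m.
    """
    return x_sqnorm[:, None] + np.sum(W**2, axis=1)[None, :] - 2.0 * XB @ W.T
```

Only `W` is optimized in this sketch, so gradient steps also work in the $K$-dimensional space rather than in $\mathbb{R}^D$.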

5. Inducing Variables in Deep Gaussian Processes

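Deep Gaussian processes (DGPs) stack multiple GPs, with each hidden layer $\ell = 1, \dots, L$ having its own set of $M_\ell$ inducing inputs $Z^\ell$ and latent inducing outputs $U^\ell$. The full joint prior becomes

$$p\big(y, \{F^\ell, U^\ell\}_{\ell=1}^{L}\big) = p(y \mid F^L) \prod_{\ell=1}^{L} p\big(F^\ell \mid U^\ell; F^{\ell-1}, Z^\ell\big)\, p\big(U^\ell; Z^\ell\big), \qquad F^0 = X,$$

with each $p(U^\ell; Z^\ell)$ Gaussian (Xu et al., 2024). Inducing variables at each layer render otherwise intractable integrals and marginalizations computationally feasible at $\mathcal{O}(NM_\ell^2)$ cost per layer.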
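To show how the per-layer inducing variables are used, the sketch below propagates inputs through a DGP by sampling each layer's outputs from its sparse-GP marginals, in the spirit of the doubly stochastic (DSVI) scheme. Simplifying assumptions, not taken from the cited work: one output dimension per layer, a unit-variance squared-exponential kernel, zero mean functions, and a Gaussian $q(U^\ell) = \mathcal{N}(m^\ell, S^\ell)$ per layer. All names are hypothetical.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0):
    """Unit-variance squared-exponential kernel matrix between rows of A and B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def layer_marginals(F_in, Z, m, S, jitter=1e-6):
    """Mean and variance of q(f(x_n)) for one sparse GP layer with q(U) = N(m, S)."""
    Kzz = rbf(Z, Z) + jitter * np.eye(Z.shape[0])
    Kzf = rbf(Z, F_in)
    A = np.linalg.solve(Kzz, Kzf)                   # K_ZZ^{-1} K_ZF, O(N M^2)
    mean = A.T @ m
    var = 1.0 - np.sum(A * Kzf, axis=0) + np.sum(A * (S @ A), axis=0)
    return mean, np.maximum(var, 1e-12)             # clip tiny negative round-off

def dgp_sample(X, layers, rng):
    """One reparameterized sample of the final-layer outputs for every input.

    layers : list of (Z, m, S) per layer; Z must match that layer's input dimension.
    """
    F = X
    for Z, m, S in layers:
        mean, var = layer_marginals(F, Z, m, S)
        eps = rng.standard_normal(mean.shape)
        F = (mean + np.sqrt(var) * eps)[:, None]     # becomes the next layer's input
    return F[:, 0]
```

Averaging a likelihood term over such samples gives a Monte Carlo estimate of the expected log-likelihood part of a DGP ELBO.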

Recent advances address the challenge of accurate posterior inference over these multi-layer inducing variables; the main inference schemes are summarized in Section 6.1.

6. Algorithmic Details and Empirical Comparisons

6.1 Classical and Advanced Inference Schemes

  • DSVI (Doubly Stochastic VI): Takes a Gaussian variational posterior $q(U^\ell) = \mathcal{N}(U^\ell \mid m^\ell, S^\ell)$ for each layer, factorized across layers; efficient but potentially biased for complex posteriors.
  • IPVI (Implicit VI): Uses neural-network samplers trained with adversarial losses; does not yield an explicit ELBO.
  • DDVI (Denoising Diffusion VI): Employs reverse-time diffusion processes, optimizing an explicit path-space KL divergence and the corresponding ELBO; reported to outperform DSVI and IPVI in both expressivity and stability (Xu et al., 2024).

6.2 Empirical Findings

  • DDVI-based DGPs achieve superior test RMSE, NLL, and calibration on UCI regression benchmarks and state-of-the-art accuracy on the MNIST, Fashion-MNIST, CIFAR-10, SUSY, and HIGGS datasets (Xu et al., 2024).
  • Probabilistic selection methods empirically show that as inducing points become less informative, the model prunes unnecessary points, trading off sparsity and predictive fit (Uhrenholt et al., 2020).
  • Subspace inducing inputs offer computational speed-ups without significant accuracy loss in extreme multi-label and high-dimensional settings (Panos et al., 2018).
  • Mean-field variational GP approaches, when contrasted with Bayesian and DDVI methods, can underestimate uncertainty and show higher test errors, especially when the posteriors over inducing variables are complex (Tsitsvero et al., 2022, Rossi et al., 2020, Xu et al., 2024).
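The headline metrics in these comparisons are commonly computed from a model's Gaussian predictive mean and variance as sketched below. This is a generic sketch: the exact protocol in the cited papers (for example, per-point averaging or normalized target units) is not specified here, and central-interval coverage is only one of several possible calibration checks.

```python
import numpy as np

def rmse(y, mu):
    """Root-mean-square error of the predictive mean."""
    return float(np.sqrt(np.mean((y - mu) ** 2)))

def gaussian_nll(y, mu, var):
    """Average negative log predictive density under N(mu, var)."""
    return float(np.mean(0.5 * np.log(2 * np.pi * var) + 0.5 * (y - mu) ** 2 / var))

def interval_coverage(y, mu, var, z=1.96):
    """Fraction of targets inside mu +/- z * sd (about 95% nominal for z = 1.96)."""
    return float(np.mean(np.abs(y - mu) <= z * np.sqrt(var)))
```

A well-calibrated model has coverage close to the nominal level; systematic undercoverage is one symptom of the uncertainty underestimation described above.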

7. Selection, Initialization, and Design of Inducing Variables

Proper initialization and adaptive learning of the inducing inputs $Z$ are critical for model performance. Random selection from training points or guided methods (e.g., k-means, farthest-point) can be employed; variationally optimized $Z$ tend to outperform fixed sets, especially in representing out-of-distribution or unseen test points (Tsitsvero et al., 2022). In inter-domain and orthogonally-decoupled setups, expressive basis design (e.g., spherical harmonics, neural network features) is important to cover both mean and covariance structures efficiently (Tiao et al., 2023).
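A farthest-point initialization, one of the guided methods mentioned above, can be sketched in a few lines: it greedily adds the training point farthest from all points selected so far. The function name and the choice of starting index are illustrative; the resulting $Z$ would typically then be refined by variational optimization.

```python
import numpy as np

def farthest_point_init(X, M, start=0):
    """Greedy farthest-point selection of M rows of X as initial inducing inputs.

    Returns the selected rows (a copy) and their indices into X.
    """
    idx = [start]
    d = np.sum((X - X[start]) ** 2, axis=1)        # squared distance to the selected set
    for _ in range(M - 1):
        nxt = int(np.argmax(d))                    # point farthest from all chosen so far
        idx.append(nxt)
        d = np.minimum(d, np.sum((X - X[nxt]) ** 2, axis=1))
    return X[idx].copy(), np.array(idx)
```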

Point-process-based variational inference provides an additional layer of adaptivity, allowing the model to jointly infer both which locations to include and the number of inducing variables, automatically matching model complexity to data structure (Uhrenholt et al., 2020).


References:

  • (Xu et al., 2024): "Sparse Inducing Points in Deep Gaussian Processes: Enhancing Modeling with Denoising Diffusion Variational Inference"
  • (Uhrenholt et al., 2020): "Probabilistic selection of inducing points in sparse Gaussian processes"
  • (Tiao et al., 2023): "Spherical Inducing Features for Orthogonally-Decoupled Gaussian Processes"
  • (Tsitsvero et al., 2022): "Learning inducing points and uncertainty on molecular data by scalable variational Gaussian processes"
  • (Panos et al., 2018): "Fully Scalable Gaussian Processes using Subspace Inducing Inputs"
  • (Rossi et al., 2020): "Sparse Gaussian Processes Revisited: Bayesian Approaches to Inducing-Variable Approximations"
