Local Gaussian Process Approximation
- Local Gaussian Process Approximation is a family of scalable GP methods that construct local models from relevant data subsets to reduce computational complexity.
- It utilizes techniques such as nearest neighbor selection, localized basis expansions, and covariance localization to effectively handle nonstationarity and fine-scale detail.
- Empirical evaluations show enhanced prediction accuracy and reliable uncertainty quantification in applications like terrain mapping, computer experiments, and dynamic modeling.
Local Gaussian Process Approximation refers to a broad class of scalable techniques that approximate Gaussian process (GP) regression by performing localized inference in subsets of the data, inducing locality either through basis function construction, data partitioning, covariance localization, or selective neighborhood design. The principal objective is to preserve the nonparametric flexibility and uncertainty quantification of GPs while reducing the computational cost from the cubic scaling of classical GPs to a tractable regime for large or high-dimensional datasets.
1. Theoretical Foundations of Local Gaussian Process Approximation
Classical GP regression models a function as a realization from a Gaussian process prior with a prescribed covariance kernel and provides closed-form predictive mean and variance for new inputs given all training observations, with $\mathcal{O}(N^3)$ training cost and $\mathcal{O}(N^2)$ prediction cost per test input. The local GP approximation replaces the global model with a collection of local models, each constructed using a small subset of the data relevant to the prediction location. The rationale draws from the rapid spatial decay of many kernels and the redundant information conveyed by distant data points in large, dense designs.
The prototypical local GP constructs, for each prediction location $x$, a sub-design $X_n(x) \subset X_N$ with $n \ll N$, typically by selecting the $n$ nearest neighbors or via a greedy active learning Cohn (ALC) criterion to maximize variance reduction at $x$ (Gramacy et al., 2013). The local predictive mean and variance are

$$\mu_n(x) = k_n(x)^\top K_n^{-1} y_n, \qquad \sigma_n^2(x) = k(x, x) - k_n(x)^\top K_n^{-1} k_n(x),$$

where $K_n$ is the $n \times n$ local kernel matrix and $k_n(x)$ is the vector of cross-covariances between $x$ and the sub-design points.
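As a concrete illustration, the following is a minimal sketch of this neighborhood-based prediction, assuming a unit-variance squared-exponential kernel and a k-d tree for neighbor search; the function and parameter names (`local_gp_predict`, `lengthscale`, `noise`) are illustrative rather than taken from any cited implementation.

```python
import numpy as np
from scipy.spatial import cKDTree

def sq_exp_kernel(A, B, lengthscale=0.3):
    """Unit-variance squared-exponential kernel matrix between row sets A, B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def local_gp_predict(tree, X, y, x_star, n=50, noise=1e-6):
    """Predictive mean/variance at x_star from its n nearest neighbors."""
    _, idx = tree.query(x_star, k=n)                 # sub-design X_n(x*)
    Xn, yn = X[idx], y[idx]
    Kn = sq_exp_kernel(Xn, Xn) + noise * np.eye(n)   # n x n local Gram matrix K_n
    kn = sq_exp_kernel(Xn, x_star[None, :])[:, 0]    # cross-covariances k_n(x*)
    L = np.linalg.cholesky(Kn)                       # O(n^3) with n << N
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, yn))
    v = np.linalg.solve(L, kn)
    mu = kn @ alpha                                  # predictive mean
    var = 1.0 - v @ v                                # k(x*, x*) = 1 for this kernel
    return mu, var

# Toy usage: 10^4 points would already strain a global GP but are cheap locally.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(10_000, 2))
y = np.sin(8 * X[:, 0]) * np.cos(8 * X[:, 1]) + 0.01 * rng.standard_normal(10_000)
tree = cKDTree(X)                                    # built once, reused per query
mu, var = local_gp_predict(tree, X, y, x_star=np.array([0.5, 0.5]))
```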
Local Gaussian Regression (LGR) (Meier et al., 2014) generalizes localizing approaches by introducing a global parametric basis

$$\phi_m(x) = \eta_m(x)\, \xi_m(x),$$

where $\eta_m$ is a localizer (typically Gaussian-shaped and centered at $c_m$ with local length-scale matrix $\Lambda_m$) and $\xi_m$ encodes low-order features, such as constants or linear trends in $x$. The full model then expresses $f(x) = \sum_{m=1}^{M} \phi_m(x)^\top w_m$, with Gaussian prior $w_m \sim \mathcal{N}(0, \Sigma_m)$. Inference leverages variational decoupling to achieve complexity $\mathcal{O}(M d_\phi^3)$, where $M$ is the number of local models and $d_\phi$ the local feature dimension.
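To make the basis construction concrete, here is a minimal sketch of LGR-style localized features with $\xi(x) = [1, x]$. The variational decoupling of Meier et al. (2014) is replaced by plain Bayesian linear regression over the stacked weights, so this illustrates only the feature design; the names (`lgr_features`, `centers`) and all hyperparameter values are assumptions.

```python
import numpy as np

def lgr_features(X, centers, lengthscale=0.15):
    """phi_m(x) = eta_m(x) * xi(x): Gaussian localizer times [1, x] features."""
    N, d = X.shape
    M = centers.shape[0]
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)   # (N, M)
    eta = np.exp(-0.5 * d2 / lengthscale**2)                    # localizers eta_m
    xi = np.hstack([np.ones((N, 1)), X])                        # features xi, (N, d+1)
    # Stack the M local feature blocks into one design matrix (N, M*(d+1)).
    return (eta[:, :, None] * xi[:, None, :]).reshape(N, M * (d + 1))

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(2_000, 1))
y = np.sin(12 * X[:, 0]) + 0.05 * rng.standard_normal(2_000)
centers = np.linspace(0, 1, 20)[:, None]           # M = 20 local models
Phi = lgr_features(X, centers)
# Gaussian prior w ~ N(0, s2 I) and noise variance give the usual posterior.
s2, noise = 10.0, 0.05**2
A = Phi.T @ Phi / noise + np.eye(Phi.shape[1]) / s2
w_mean = np.linalg.solve(A, Phi.T @ y / noise)     # posterior mean of weights
f_star = lgr_features(np.array([[0.5]]), centers) @ w_mean
```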
2. Methodological Variants and Algorithmic Implementations
There are several principal families of local GP approximation methods:
- Neighborhood-based Local Regression: Each prediction uses a locally chosen sub-design, built via $k$-nearest-neighbor search or greedy variance-reduction (ALC or MSPE-based) sequential design (Gramacy et al., 2013, Gramacy et al., 2013, Sung et al., 2016); see the ALC sketch after this list. Fast Woodbury-based rank-one updates and parallelization across targets enable scalable emulation. Search-limiting methods, such as maximum-distance screening or feature approximation, further reduce construction time (Sung et al., 2016).
- Local Basis Function Expansions: Models such as LGR (Meier et al., 2014) and truncated local random feature maps (Wacker et al., 2022) expand the model in spatially local basis elements. In LGR, spatially varying length-scale matrices $\Lambda_m$ allow adaptive modeling of local nonstationarity.
- Covariance Localization: Methods such as Locally Smoothed GP Regression (Gogolashvili et al., 2022) directly localize the covariance matrix for each test point using localization kernels (e.g., compact-support or Gaussian), yielding sparse, adaptive Gram matrices and nonstationary inference.
- Global-Local and Composite GP Models: Approaches such as TwinGP (Vakayil et al., 2023) and composite GPs (Ba et al., 2013) combine a global "trend" GP with overlapping or edge-corrected local kernels. The global subset covers broad-scale variation, while local neighborhoods capture fine-scale phenomena. The overall kernel is a sum or convex combination.
- Block and Clustered Local GPs: Partition-based approaches (GPRF (Moore et al., 2015), distributed GPs (Jalali et al., 2020)) couple independent local GPs via shared boundaries or pairwise potentials to restore smoothness and marginal likelihood coherence.
- Local GP for Functional Outputs: For dynamic computer models, SVD-based local GPs select neighborhoods in input space and reduce time-series output dimension via SVD; each mode is modeled by its own GP and sequential local design targets mean integrated squared prediction error (Zhang et al., 2016).
- Recursive and Information-Filter Updates: In scalable mapping, e.g., robotic SLAM, finite-support local basis functions (e.g., truncated kernels on spatial grids) and recursive sparse information-filter updates enable tractable GP mapping over very large spatial domains (Viset et al., 2022).
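The greedy ALC criterion referenced in the first bullet admits a compact, if naive, implementation: at each step, candidates are scored by the predictive-variance reduction they would yield at the target location. The sketch below omits the rank-one Cholesky updates that make the method of Gramacy et al. (2013) fast; the kernel choice, candidate-pool size, and names are illustrative assumptions.

```python
import numpy as np

def kern(A, B, ls=0.3):
    """Squared-exponential kernel with unit variance (k(x, x) = 1)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def alc_design(X, x_star, n=30, n0=5, noise=1e-6):
    """Greedily grow a sub-design maximizing variance reduction at x_star."""
    d = ((X - x_star) ** 2).sum(-1)
    order = np.argsort(d)
    idx = list(order[:n0])                               # seed: n0 nearest neighbors
    cand = [i for i in order[:20 * n] if i not in idx]   # local candidate pool
    while len(idx) < n:
        Xn = X[idx]
        Kinv = np.linalg.inv(kern(Xn, Xn) + noise * np.eye(len(idx)))
        k_star = kern(Xn, x_star[None, :])[:, 0]
        best, best_red = None, -np.inf
        for j in cand:
            k_j = kern(Xn, X[j][None, :])[:, 0]
            # Reduction in predictive variance at x_star from adding x_j:
            num = (kern(X[j][None, :], x_star[None, :])[0, 0]
                   - k_star @ Kinv @ k_j) ** 2
            den = 1.0 + noise - k_j @ Kinv @ k_j
            if num / den > best_red:
                best, best_red = j, num / den
        idx.append(best)
        cand.remove(best)
    return np.array(idx)

rng = np.random.default_rng(4)
X = rng.uniform(0, 1, size=(5_000, 2))
sub = alc_design(X, x_star=np.array([0.4, 0.6]))   # indices of the local sub-design
```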
3. Statistical Properties and Adaptivity
Local GP approximations inherit many desirable properties from the global GP when neighborhoods are judiciously selected and local hyperparameters are re-estimated per region:
- Uncertainty Quantification: Each local GP yields predictive mean and variance, quantifying epistemic uncertainty locally. Approximations with basis expansions (e.g., LGR, SVD-GP) provide explicit or empirical Bayes variance estimates for all outputs, and composite/integrated variants propagate both global and local uncertainty.
- Nonstationarity: Either by allowing spatially varying kernel length scales (e.g., LGR's $\Lambda_m$, CGP's local kernels (Ba et al., 2013)) or by re-estimating hyperparameters per local model, local GPs adapt to variable smoothness, anisotropy, or heteroskedasticity across the input domain.
- Interpolation vs. Regularization: Many local GP methods interpolate training data when noise is negligible and the local neighborhood is appropriate. In composite or global-local schemes, the global component ensures smooth extrapolation; the local component restores fine-grained detail and exact interpolation (Ba et al., 2013), as sketched after this list.
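A minimal sketch of the global-plus-local kernel idea from the last bullet: a long length-scale component carries the smooth trend while a short length-scale component restores fine detail. The actual composite GP of Ba et al. (2013) additionally modulates the local term by a variance process, which is omitted here; all length scales and weights are arbitrary assumptions.

```python
import numpy as np

def rbf(A, B, ls):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def composite_kernel(A, B, ls_global=0.5, ls_local=0.05, w_local=0.2):
    """Long length-scale trend plus down-weighted short length-scale detail."""
    return rbf(A, B, ls_global) + w_local * rbf(A, B, ls_local)

# GP regression with the composite kernel on a trend-plus-ripple target.
rng = np.random.default_rng(3)
X = rng.uniform(0, 1, size=(300, 1))
y = X[:, 0] ** 2 + 0.1 * np.sin(40 * X[:, 0])
K = composite_kernel(X, X) + 1e-8 * np.eye(300)
alpha = np.linalg.solve(K, y)
Xs = np.linspace(0, 1, 5)[:, None]
mu = composite_kernel(Xs, X) @ alpha      # recovers both scales of variation
```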
4. Computational Complexity and Scalability
A major motivation for local approximation is the cubic $\mathcal{O}(N^3)$ scaling of global GPs. The complexity of local GP methods depends on neighborhood size $n$, data dimension $d$, and modeling specifics:
| Method class | Training Complexity | Prediction Complexity (per $x$) | Scalability Regime |
|---|---|---|---|
| Standard GP | $\mathcal{O}(N^3)$ | $\mathcal{O}(N^2)$ | Small to modest $N$ |
| Local GP ($n$-neighborhood) | $\mathcal{O}(N)$ (search) | $\mathcal{O}(n^3)$ | Large $N$, moderate $d$ |
| LGR (variational) | $\mathcal{O}(M d_\phi^3)$ | $\mathcal{O}(M d_\phi^2)$ | Large $N$, modest $d$ |
| Localized random features | $\mathcal{O}(N D^2)$ | $\mathcal{O}(D^2)$ | Large $N$, small $d$ |
| Partitioned block/cluster | $\mathcal{O}(n^3)$ | $\mathcal{O}(n^2)$ per partition | Very large $N$ |
Selecting small $n$ (or $M$, $d_\phi$, $D$) and leveraging parallel computation are crucial for tractability (Gramacy et al., 2013, Zhang et al., 2016, Viset et al., 2022). For example, neighborhood search is $\mathcal{O}(N)$ (brute force) or $\mathcal{O}(\log N)$ (k-d tree) per query, and a Cholesky factorization of $K_n$ is $\mathcal{O}(n^3)$. LGR's variational inference enjoys per-model cubic cost $\mathcal{O}(d_\phi^3)$, similar to locally weighted regression, and is thus suitable for problems with multiple regions of interest and small basis size (Meier et al., 2014).
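The quoted search costs can be checked empirically; below is a small timing sketch, assuming SciPy's cKDTree and arbitrary problem sizes, contrasting a one-time tree build plus near-logarithmic queries with an $\mathcal{O}(N)$-per-query brute-force scan.

```python
import numpy as np
from scipy.spatial import cKDTree
from time import perf_counter

rng = np.random.default_rng(2)
X = rng.uniform(0, 1, size=(100_000, 2))    # training inputs
Q = rng.uniform(0, 1, size=(200, 2))        # prediction locations

t0 = perf_counter()
tree = cKDTree(X)                           # one-time O(N log N) build
_, idx = tree.query(Q, k=50)                # ~O(log N) per query in low d
t_tree = perf_counter() - t0

t0 = perf_counter()
for x in Q:                                 # brute force: O(N) per query
    d2 = ((X - x) ** 2).sum(-1)
    idx_bf = np.argpartition(d2, 50)[:50]   # unordered 50 nearest neighbors
t_brute = perf_counter() - t0
print(f"k-d tree: {t_tree:.3f}s, brute force: {t_brute:.3f}s")
```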
5. Empirical Performance and Applications
Extensive empirical evaluation has been performed across benchmark functions, real-world spatial modeling, and computer experiment settings:
- Emulation of Computer Experiments: In large-scale emulation, local GP with variance-reduction subset selection outperforms nearest-neighbor and compactly supported covariance approaches in root mean squared prediction error and computational speed (Gramacy et al., 2013).
- Adaptive Terrain Mapping: Recursive local-basis GP mapping with truncated kernels achieves real-time performance and accuracy comparable to global GPs, with update times independent of total area and robust interpolation in robotics and SLAM tasks (Viset et al., 2022).
- Dynamics and Functional Outputs: In dynamic computer models, local SVD-GP methods yield 30–50% improvements in log-NMSPE and proper scoring rules over naive nearest-neighbor local selection, matching global GP accuracy at 2–3 orders of magnitude lower cost (Zhang et al., 2016).
- Spatial Nonhomogeneity: LGR demonstrates the ability to learn spatially varying length scales per region, enabling accurate modeling of sharp transitions and anisotropy (Meier et al., 2014).
- Composite GP and Global-Local Fusion: Models combining global and local kernels (e.g., TwinGP) achieve state-of-the-art RMSE and order-of-magnitude runtime improvements over partitioned kriging or classical GP (Vakayil et al., 2023). CGP models capture both global smoothness and local volatility, yielding robust interval coverage and improved stability on sparse designs (Ba et al., 2013).
6. Practical Considerations and Extensions
Several practical insights emerge from the literature:
- Neighborhood Selection: The accuracy and computational cost of local GP methods depend sensitively on the selection and size of the local subset. Variance-reduction and information-theoretic criteria generally outperform simple nearest-neighbor selection. Feature approximation and distance-based screening accelerate neighborhood search with negligible increase in predictive variance (Sung et al., 2016).
- Hyperparameter Tuning: Local estimation of kernel parameters is essential in nonstationary or spatially inhomogeneous problems. Many procedures use empirical Bayes or profile likelihood estimation per local model (Gramacy et al., 2013, Ba et al., 2013).
- Combination with Inducing/Basis Approaches: Local random feature approximations circumvent pathologies of global Maclaurin expansions in high-frequency regimes and allow for scalable, variance-controlled linearization of the GP kernel (Wacker et al., 2022). Related approaches extend to hybrid block and cluster fusion with coupling potentials or robust aggregations (Moore et al., 2015, Jalali et al., 2020).
- Uncertainty Calibration and Consistency: Methods that couple or aggregate local predictions (e.g., GPRF, GRBCM-aggregated clusters) restore predictive consistency and uncertainty quantification, especially near partition boundaries (Moore et al., 2015, Jalali et al., 2020); a simple aggregation sketch follows this list.
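As a simple illustration of the aggregation idea in the last bullet, the following sketch combines independent local experts with a precision-weighted (product-of-experts style) rule; GPRF and GRBCM use more refined couplings, so this is only a baseline illustration and the function name is an assumption.

```python
import numpy as np

def poe_aggregate(means, variances):
    """Precision-weighted combination of per-expert predictions at one point:
    precisions add, and the mean is the precision-weighted average."""
    prec = 1.0 / np.asarray(variances)
    var = 1.0 / prec.sum()
    mu = var * (prec * np.asarray(means)).sum()
    return mu, var

# Three local experts disagreeing near a partition boundary: the uncertain
# third expert contributes little (mu ~ 1.04, var ~ 0.013).
mu, var = poe_aggregate(means=[0.9, 1.1, 1.4], variances=[0.04, 0.02, 0.50])
```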
7. Limitations and Ongoing Research
Although highly effective in scaling GP inference, local approximations face certain limitations:
- Discontinuities at Boundaries: Independent local models can lead to discontinuities and poor uncertainty estimates at subregion boundaries. Coupling schemes or overlap/correction strategies partially alleviate this (Moore et al., 2015).
- Complexity for High Dimensional Inputs: As ambient dimension increases, the size of local neighborhoods necessary to capture relevant variation often increases, impacting both accuracy and computational feasibility.
- Parameter Tuning: The choice of local model size, clustering bandwidth (e.g., in locally smoothed GPR), and kernel scaling parameters requires careful cross-validation or pilot optimization.
- Global Nonstationarity: In the absence of an explicit global model, local hyperparameter estimation alone does not provide strong extrapolation; hybrid or composite models address this by fusing short- and long-range information (Ba et al., 2013, Vakayil et al., 2023).
Active research seeks to improve local GP aggregation, automate partitioning and coupling strategies, develop more expressive basis/localization schemes (e.g., higher-order surrogates (Jeong et al., 2025)), and extend these frameworks to deep or hierarchical Gaussian process models.
References:
- Local Gaussian Regression (Meier et al., 2014)
- Local Random Feature Approximations of the Gaussian Kernel (Wacker et al., 2022)
- A Global-Local Approximation Framework for Large-Scale Gaussian Process Modeling (Vakayil et al., 2023)
- Massively parallel approximate Gaussian process regression (Gramacy et al., 2013)
- Local Gaussian process approximation for large computer experiments (Gramacy et al., 2013)
- Composite Gaussian process models for emulating expensive functions (Ba et al., 2013)
- Locally Smoothed Gaussian Process Regression (Gogolashvili et al., 2022)
- Local approximate Gaussian process regression for data-driven constitutive laws (Fuhg et al., 2021)
- Gaussian Process Random Fields (Moore et al., 2015)
- Local Gaussian Process Model for Large-scale Dynamic Computer Experiments (Zhang et al., 2016)
- Spatially scalable recursive estimation of Gaussian process terrain maps using local basis functions (Viset et al., 2022)
- On the Local Structure and Approximation Stability of Block Isotropic Gaussian Fields (Jeong et al., 2025)
- Potentially Predictive Variance Reducing Subsample Locations in Local Gaussian Process Regression (Sung et al., 2016)
- Aggregating Dependent Gaussian Experts in Local Approximation (Jalali et al., 2020)