Locally Smoothed Gaussian Process Regression

Updated 22 April 2026
  • The paper introduces LSGPR, a method that restricts training point influence via localization kernels to enhance scalability and adaptivity.
  • It leverages techniques like adaptive neighbor selection and local hyperparameter tuning to mitigate issues of global GP regression.
  • Empirical evaluations reveal that LSGPR achieves superior prediction accuracy with significant computational speedups on benchmark datasets.

Locally Smoothed Gaussian Process Regression (LSGPR) refers to a class of Gaussian process regression methodologies that exploit localized structure in the input space to enhance computational tractability, statistical adaptivity, and prediction quality. By restricting or down-weighting the influence of distant training points on the prediction at any query location, LSGPR achieves a balance between the expressive power of GP models and the scalability or nonstationarity requirements encountered in many scientific and engineering applications. This paradigm encompasses a diversity of concrete frameworks, notably including weighting schemes via localization kernels, explicit partitioning, adaptive neighbor selection, and joint estimation of local hyperparameters or boundaries.

1. Theoretical Foundation: Localizing the Gaussian Process Prior

Standard Gaussian process regression for data $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^n$ places a Gaussian process prior $f \sim \mathcal{GP}(0, K(\cdot, \cdot))$ on the regression function, implying that all $n$ training points can exert global influence on the prediction at any test location $x_0$. Classically, the predictive mean and variance at $x_0$ are given by
$$m(x_0) = K_{x_0 X}(K_{XX} + \sigma^2 I)^{-1} y, \qquad v(x_0) = K(x_0, x_0) - K_{x_0 X}(K_{XX} + \sigma^2 I)^{-1} K_{X x_0},$$
where $K_{XX}$ is the $n \times n$ kernel Gram matrix and $\sigma^2$ is the noise variance.
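As a concrete illustration of these formulas, the following minimal NumPy sketch computes the global predictive mean and variance at a single query point with a squared-exponential kernel; the function names and default hyperparameters are illustrative and not tied to any particular reference implementation.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between row-stacked inputs A and B."""
    sq_dists = (np.sum(A**2, axis=1)[:, None]
                + np.sum(B**2, axis=1)[None, :]
                - 2.0 * A @ B.T)
    return variance * np.exp(-0.5 * sq_dists / lengthscale**2)

def gp_predict(X, y, x0, noise_var=1e-2, lengthscale=1.0):
    """Global GP predictive mean and variance at a single query point x0."""
    n = len(X)
    K_XX = rbf_kernel(X, X, lengthscale)
    K_x0X = rbf_kernel(x0[None, :], X, lengthscale)      # 1 x n cross-covariance
    A = K_XX + noise_var * np.eye(n)                     # (K_XX + sigma^2 I)
    alpha = np.linalg.solve(A, y)                        # O(n^3) solve against all n points
    mean = (K_x0X @ alpha).item()
    var = (rbf_kernel(x0[None, :], x0[None, :], lengthscale)
           - K_x0X @ np.linalg.solve(A, K_x0X.T)).item()
    return mean, var
```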

LSGPR modifies this paradigm by introducing a localization kernel $k_h(x, x_0)$ centered at the query point $x_0$ with localization scale $h > 0$, typically chosen such that $k_h(x, x_0)$ rapidly decays or is exactly zero when $\|x - x_0\| > h$. The prior and the data are multiplied by $\sqrt{k_h(\cdot, x_0)}$, yielding a localized GP prior with covariance $\tilde{K}(x, x') = \sqrt{k_h(x, x_0)}\, K(x, x')\, \sqrt{k_h(x', x_0)}$ and localized observations $\tilde{y}_i = \sqrt{k_h(x_i, x_0)}\, y_i$. Only training points with $k_h(x_i, x_0) > 0$ are included in the regression at $x_0$, leading to sparsification of the Gram matrix and strictly local influences (Gogolashvili et al., 2022).

2. Localization Kernels and Neighborhood Construction

The choice of localization kernel $k_h(\cdot, x_0)$ is pivotal in LSGPR. Common examples include:

Kernel Type | Formula (with $u = \lVert x - x_0 \rVert / h$) | Support
Rectangular | $k_h(x, x_0) \propto \mathbf{1}\{u \le 1\}$ | Compact ($u \le 1$)
Epanechnikov | $k_h(x, x_0) \propto (1 - u^2)_+$ | Compact ($u \le 1$)
Gaussian | $k_h(x, x_0) = \exp(-u^2/2)$ | All $x \in \mathbb{R}^d$
Hilbert (Sing.) | $k_h(x, x_0) \propto u^{-d}\,\mathbf{1}\{u \le 1\}$ | Compact ($u \le 1$)

where the normalizing constants involve $V_d$, the volume of the unit $d$-ball (e.g., $1/(V_d h^d)$ for the rectangular kernel). Compactly supported $k_h$ enforces strict localization; Gaussian weights allow soft decay.
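For concreteness, these localization weights can be coded as simple functions of the scaled distance $u = \lVert x - x_0 \rVert / h$; the sketch below omits the normalizing constants and uses illustrative function names.

```python
import numpy as np

def rectangular(u):
    """Rectangular (uniform) weight: constant inside the unit ball, zero outside."""
    return (u <= 1.0).astype(float)

def epanechnikov(u):
    """Epanechnikov weight: (1 - u^2) inside the unit ball, zero outside."""
    return np.clip(1.0 - u**2, 0.0, None)

def gaussian(u):
    """Gaussian weight: soft decay, strictly positive everywhere."""
    return np.exp(-0.5 * u**2)

def localization_weights(X, x0, h, shape=rectangular):
    """Weights k_h(x_i, x0) for all training rows of X, with u = ||x_i - x0|| / h."""
    u = np.linalg.norm(X - x0, axis=1) / h
    return shape(u)
```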

Neighborhoods may also be defined adaptively, e.g., via $k$-nearest neighbors with lengthscale-weighted metrics or by identifying regions from data-driven partitions, as in splitting or partitioned GPs (Terry et al., 2020, Sung et al., 2016).

3. Inference and Prediction in the Localized GP Framework

Let $I_{x_0}$ denote the set of indices of training points sufficiently close to $x_0$, i.e., those with $k_h(x_i, x_0) > 0$. Collecting the weights $w_i = k_h(x_i, x_0)$ in $W = \mathrm{diag}(w_i)_{i \in I_{x_0}}$, the locally modified Gram matrix is $\tilde{K}_{x_0} = W^{1/2} K_{II} W^{1/2}$, where $K_{II}$ is the restriction of $K_{XX}$ to the rows and columns indexed by $I_{x_0}$.

Posterior mean and variance for $x_0$ are then
$$m_h(x_0) = \tilde{K}_{x_0 I}\,(\tilde{K}_{x_0} + \sigma^2 I)^{-1}\, W^{1/2} y_I, \qquad v_h(x_0) = K(x_0, x_0) - \tilde{K}_{x_0 I}\,(\tilde{K}_{x_0} + \sigma^2 I)^{-1}\, \tilde{K}_{I x_0},$$
where $\tilde{K}_{x_0 I} = K_{x_0 I} W^{1/2}$ is the localized cross-covariance and $y_I$ is the corresponding subvector of observations. Hyperparameters (kernel, noise, localization width) are selected by maximizing the local marginal log-likelihood on each neighborhood, with $h$ typically chosen by grid search or by ensuring a minimum neighbor count (Gogolashvili et al., 2022).

The computational cost per test point is $O(m^3)$ for an $m \times m$ system, where $m = |I_{x_0}| \ll n$. This is in sharp contrast with the $O(n^3)$ scaling of global GPR.
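The localized prediction can be sketched as below. This is a minimal illustration under the assumptions used in this article (square-root weighting of prior and data, a rectangular localization kernel, and a unit-variance RBF base kernel); it is not the reference implementation of Gogolashvili et al. (2022).

```python
import numpy as np

def rbf(A, B, ls=1.0):
    """Unit-variance squared-exponential kernel matrix."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * d2 / ls**2)

def lsgpr_predict(X, y, x0, h, noise_var=1e-2, ls=1.0):
    """Locally smoothed GP prediction at x0 (sketch; rectangular localization)."""
    w = (np.linalg.norm(X - x0, axis=1) <= h).astype(float)  # k_h(x_i, x0), rectangular
    idx = np.flatnonzero(w > 0.0)                            # I_{x0}: active neighborhood
    if idx.size == 0:
        return 0.0, 1.0                                      # no neighbors: prior mean/variance
    Xl, yl, sw = X[idx], y[idx], np.sqrt(w[idx])
    K_tilde = sw[:, None] * rbf(Xl, Xl, ls) * sw[None, :]    # W^{1/2} K_II W^{1/2}
    y_tilde = sw * yl                                        # W^{1/2} y_I
    k0 = rbf(x0[None, :], Xl, ls).ravel() * sw               # localized cross-covariance
    A = K_tilde + noise_var * np.eye(idx.size)               # m x m system, m = |I_{x0}| << n
    alpha = np.linalg.solve(A, y_tilde)
    mean = float(k0 @ alpha)
    var = float(1.0 - k0 @ np.linalg.solve(A, k0))           # K(x0, x0) = 1 for unit-variance RBF
    return mean, var
```

Because the full Gram matrix is never formed, the per-query cost is cubic only in the neighborhood size $m$, matching the scaling discussed above.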

4. Extensions: Adaptive Partitioning, Boundary Learning, and Local Hyperparameters

LSGPR gracefully accommodates a spectrum of extensions, each leveraging the local structure for improved adaptivity:

  • Joint Local Boundary Estimation: Estimating piecewise-continuous functions with discontinuities, as in "Jump GP," involves constructing a local linear separator around the query point and keeping only those local neighbors on the same side of it as $x_0$, with the separator learned jointly with the kernel hyperparameters (Park, 2021).
  • Streaming and Partitioned GPs: Partitioning the input space adaptively (via principal direction divisive partitioning or similar) and fitting local GPs to each leaf enables bounded per-update cost and linear memory scaling, suitable for streaming or massive datasets. Smoothing across boundaries is achieved by weighted aggregation of regionwise predictions (Terry et al., 2020).
  • Locally Adaptive Weights and Distances: Adaptive neighborhood selection can weight training points by lengthscale-informed Mahalanobis distances or insert explicit weights into the kernel matrix construction, as in variational sparse-spectrum and local feature GP regression (Tan et al., 2013, Meier et al., 2014); a minimal distance-based selection rule is sketched after this list.
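As a small illustration of the adaptive-distance idea in the last bullet, the sketch below selects a neighborhood by a lengthscale-weighted (diagonal Mahalanobis) $k$-nearest-neighbor rule; the ARD lengthscales and the neighbor count are illustrative assumptions rather than values from the cited papers.

```python
import numpy as np

def ard_knn_neighborhood(X, x0, lengthscales, k=50):
    """Indices of the k training points nearest to x0 under a per-dimension
    lengthscale-weighted (diagonal Mahalanobis) distance."""
    d = np.sqrt(np.sum(((X - x0) / lengthscales) ** 2, axis=1))
    return np.argsort(d)[:k]
```

The resulting index set can be substituted for the radius-based neighborhood $I_{x_0}$ in the prediction sketch of Section 3.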

These adaptations yield models that can locally select relevant features, tune smoothness to the local nonstationarity of the function of interest, and more accurately capture discontinuities or regime shifts.

5. Empirical Evaluation and Comparative Performance

A spectrum of benchmarks demonstrates that LSGPR models deliver competitive or superior predictive accuracy compared to both global GPR and alternative local/mixed models, typically at orders of magnitude lower computational cost per prediction (Gogolashvili et al., 2022).

For example, on UCI benchmarks (Yacht, Boston, Concrete, Kin8nm, Powerplant, Protein), LSGPR with the Hilbert localization kernel achieved lower test MSE than both full GP and deep GP baselines in several cases, with local models operating on only a small number of neighbors per query and speedups of orders of magnitude for large $n$ (Gogolashvili et al., 2022). "Jump GP" and partitioned LSGPR both report strong mitigation of boundary bias and improved performance in piecewise-continuous and partitioned-regime settings (Park, 2021, Terry et al., 2020).

6. Practical Guidelines and Trade-Offs

Selecting the localization scale $h$ (or, equivalently, a neighbor count $m$) involves a bias-variance and speed-accuracy trade-off:

  • Small $h$ (fewer neighbors): high adaptivity, lower computational burden, greater variance.
  • Large $h$ (more neighbors): diminished locality, higher computation; predictions converge to those of global GPR.

Compactly supported localizers (rectangular, Epanechnikov) ensure sparse Gram matrices and computational parsimony. Soft/decaying kernels (Gaussian) may give smoother predictions but reduced sparsity.

Hyperparameters should be cross-validated locally. A kd-tree or similar spatial index is advised for efficient neighbor queries. For stability, inputs should be standardized.
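One possible way to wire these recommendations together is sketched below, using SciPy's cKDTree for radius queries on standardized inputs; the candidate grid of $h$ values, the minimum neighbor count, and the synthetic data are illustrative choices, not prescriptions from the cited work.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 4))                       # stand-in training inputs

# Standardize inputs for numerical stability, then build a spatial index once.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
tree = cKDTree(X_std)

def choose_h(x0, grid=(0.25, 0.5, 1.0, 2.0), min_neighbors=10):
    """Return the smallest radius on a candidate grid that captures at least
    `min_neighbors` standardized training points, plus the neighbor indices.
    A simple heuristic; local marginal-likelihood selection is also possible."""
    for h in grid:
        idx = tree.query_ball_point(x0, r=h)
        if len(idx) >= min_neighbors:
            return h, np.asarray(idx)
    h = grid[-1]
    return h, np.asarray(tree.query_ball_point(x0, r=h))

h, neighborhood = choose_h(np.zeros(4))              # example query at the origin
```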

Local smoothing inherently induces nonstationary behavior in the predictive covariance, even with stationary base kernels. This adaptivity is a primary strength, allowing LSGPR to accommodate heteroskedasticity, varying function smoothness, and regime shifts (Gogolashvili et al., 2022).

7. Limitations and Open Challenges

Limitations of LSGPR methods include:

  • Loss of some global structure in favor of local adaptation; very small $h$ can lead to high-variance or unstable predictions.
  • For discontinuities, the quality of boundary estimation or partitioning is critical.
  • Selection of $h$, kernel type, and neighbor count remains problem-dependent and lacks a universal automated prescription.
  • In very high-dimensional settings or for highly non-separable kernels, neighbor finding and kernel matrix assembly can still become costly.

Current research directions include extensions to more sophisticated adaptive neighborhood selection, combination with deep learning architectures for feature extraction, batch and online selection strategies, and theoretical guarantees of local model consistency under sparse sampling (Park, 2021, Terry et al., 2020).


For further mathematical and algorithmic details as well as implementation-focused recipes, see (Gogolashvili et al., 2022, Park, 2021, Terry et al., 2020), and (Sung et al., 2016).
