Local Gaussian Regression (LGR)

Updated 22 April 2026
  • Local Gaussian Regression (LGR) is a set of methods that localize standard GP inference to reduce the cubic computational cost and handle nonstationarity.
  • LGR approaches—including neighborhood subsampling, partition-based ensembles, and localized kernel modifications—balance scalability with prediction accuracy.
  • LGR is applied in fields like computational mechanics, online control, and system identification to offer efficient uncertainty quantification and enhanced interpretability.

Local Gaussian Regression (LGR) encompasses a family of methodologies designed to exploit the nonparametric power and uncertainty quantification of Gaussian process regression (GPR) while overcoming its prohibitive computational costs for large-scale, high-dimensional, or nonstationary data. LGR achieves this by restricting GPR inference, learning, and prediction to locally-selected subsets of data or model components, or by introducing localized kernel modifications. The field has developed a variety of algorithmic realizations, including subsampled local GP fitting, partition-based ensemble frameworks, localized basis function approaches, and methods based on spatially-varying hyperparameters or kernel localization. LGR has found application in computational mechanics, online regression, control, system identification, and high-throughput emulation, offering a combination of statistical fidelity, computational scalability, and, in some variants, enhanced interpretability.

1. Core Principles and Model Families

The defining principle of Local Gaussian Regression is the restriction of standard GPR inference—either prediction, training, or both—to a local region of input space, with the goal of reducing the $O(N^3)$ cost of GP matrix inversion and enabling adaptation to local nonstationarity or heteroscedasticity. Distinct architectural choices lead to several LGR model families:

  • Neighborhood Subsampling GPR: For each prediction point $x_*$, a local subset of $n \ll N$ training points near $x_*$ is used to fit a GP model. Standard GP predictive mean and variance are computed using the local kernel matrix and its cross-covariance with $x_*$ (Fuhg et al., 2021, Sung et al., 2016). The local training set can be selected by nearest-neighbor search or by more sophisticated variance-reduction criteria; a minimal sketch of this family appears at the end of this section.
  • Partition-Based Ensemble Models: The input space is hierarchically or evenly partitioned, with each subregion assigned an independent local GP expert. Examples include binary-tree structures with soft partitions (Dividing Local Gaussian Processes, DLGP (Lederer et al., 2020, Braun et al., 2024)) and data covers with overlapping clusters (Overlapping Domain Cover, ODC (Elhoseiny et al., 2017)). At prediction, one or more experts are combined, often with overlap-weighted blending to reduce discontinuities.
  • Localizing Basis Functions and Localized Inference: LGR can use localized (e.g., RBF-gated) basis functions, resulting in spatially-adaptive lengthscales and variationally decoupled local submodels (Meier et al., 2014). This interpolates between locally weighted regression and GP regression, but retains a global generative framework for uncertainty and hyperparameters.
  • Localized Kernel Modifications: LGR can be realized via kernel modifications that apply compact-support or rapidly decaying localization functions to the base kernel, enforcing sparsity in the Gram matrix and limiting the influence of distant training points (Gogolashvili et al., 2022).
  • Sliding-Window and Local-Kernel Approaches in Frequency Domains: In domains such as system identification, LGR is used with locally tailored kernels (e.g., dot-product plus resonance kernels) within sliding frequency windows, automatically adapting local model complexity (Fang et al., 2024).

A plausible implication is that these families allow for broad customization of locality depending on computational, statistical, or physical constraints.
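To make the neighborhood-subsampling family concrete, below is a minimal sketch in Python/NumPy, assuming a squared-exponential kernel with fixed, illustrative hyperparameters (the lengthscale, noise variance, and neighborhood size are placeholder choices, not values prescribed by the cited papers):

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel matrix between row-wise point sets A and B."""
    sq_dists = (np.sum(A**2, axis=1)[:, None]
                + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T)
    return np.exp(-0.5 * sq_dists / lengthscale**2)

def local_gp_predict(x_star, X, y, n_local=50, noise_var=1e-2, lengthscale=1.0):
    """Neighborhood-subsampling LGR: fit a GP only on the n_local points
    nearest to x_star, reducing the solve from O(N^3) to O(n^3)."""
    # 1. Select the local training set by nearest-neighbor search.
    idx = np.argsort(np.linalg.norm(X - x_star, axis=1))[:n_local]
    Xn, yn = X[idx], y[idx]
    # 2. Standard GP posterior mean/variance on the local subset.
    Kn = rbf_kernel(Xn, Xn, lengthscale) + noise_var * np.eye(len(Xn))
    k_star = rbf_kernel(x_star[None, :], Xn, lengthscale)       # shape (1, n)
    L = np.linalg.cholesky(Kn)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, yn))        # Kn^{-1} y
    v = np.linalg.solve(L, k_star.T)
    mean = (k_star @ alpha).item()
    var = (1.0 - v.T @ v).item()                                # k(x*,x*) = 1 for this RBF
    return mean, var

# Toy usage: 10,000 training points, but each query touches only 50 of them.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(10_000, 2))
y = np.sin(X[:, 0]) * np.cos(X[:, 1]) + 0.1 * rng.standard_normal(10_000)
print(local_gp_predict(np.array([0.5, -1.0]), X, y))
```

The brute-force distance sort used here costs $O(N \log N)$ per query; practical implementations replace it with a k-d tree or similar index, since neighbor search otherwise dominates the local $O(n^3)$ solve.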

2. Mathematical Formulation and Algorithms

Most LGR variants are built on the canonical GP regression posterior:

$$\mu(x_*) = k(x_*, X_n)\,[K_n + \sigma_n^2 I]^{-1} y_n, \qquad \sigma^2(x_*) = k(x_*, x_*) - k(x_*, X_n)\,[K_n + \sigma_n^2 I]^{-1}\, k(X_n, x_*),$$

where $X_n$ is a local training set of size $n \ll N$ selected for the query point $x_*$, $K_n = k(X_n, X_n)$ is the local kernel matrix, and $\sigma_n^2$ is the observation noise variance. Variants differ chiefly in how $X_n$ is chosen and how hyperparameters are estimated.

LGR can also incorporate special kernels tailored to local analytic structure or physical priors, as in frequency response estimation (Fang et al., 2024), and supports locally linear or interpretable explanations under GP priors (Yoshikawa et al., 2020).
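As one concrete instance of such kernel-level localization, the following is a minimal sketch in the spirit of the compact-support modifications described in Section 1; the Wendland-type localizer and its scale are illustrative assumptions, not the specific construction of Gogolashvili et al. (2022):

```python
import numpy as np

def localized_kernel(A, B, x_star, lengthscale=1.0, loc_scale=1.0):
    """Base RBF kernel multiplied by a compactly supported weight centered at
    the test point x_star. Pairs involving points farther than loc_scale from
    x_star contribute exactly zero, so the Gram matrix becomes sparse and
    distant training data cannot influence the local fit."""
    def rbf(P, Q):
        d2 = (np.sum(P**2, 1)[:, None] + np.sum(Q**2, 1)[None, :]
              - 2.0 * P @ Q.T)
        return np.exp(-0.5 * d2 / lengthscale**2)
    def weight(P):
        # Wendland-style localizer: (1 - r)_+^2, zero beyond loc_scale.
        r = np.linalg.norm(P - x_star, axis=1) / loc_scale
        return np.clip(1.0 - r, 0.0, None) ** 2
    return rbf(A, B) * weight(A)[:, None] * weight(B)[None, :]
```

Because the localizer enters as a separable factor $w(x)\,k(x, x')\,w(x')$, positive semi-definiteness of the base kernel is preserved.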

3. Computational Complexity and Scalability

A central motivation for LGR is the reduction of computational burdens inherent in global GPR:

| Method/Class | Training Complexity | Prediction Complexity |
|---|---|---|
| Global GP | $O(N^3)$ | $O(N^2)$ per query |
| Neighborhood LGR | neighbor search (preproc) + $O(n^3)$ per query, $n \ll N$ | $O(n^2)$ per query |
| Partition/Ensemble | $O(M m^3)$ ($m$, $M$: size/number of local models) | $O(m^2)$–$O(M m^2)$ |
| DLGP (tree, overlap) | sublinear (practically logarithmic) per split/update | sublinear per query |
| Kernel-localized LGR | $O(n^3)$ per test point | $O(n^2)$ per test point |
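As a rough worked example of what these orders imply: with $N = 10^5$ training points, a single global solve scales like $N^3 = 10^{15}$ operations, whereas answering $10^4$ queries with local subsets of $n = 10^2$ points scales like $10^4 \cdot n^3 = 10^{10}$, roughly five orders of magnitude less (ignoring constants and neighbor-search overhead).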

Empirical studies show that LGR enables practical regression at data scales far beyond the reach of global GPR using moderate computational resources, with negligible degradation in predictive mean and minor widening of predictive intervals versus global GPR (Sung et al., 2016, Gogolashvili et al., 2022). Online variants (DLGP, GPTreeO) enable real-time regression with millisecond update and prediction rates on streaming high-dimensional data (Lederer et al., 2020, Braun et al., 2024).

4. Applications and Empirical Performance

LGR has enabled substantive advances across a variety of domains:

  • Data-driven Constitutive Laws and Multiscale Mechanics: laGPR provides accurate, uncertainty-aware surrogates for stress predictions in nonlinear constitutive modeling. It outperforms neural networks and $k$-NN baselines in convergence and accuracy on hyperelastic benchmarks, and provides robust, mesh-independent convergence in finite-strain FE analysis (Fuhg et al., 2021).
  • Real-Time Control and Reinforcement Learning: LGR allows fast, adaptive closed-loop policy refinement by constructing local GPs around the current system state, significantly shrinking the computational burden for uncertainty-bound value iteration and control synthesis (Jackson et al., 2021).
  • Streaming and Continual Learning: DLGP and GPTreeO architectures deliver sublinear (practically logarithmic) cost per update and per prediction, achieving nMSE near that of exact GPs while accommodating abrupt covariate or distributional shifts (Lederer et al., 2020, Braun et al., 2024).
  • Nonlinear System Identification: Localized GPR with specialized kernels yields state-of-the-art accuracy for frequency response estimation and robust model selection, outperforming local polynomial and rational methods in both accuracy and noise robustness (Fang et al., 2024).
  • Interpretability: Local GPR models enable post-hoc feature attribution via locally linear GP priors, delivering sample-specific explanations with closed-form uncertainty for both predictions and coefficients (Yoshikawa et al., 2020); the underlying construction is sketched after this list.
  • Nonstationary Regression and Function Emulation: Local kernel GPR and partition-based LGR excel where process characteristics vary spatially or temporally, consistently achieving lower mean-squared error than global GPs or traditional local regressors, with substantial runtime savings in empirical studies (Gogolashvili et al., 2022).
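The locally linear construction referenced in the interpretability bullet above can be summarized in one display; the notation here is generic rather than taken verbatim from Yoshikawa et al. (2020):

$$y(x) = w(x)^\top x + \varepsilon, \qquad w_d(\cdot) \sim \mathcal{GP}(0, k_d), \quad d = 1, \dots, D,$$

so that the posterior over $w(x_*)$ supplies per-feature attributions, with closed-form predictive uncertainty for both $y(x_*)$ and the coefficients themselves.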

5. Model Selection, Hyperparameterization, and Limitations

The effective use of LGR requires consideration of several tuning and methodological choices:

  • Locality Scale Selection: The neighborhood size, kernel bandwidth, or partition granularity controls the bias–variance and computational trade-offs. For neighborhood methods, exhaustive or adaptive search can be computationally intensive, motivating distance/pruning techniques and feature approximation (Sung et al., 2016, Gogolashvili et al., 2022). A simple validation-based selection sketch follows this list.
  • Overlap and Coverage: In partition-based frameworks, the degree of overlap between subdomains reduces boundary artifacts and ensures continuity but increases per-query computational load and memory (Elhoseiny et al., 2017). Optimal trade-offs are dataset- and dimension-dependent.
  • Hyperparameter Coupling: Sharing hyperparameters globally enhances stability and prevents overfitting, at the expense of local adaptivity. Per-local fit optimization is more accurate but computationally heavier (Fuhg et al., 2021, Braun et al., 2024).
  • Limitations:
    • For very sparse data, local models may be ill-conditioned.
    • Batch addition and hyperparameter cross-validation require further study (Sung et al., 2016).
    • Variational approaches introduce bias via submodel decoupling (Meier et al., 2014).
    • Localization may struggle in high-dimensional settings if data sparsity impedes local model statistical power.
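One simple way to choose the neighborhood size, as referenced under Locality Scale Selection above, is plain held-out validation. This sketch reuses local_gp_predict from the Section 1 example; the candidate grid is an illustrative assumption, not a recommendation from the cited papers:

```python
import numpy as np

def select_neighborhood_size(X_train, y_train, X_val, y_val,
                             candidates=(25, 50, 100, 200)):
    """Pick the local-subset size n minimizing validation MSE.
    Larger n lowers variance but raises the O(n^3) per-query cost."""
    errors = {}
    for n in candidates:
        preds = np.array([local_gp_predict(x, X_train, y_train, n_local=n)[0]
                          for x in X_val])
        errors[n] = float(np.mean((preds - y_val) ** 2))
    best = min(errors, key=errors.get)
    return best, errors
```

Cross-validating jointly over neighborhood size and kernel hyperparameters is costlier but addresses the hyperparameter coupling noted above.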

6. Extensions, Theory, and Relations to Other Methods

LGR connects to and extends several related frameworks:

  • Sparse/Inducing Point GPs: Unlike inducing point GPs, which build global low-rank kernel approximations, LGR performs local, adaptive sparsification aligned with prediction or control demands (Gogolashvili et al., 2022).
  • Mixture of Experts, Product of Experts: Partition-based LGR can be seen as a continuous mixture-of-local-GP-experts model, with overlap controlling mixture smoothness and predictive variance reduction (Elhoseiny et al., 2017); the standard combination rule is shown after this list.
  • Locally Weighted Regression/Kernel Ridge Regression: Classical local models are special cases; LGR generalizes these with full probabilistic generative treatment, principled marginal-likelihood hyperparameter estimation, and predictive uncertainty (Meier et al., 2014, Gogolashvili et al., 2022).
  • Nonparametric Experiment Design and Active Learning: Information-driven data acquisition, as in nonlinear continuation tracking, demonstrates the utility of coupling LGR with active experimental design for efficient uncertainty reduction (Renson et al., 2019).
  • Bayesian Interpretation and Theoretical Guarantees: High-probability deviation bounds for local GPR models hold under RKHS assumptions, enabling the derivation of risk bounds for policy and control (Jackson et al., 2021).

A plausible implication is that the comprehensive flexibility of LGR makes it an attractive tool for full uncertainty quantification, interpretable regression, and scalable online learning in scientific and engineering contexts.

7. Summary Table of Representative LGR Approaches

| Approach | Locality Mechanism | Complexity (training/prediction) | Distinctive Features | Key References |
|---|---|---|---|---|
| Neighborhood Subsampling | $n$ nearest neighbors per query | $O(n^3)$ per query | Standard GP formulae, flexible subset choice | (Fuhg et al., 2021, Sung et al., 2016) |
| Tree/Partition Ensemble | Dynamic binary tree, overlapping clusters | $O(m^3)$ per local model, logarithmic routing | Scalable, real-time, online adaptation | (Lederer et al., 2020, Braun et al., 2024, Elhoseiny et al., 2017) |
| Basis Function LGR | Gated parametric local models | per-iteration cost linear in the number of local models | Variational, spatially-varying lengthscales | (Meier et al., 2014) |
| Localized Kernel GPR | Localizing kernel per test point | $O(n^3)$ per test point | Sparse Gram matrix, adapts to nonstationarity | (Gogolashvili et al., 2022) |
| Physics-informed Local GPR | Windowed kernels (e.g., resonant systems) | $O(w^3)$ per window ($w$: window size) | Custom local structure, FRF estimation | (Fang et al., 2024) |

These paradigms collectively define the modern landscape of Local Gaussian Regression.


References:

  • (Fuhg et al., 2021) Fuhg et al., "Local approximate Gaussian process regression for data-driven constitutive laws: Development and comparison with neural networks"
  • (Sung et al., 2016) Gramacy et al., "Potentially Predictive Variance Reducing Subsample Locations in Local Gaussian Process Regression"
  • (Meier et al., 2014) Urtasun et al., "Local Gaussian Regression"
  • (Lederer et al., 2020) Lederer et al., "Real-Time Regression with Dividing Local Gaussian Processes"
  • (Braun et al., 2024) Lederer et al., "GPTreeO: An R package for continual regression with dividing local Gaussian processes"
  • (Elhoseiny et al., 2017) Li et al., "Overlapping Cover Local Regression Machines"
  • (Gogolashvili et al., 2022) Niu et al., "Locally Smoothed Gaussian Process Regression"
  • (Jackson et al., 2021) Ivanov et al., "Synergistic Offline-Online Control Synthesis via Local Gaussian Process Regression"
  • (Renson et al., 2019) Barton et al., "Numerical Continuation in Nonlinear Experiments using Local Gaussian Process Regression"
  • (Fang et al., 2024) Qi et al., "A Local Gaussian Process Regression Approach to Frequency Response Function Estimation"
  • (Yoshikawa et al., 2020) Xie & Mori, "Gaussian Process Regression with Local Explanation"
