Locally Gaussian Approximation
- Locally Gaussian approximation is a method that replaces a complex model with local Gaussian patches to capture nonstationary behaviors.
- It uses localized basis functions and variational inference techniques to convert global problems into efficient, tractable subproblems.
- This approach offers strong theoretical guarantees and is applied in nonparametric regression, federated learning, SDEs, and spatial statistics.
A locally Gaussian approximation is a theoretical and computational strategy that replaces a complex, often high-dimensional or non-Gaussian object—such as a function, random field, stochastic process, or algorithmic trajectory—with a patchwork of local Gaussian models or linearized structures that are valid in restricted regions of the domain. This paradigm permeates a wide spectrum of applied mathematics, statistics, and machine learning, including nonparametric regression, random fields on manifolds, approximations for SDEs, federated learning, and high-dimensional function approximation. The following sections provide a detailed exposition of the foundational theory, methodological constructions, computational advantages, prominent applications, and conceptual contrasts with global Gaussian approximations, as substantiated in the cited literature, especially "Local Gaussian Regression" (Meier et al., 2014), "Excursion Probabilities of Isotropic and Locally Isotropic Gaussian Random Fields on Manifolds" (Cheng, 2015), and several others.
1. Localized Function Basis and Model Construction
Locally Gaussian approximation in regression sets out from the observation that global Gaussian process (GP) models, while theoretically appealing, face severe computational and expressivity challenges. Instead, the approach localizes the GP by constructing a large collection of "local models", each supported on a different region, and combines their predictions. Mathematically, one introduces a set of localizing basis functions
$$\phi_m(x) = \eta_m(x)\,\xi_m(x), \qquad m = 1, \dots, M,$$
where $\eta_m$ is a nonnegative, peaked ("radial basis") function centered at $c_m$ (commonly a Gaussian $\eta_m(x) = \exp\!\big(-\tfrac{1}{2}(x - c_m)^\top \Lambda_m^{-1}(x - c_m)\big)$) and $\xi_m$ encodes the local model structure (e.g., constant, linear, or low-order polynomial features). The overall predicted function is then
$$f(x) = \sum_{m=1}^{M} \eta_m(x)\,\xi_m(x)^\top w_m,$$
with a weight vector $w_m$ attached to each local model. This modularization enables the representation of functions exhibiting local stationarity, discontinuities, or rapidly varying behavior otherwise inaccessible to global GPs or kernel methods with fixed length scales. In effect, the target function is approximated by a superposition of locally valid Gaussian models.
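As a concrete illustration, the following minimal sketch superposes Gaussian-localized linear patches in one dimension; the centers, weights, and shared lengthscale are illustrative choices, not values prescribed by the cited papers.

```python
import numpy as np

def gaussian_localizer(x, center, length_scale):
    """Nonnegative, peaked weighting function eta_m centered at `center`."""
    return np.exp(-0.5 * ((x - center) / length_scale) ** 2)

def predict_local_gaussian(x, centers, weights, length_scale=0.3):
    """Superpose M local linear models y_m(x) = w_m0 + w_m1 * (x - c_m),
    each gated by its Gaussian localizer eta_m(x)."""
    y = np.zeros_like(x, dtype=float)
    for c_m, (w0, w1) in zip(centers, weights):
        eta = gaussian_localizer(x, c_m, length_scale)
        y += eta * (w0 + w1 * (x - c_m))
    return y

# Toy usage: three local linear patches combined into one prediction.
centers = np.array([-1.0, 0.0, 1.0])
weights = [(0.5, -1.0), (0.0, 0.2), (0.8, 1.5)]  # illustrative (w_m0, w_m1)
x_grid = np.linspace(-2.0, 2.0, 9)
print(predict_local_gaussian(x_grid, centers, weights))
```

Each localizer decays away from its center, so any single input is effectively explained by only a few nearby patches.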
In the context of random fields or spatial processes, "locally isotropic" or "locally self-similar" Gaussian fields (see Cheng, 2015; Novikov, 5 Feb 2024) are characterized by covariance kernels that are approximately isotropic or self-similar in a small neighborhood of each point, but which can vary globally. This enables classical Euclidean machinery (e.g., Pickands approximations or Berman-type constants) to be applied locally within chart neighborhoods and then patched together via geometric partition-of-unity arguments.
2. Variational and Statistical Inference Techniques
Inference in the local Gaussian paradigm generally eschews full coupling of all weights/parameters. In "Local Gaussian Regression" (Meier et al., 2014), a variational approximation is constructed in which latent local targets $f_{nm}$ at each input $x_n$ decouple the local models, enabling nearly independent updates for each weight vector $w_m$ and leading to the posterior approximation
$$q(W, f) = \prod_{m=1}^{M} q(w_m)\,\prod_{n=1}^{N} q(f_n),$$
with tractable E-step updates such as
$$\Sigma_{w_m} = \Big(A_m + \beta \sum_{n=1}^{N} \phi_m(x_n)\,\phi_m(x_n)^\top\Big)^{-1}, \qquad \mu_{w_m} = \beta\,\Sigma_{w_m} \sum_{n=1}^{N} \phi_m(x_n)\, f_{nm},$$
where $A_m$ is an ARD precision matrix and $\beta$ a noise precision. Localization ensures that the variational updates are "almost local," as the localization functions decay rapidly, making contributions from distant data negligible.
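A rough numerical sketch of one such update, assuming localized features $\phi_m(x_n) = \eta_m(x_n)\,\xi_m(x_n)$ and treating the local pseudo-targets $f_{nm}$ as given (the function name and the diagonal ARD prior are illustrative rather than the exact algorithm of Meier et al., 2014):

```python
import numpy as np

def e_step_local_weights(Phi_m, f_m, ard_precisions, beta):
    """Gaussian posterior q(w_m) = N(mu_m, Sigma_m) for a single local model.

    Phi_m          : (N, d) localized features phi_m(x_n) = eta_m(x_n) * xi_m(x_n)
    f_m            : (N,)   local pseudo-targets f_{nm} for this model
    ard_precisions : (d,)   diagonal of the ARD precision matrix A_m
    beta           : scalar noise precision
    """
    A_m = np.diag(ard_precisions)
    Sigma_m = np.linalg.inv(A_m + beta * Phi_m.T @ Phi_m)  # only a d x d solve
    mu_m = beta * Sigma_m @ (Phi_m.T @ f_m)
    return mu_m, Sigma_m

# Because eta_m decays rapidly, rows of Phi_m for distant inputs are nearly
# zero, so each small d x d update is effectively local to the model's region.
```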
In small-noise SDE settings (Sanz-Alonso et al., 2016), local Gaussian approximation involves matching the mean and covariance dynamics through ODEs derived from the drift and the local linearization of the system: for a diffusion $dx = f(x)\,dt + \sqrt{\epsilon}\,\sigma\, dB_t$, the approximating Gaussian $\mathcal{N}(m(t), C(t))$ evolves as
$$\dot{m}(t) = f(m(t)), \qquad \dot{C}(t) = Df(m(t))\,C(t) + C(t)\,Df(m(t))^\top + \epsilon\,\sigma\sigma^\top,$$
yielding a time-evolving Gaussian approximation whose Kullback-Leibler divergence from the true solution is controlled as $\mathcal{O}(\epsilon)$ in the noise amplitude $\epsilon$.
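A minimal sketch of this construction, assuming a scalar state and forward-Euler integration of the mean and covariance ODEs (the drift, step size, and names are illustrative):

```python
import numpy as np

def local_gaussian_sde(drift, jac, m0, C0, eps, dt, n_steps, sigma2=1.0):
    """Evolve the locally Gaussian approximation N(m(t), C(t)) of the
    small-noise SDE dx = f(x) dt + sqrt(eps) * sigma dB by forward Euler:
        m' = f(m),   C' = Df(m) C + C Df(m) + eps * sigma^2   (scalar case)."""
    m, C = m0, C0
    for _ in range(n_steps):
        J = jac(m)
        m = m + dt * drift(m)
        C = C + dt * (2.0 * J * C + eps * sigma2)
    return m, C

# Toy usage: Ornstein-Uhlenbeck-like drift f(x) = -x, so Df(x) = -1.
m_T, C_T = local_gaussian_sde(lambda x: -x, lambda x: -1.0,
                              m0=1.0, C0=0.0, eps=0.01, dt=1e-3, n_steps=5000)
print(m_T, C_T)  # mean decays toward 0, covariance approaches eps * sigma2 / 2
```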
For federated learning algorithms (Bonnerjee et al., 12 May 2025), local Gaussian approximations describe the scaled (and possibly time-uniform) law of iterates of decentralized SGD by a Gaussian distribution whose accuracy is quantified explicitly via Berry–Esseen bounds and coupling arguments.
3. Computational Complexity and Scalability
The most salient computational benefit is the decoupling of formerly global inference tasks into a collection of small, localized problems:
- Avoiding Global Matrix Inversion: Global GPR scales as $\mathcal{O}(N^3)$ in the number of data points $N$; local GP/regression costs reduce to roughly $\mathcal{O}(M d^3)$ (with $M$ the number of local models and $d$ the number of parameters per model) due to the sparsity of the effective Gram matrix imposed by localization.
- Incremental and Online Capability: Because local models are updated only by data that are themselves localized in space or time, the method supports efficient online and real-time learning, which is vital in control (Meier et al., 2014) and simulation (Cole et al., 2020), where new data may arrive continuously and influence only a small region.
- Sparsification Mechanisms: Localizing kernels or windowing functions (rectangular, Epanechnikov, or even spatially bounded radial kernels) in Gaussian process regression (see Gogolashvili et al., 2022; Vakayil et al., 2023) transform the typically dense covariance into a sparse Gram matrix for each prediction location, drastically reducing the cost of matrix inversion and enabling near-linear scalability in $N$.
Additional accelerations employ local inducing point methods (Cole et al., 2020), template-based schemes, or hybrid global-local strategies. For instance, TwinGP (Vakayil et al., 2023) combines a small set of global points (for global trend capture) with local nearest neighbors, using block-matrix inversion to further reduce cost.
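The basic mechanism can be illustrated with a simple nearest-neighbor window around each prediction location; the kernel, window size, and data below are illustrative simplifications of the schemes in the cited works.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=0.5):
    """Squared-exponential kernel matrix between row-stacked inputs A and B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * d2 / length_scale**2)

def local_gp_predict(x_star, X, y, k_local=50, noise=1e-2):
    """Predict at x_star using only its k_local nearest neighbors:
    one small k x k solve per prediction instead of a global N x N inversion."""
    idx = np.argsort(np.linalg.norm(X - x_star, axis=1))[:k_local]
    Xl, yl = X[idx], y[idx]
    K = rbf_kernel(Xl, Xl) + noise * np.eye(len(idx))
    k_star = rbf_kernel(Xl, x_star[None, :])[:, 0]
    mean = k_star @ np.linalg.solve(K, yl)
    var = (rbf_kernel(x_star[None, :], x_star[None, :])[0, 0]
           - k_star @ np.linalg.solve(K, k_star))
    return mean, var

# Toy usage: cost per prediction is O(k_local^3), independent of N.
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(2000, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(2000)
print(local_gp_predict(np.array([0.3, -1.0]), X, y))
```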
4. Applications Across Domains
Locally Gaussian or locally linear approximation schemes are central in diverse domains:
- Nonparametric and Spatial Regression: High-dimensional or spatially-varying regression, especially when global smoothness is absent or global GPs are too expensive, e.g., in spatial statistics or sensor networks (Meier et al., 2014).
- Manifold and Geometry-Driven Problems: Excursion probability computation for Gaussian fields on manifolds using locally isotropic approximations (Cheng, 2015), enabling accurate estimation of rare-event probabilities in geosciences and cosmology.
- Machine Learning Algorithms: Federated learning (Bonnerjee et al., 12 May 2025) where each client's local SGD path can be approximated as Gaussian with explicit, finite-sample error control, enabling robust adversarial detection via multiplier bootstrapping.
- Simulation and Surrogate Modeling: Surrogate models for high-fidelity computer simulations using local inducing point GPs (Cole et al., 2020) or global-local partition-of-data approaches (Vakayil et al., 2023) for scalable uncertainty quantification.
- Nonlinear State Space Models and Filtering: Natural-gradient-based iterative local Gaussian approximations in Bayesian filtering (Cao et al., 21 Oct 2024), providing provable convergence for nonlinear dynamical systems beyond linearization approaches (EKF/UKF).
5. Theoretical Guarantees and Limiting Behavior
Locally Gaussian approximations exhibit strong theoretical justifications:
- Statistical Accuracy: For small-noise diffusions, the KL divergence between the locally Gaussian approximation and the true law is $\mathcal{O}(\epsilon)$ in the noise amplitude $\epsilon$ (Sanz-Alonso et al., 2016).
- Berry–Esseen-Type Results: In decentralized federated optimization, the accuracy of the local Gaussian approximation of the Polyak–Ruppert iterate is non-asymptotic, with explicit finite-sample error rates (Bonnerjee et al., 12 May 2025).
- Functional Limit Theorems: For random fields or stochastic processes, local Gaussian approximations yield precise asymptotic formulas for excursion probabilities, sojourns, or ruin events, generalizing Pickands and Berman constants to locally isotropic or locally self-similar cases (Cheng, 2015, Novikov, 5 Feb 2024).
- Bias and Complexity Bounds: In simulation of path functionals (e.g., drawdown in Lévy models), the bias induced by a locally Gaussian (small jumps) approximation is explicitly controlled in Wasserstein and Kolmogorov metrics, with nearly optimal Monte Carlo and MLMC complexity for broad classes of functionals (Cázares et al., 2020).
6. Comparison with Global Gaussian Methods
The locally Gaussian approach must be carefully distinguished from global methods:
| Property | Local Gaussian Approximation | Global Gaussian Process/Kernel Regression |
|---|---|---|
| Model expressivity | Adapts to local nonstationarity and locally varying behavior | Governed by a global kernel, often assumes stationarity |
| Scalability | Linear or near-linear in data size; localized updates | Cubic (or quadratic) in $N$; costly for large $N$ |
| Uncertainty quantification | Bayesian, with locally valid posterior variances | Global posterior; may be inaccurate in nonstationary regions |
| Adaptivity | Allows spatially varying lengthscales, local patch selection | One or few kernel lengthscales; adaptation challenging |
| Computational tractability | Matrix inversions restricted to small subproblems; enables online learning | Full matrix inversion; batch only |
| Limitations | Requires tuning of local region/order/overlap; may lose global consistency | May underfit local structure; inflexible for spatial heterogeneity |
7. Conceptual Extensions and Open Directions
The local Gaussian paradigm is not confined to regression and spatial processes. Notable extensions include:
- Locally linearized Bayesian neural network predictions (Immer et al., 2020) employing the generalized Gauss–Newton method; a sketch of the linearization idea appears after this list.
- Local random feature kernel approximations with explicit test-point centering to cure vanishing pathologies in high-frequency domains (Wacker et al., 2022).
- Patchwork models using local Gaussian mixtures for sparse adaptive representation of curves or images, e.g., sparse Gaussian approximations matching the $N$-term rate-optimality of curvelets for anisotropic signals in two dimensions (Erb et al., 2019).
- Non-Gaussian quasi-likelihoods for SDEs with locally stable noise, where the "locally Gaussian" approximation is explicitly abandoned in favor of stable laws (Masuda, 2016) when the process increments are heavy-tailed.
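The locally linearized predictive referenced above can be sketched on a toy two-parameter model: linearizing the model around a point estimate makes a Gaussian posterior over parameters induce a Gaussian predictive. The model, Jacobian, and posterior covariance below are illustrative assumptions, not the construction of Immer et al. (2020).

```python
import numpy as np

def model(x, w):
    """Tiny nonlinear model f(x; w) = w[0] * tanh(w[1] * x)."""
    return w[0] * np.tanh(w[1] * x)

def jacobian_wrt_w(x, w):
    """Analytic Jacobian of f(x; w) with respect to w, shape (2,)."""
    t = np.tanh(w[1] * x)
    return np.array([t, w[0] * x * (1.0 - t**2)])

def linearized_predictive(x, w_map, Sigma_w):
    """Locally linearized predictive: f(x; w) ~ f(x; w_map) + J(x)^T (w - w_map),
    so w ~ N(w_map, Sigma_w) yields a Gaussian over f(x; w)."""
    J = jacobian_wrt_w(x, w_map)
    return model(x, w_map), J @ Sigma_w @ J

# Toy usage with an illustrative MAP estimate and posterior covariance.
w_map = np.array([1.2, 0.7])
Sigma_w = np.array([[0.05, 0.0], [0.0, 0.02]])
print(linearized_predictive(0.5, w_map, Sigma_w))
```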
A plausible implication is that future research will further hybridize local and global approaches, refine localization heuristics, and establish tighter statistical and computational guarantees for adaptive, scalable, and uncertainty-quantified modeling in large-scale, heterogeneous, and high-dimensional environments.