Non-Stationary Gaussian Processes

Updated 29 October 2025
  • Non-Stationary Gaussian Processes are stochastic models with covariance functions that vary over input space, capturing local changes in variance, correlation, and periodicity.
  • They encompass diverse methodologies such as parametric kernels, deep kernel learning, deep GPs, kernel warping, and partition models to balance flexibility with interpretability.
  • Advances in scalability, including sparse approximations and variational methods, enable efficient application of non-stationary GPs to large-scale scientific and engineering problems.

A non-stationary Gaussian process (GP) is a stochastic process whose covariance structure varies across the input space, thereby capturing phenomena whose statistical properties (e.g., variance, correlation length, or periodicity) shift spatially or temporally. This stands in contrast to stationary GPs, where the covariance depends only on the separation between inputs and is invariant to absolute position. Non-stationary GPs provide a principled, probabilistic modeling approach for functions and fields exhibiting heterogeneous, locally adaptive behavior, and underpin advanced surrogate modeling, forecasting, and uncertainty quantification across a diverse range of scientific and engineering domains.

1. Classes and Methodologies for Non-Stationary Gaussian Processes

Non-stationary GPs are realized through several principal mechanisms, each offering different trade-offs in flexibility, interpretability, and computational cost:

  • Parametric Non-Stationary Kernels: These models augment classical stationary kernels by allowing kernel parameters, such as the signal variance $g(\mathbf{x})$ or the lengthscale, to vary over input space through parametric functions, often basis function expansions (e.g., sums of RBFs). The general formulation is

$$k(\mathbf{x}_i, \mathbf{x}_j) = \sum_{d=1}^{N} g_d(\mathbf{x}_i)\, g_d(\mathbf{x}_j)\, k_\mathrm{stat}(|\mathbf{x}_i - \mathbf{x}_j|),$$

thereby inducing position-dependent amplitude and, optionally, smoothness. These kernels are interpretable and moderately flexible, but hyperparameter selection can become challenging as model complexity grows (Noack et al., 2023). A minimal implementation sketch follows this list.

  • Deep Kernel Learning: Here, a stationary kernel is applied not on the original inputs but on their embedding $\boldsymbol{\phi}(\mathbf{x})$ generated by a trainable neural network:

$$k(\mathbf{x}_i, \mathbf{x}_j) = k_\mathrm{stat}(\|\boldsymbol{\phi}(\mathbf{x}_i) - \boldsymbol{\phi}(\mathbf{x}_j)\|).$$

This approach achieves high flexibility—adapting, for instance, both variance and correlation structure—but at the expense of interpretability and a greater risk of overfitting or model misspecification, especially in moderate data regimes (Noack et al., 2023).

  • Deep Gaussian Processes (DGPs): By stacking GPs such that each layer's outputs serve as the inputs to the next, DGPs realize highly non-stationary and nonparametric representations:

$$f^{(l)}(\mathbf{x}) \sim GP\bigl(m^{(l)}(\cdot),\, k^{(l)}(\cdot, \cdot)\bigr), \quad \text{for layers } l = 1, 2, \dots$$

DGPs are universal function approximators but introduce significant inference complexity and loss of interpretability (Noack et al., 2023, Booth et al., 2023).

  • Kernel Warping: Non-stationarity is injected by transforming the inputs before applying a stationary kernel; for example, in warping-based models, a (potentially nonlinear) mapping $\mathbf{x} \mapsto \phi(\mathbf{x})$ (specified parametrically or via another GP) dictates the local covariance geometry (Booth et al., 2023, Tolpin, 2019).
  • Mixture/Local GPs and Partition Models: The input space is divided (via trees, Voronoi tessellation, or Dirichlet processes), with each region assigned a local stationary GP whose parameters are adapted independently or coupled via hierarchical/Markov structures. Locally coupled GPs with HMM-regularized parameter trajectories (Ambrogioni et al., 2016) and Mixed-Stationary GPs (MSGPs) with Dirichlet process clustering (Duan et al., 2018) exemplify such constructions.
  • Compactly Supported and Sparse Nonstationary Kernels: Using kernels with explicit spatial support (e.g., products of Matérn or Wendland polynomials with spatially adaptive “bump” functions), one can induce local correlation and manageable sparsity for efficient inference in massive data regimes (Risser et al., 7 Nov 2024).
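
The parametric mechanism above can be made concrete with a short sketch. This is a minimal illustration, assuming a 1D input, Gaussian RBF basis functions $g_d$ with hand-picked centers, and a squared-exponential base kernel; the centers, widths, and hyperparameter values are illustrative placeholders rather than settings from any cited work.

```python
import numpy as np

def rbf_basis(x, centers, width):
    """Gaussian basis functions g_d(x) evaluated at 1D inputs x; result is (n, D)."""
    return np.exp(-0.5 * ((x[:, None] - centers[None, :]) / width) ** 2)

def stationary_se(x1, x2, lengthscale):
    """Stationary squared-exponential kernel of the separation |x_i - x_j|."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def parametric_nonstationary_kernel(x1, x2, centers, width, lengthscale):
    """k(x_i, x_j) = sum_d g_d(x_i) g_d(x_j) k_stat(|x_i - x_j|)."""
    G1 = rbf_basis(x1, centers, width)   # (n1, D)
    G2 = rbf_basis(x2, centers, width)   # (n2, D)
    amplitude = G1 @ G2.T                # sum over basis functions d
    return amplitude * stationary_se(x1, x2, lengthscale)

# Illustrative usage: the prior variance k(x, x) now varies with position,
# so prior samples have location-dependent amplitude, unlike a stationary GP.
x = np.linspace(0.0, 10.0, 200)
centers = np.array([2.0, 5.0, 8.0])      # hypothetical basis centers
K = parametric_nonstationary_kernel(x, x, centers, width=1.5, lengthscale=0.5)
K += 1e-8 * np.eye(len(x))               # jitter for numerical stability
sample = np.random.default_rng(0).multivariate_normal(np.zeros(len(x)), K)
```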

2. Mathematical Formulations and Examples of Non-Stationary Kernels

A representative selection of non-stationary kernel formulations includes:

  • Spatially-Varying Amplitude:

$$k(\mathbf{x}_i, \mathbf{x}_j) = g(\mathbf{x}_i)\, g(\mathbf{x}_j)\, k_\mathrm{stat}(\mathbf{x}_i, \mathbf{x}_j),$$

where $g(\cdot)$ modulates local variance (Noack et al., 2023).

  • Generalized Spectral Mixture (GSM) Kernel:

$$k_\mathrm{GSM}(x_i, x_j) = \sum_{q=1}^{Q} w_q(x_i)\, w_q(x_j)\, k_{\mathrm{Gibbs},q}(x_i, x_j)\, \cos\bigl[2\pi\bigl(\mu_q(x_i)\, x_i - \mu_q(x_j)\, x_j\bigr)\bigr].$$

Each mixture’s weights, means, and lengthscales may be input-dependent latent processes, granting the kernel rich nonstationarity (Ladopoulou et al., 13 May 2025).

  • Non-Stationary Matérn Kernel (Paciorek-Schervish):

$$k(\mathbf{x}, \mathbf{x}') = \sigma(\mathbf{x})\,\sigma(\mathbf{x}')\, \frac{|\Sigma(\mathbf{x})|^{1/4}\, |\Sigma(\mathbf{x}')|^{1/4}}{\bigl|\tfrac{\Sigma(\mathbf{x}) + \Sigma(\mathbf{x}')}{2}\bigr|^{1/2}}\, \mathcal{M}_{\nu}\!\left(\sqrt{Q(\mathbf{x}, \mathbf{x}')}\right),$$

where $Q(\mathbf{x}, \mathbf{x}') = (\mathbf{x} - \mathbf{x}')^\top \bigl(\tfrac{\Sigma(\mathbf{x}) + \Sigma(\mathbf{x}')}{2}\bigr)^{-1} (\mathbf{x} - \mathbf{x}')$ is a Mahalanobis-type distance under the averaged local kernel matrices, and $\Sigma(\mathbf{x})$ captures local anisotropy (Risser et al., 7 Nov 2024, Beckman et al., 2022). A 1D sketch of this kernel follows this list.

  • Attentive Kernel (AK):

$$\mathrm{AK}(\mathbf{x}, \mathbf{x}') = \alpha\, (\bar{\mathbf{z}}^\top \bar{\mathbf{z}}') \sum_{m=1}^{M} \bar{w}_m\, \bar{w}_m'\, k_m(\mathbf{x}, \mathbf{x}'),$$

where $\bar{\mathbf{w}}, \bar{\mathbf{z}}$ are input-dependent, normalized attention weights and $k_m$ are base kernels (Chen et al., 2023).

  • Locally Coupled Kernel (LC-GP):

$$k_\zeta(t, t'; \{\boldsymbol{\vartheta}_i\}) = \sum_{i} w(t; t_i)\, w(t'; t_i)\, k_i(t, t'; \boldsymbol{\vartheta}_i),$$

with $w$ localized basis functions and $k_i$ stationary or nonstationary kernels whose parameters $\boldsymbol{\vartheta}_i$ follow a Markov process (Ambrogioni et al., 2016).
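
To make the Paciorek-Schervish form above concrete, here is a minimal 1D sketch, assuming scalar local kernel matrices $\Sigma(x) = \ell(x)^2$, Matérn order $\nu = 1/2$ so that $\mathcal{M}_\nu(r) = e^{-r}$, and an arbitrary smooth lengthscale profile chosen purely for illustration.

```python
import numpy as np

def lengthscale(x):
    """Hypothetical smoothly varying local lengthscale l(x) > 0."""
    return 0.3 + 0.25 * (1.0 + np.tanh(x - 5.0))

def sigma(x):
    """Hypothetical local standard deviation sigma(x); constant here."""
    return np.ones_like(x)

def ps_matern12_kernel(x1, x2):
    """1D Paciorek-Schervish kernel with Sigma(x) = l(x)^2 and nu = 1/2."""
    l1, l2 = lengthscale(x1)[:, None], lengthscale(x2)[None, :]
    s1, s2 = sigma(x1)[:, None], sigma(x2)[None, :]
    avg = 0.5 * (l1 ** 2 + l2 ** 2)                        # |(Sigma + Sigma') / 2| in 1D
    prefactor = s1 * s2 * np.sqrt(l1 * l2) / np.sqrt(avg)  # determinant terms
    Q = (x1[:, None] - x2[None, :]) ** 2 / avg             # Mahalanobis-type distance
    return prefactor * np.exp(-np.sqrt(Q))                 # M_{1/2}(r) = exp(-r)

# Correlations decay quickly where l(x) is small and slowly where it is large,
# which is exactly the locally adaptive behavior this kernel is designed for.
x = np.linspace(0.0, 10.0, 100)
K = ps_matern12_kernel(x, x)
```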

3. Empirical Performance and Use-Cases

Empirical studies demonstrate that non-stationary GPs substantially improve predictive accuracy and, crucially, uncertainty quantification—especially on data with locally-varying amplitude, smoothness, or dynamics:

  • Time Series with Regime Switches or Varying Frequency: Locally coupled GPs outperform stationary GPs in both state detection (e.g., phase transitions in brain oscillations) and denoising (Ambrogioni et al., 2016).
  • Environmental and Physical Simulation: Compactly supported nonstationary kernels permit exact GP inference for millions of geospatial measurements, yielding superior point and posterior predictive performance in climate data interpolation (Risser et al., 7 Nov 2024, Nychka et al., 2017).
  • Active Learning and Robotic Information Gathering: Non-stationary kernels (e.g., AK) provide well-calibrated, position-sensitive uncertainty, guiding data acquisition to informative or high-variation regions—improving map reconstruction and resource efficiency in autonomous systems (Chen et al., 2023).
  • Scientific Surrogates and Computer Experiments: Methods such as nonstationary latent-augmented GPs (Montagna et al., 2013) and deep Gaussian processes (Booth et al., 2023) have proven effective in resolving sharp local features and complex nonlinear dependencies in surrogate modeling for simulation codes.

The table below organizes core approaches, design principles, and computational profiles:

| Class | Nonstationarity Mechanism | Interpretability | Computational Cost |
|---|---|---|---|
| Parametric | Input-dependent basis expansions | High | Moderate to high (grows with terms) |
| Deep Kernel / DGP | Neural embedding / compositional warping | Low | High to very high |
| Compactly Supported | Local bump functions, data-driven sparsity | Medium | Low (sparse algebra) |
| Mixture/Partition/Local | Piecewise or cluster-specific parameters | Medium-High | Varies (often scalable) |

4. Computational Scalability and Approximation Strategies

Non-stationary GPs historically suffered from scalability bottlenecks due to the $O(N^3)$ cost of dense covariance algebra. Recent advances enable tractable inference at scale:

  • Covariance Sparsity: Compactly supported kernels yield sparse matrices, supporting scalable, parallelizable inference via sparse factorizations and Krylov subspace methods, demonstrated on problems with $N \approx 10^6$ observations (Risser et al., 7 Nov 2024). A sparse-kernel sketch follows this list.
  • Block-Diagonal Plus Low-Rank (BDLR) Approximations: Decompose the global covariance as a sum of local block-diagonal and low-rank (Nyström) terms, enabling fast stochastic estimation of gradients and Hessians, and second-order optimization for high-dimensional nonstationary parameterizations (Beckman et al., 2022).
  • Structured Kernel Interpolation (SKI) and Warping (warpSKI): Use warped, possibly non-equidistant, inducing point grids to exploit Toeplitz/Kronecker structure even for nonstationary phase behavior (Graßhoff et al., 2019).
  • Local/Partitioned Submodels: Partition or locally approximate the process via region- or neighborhood-specific GPs, dramatically reducing complexity and enabling distributed inference (Booth et al., 2023).
  • Variational and Inducing Point Methods: Deep kernel and DGP frameworks commonly rely on sparse variational methodologies and mini-batch stochastic optimization to address large-scale, nonstationary learning (James et al., 16 Jul 2025, Booth et al., 2023).
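
To illustrate the covariance-sparsity strategy from the first bullet above, the following sketch assembles a sparse covariance matrix from a compactly supported Wendland kernel. The specific Wendland function, the fixed support radius, and the k-d tree used to enumerate nearby pairs are generic illustrative choices, not the construction of the cited work; a fully nonstationary variant would additionally let the variance and support vary over space.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.spatial import cKDTree

def wendland_c2(r):
    """Compactly supported Wendland C^2 function: (1 - r)^4 (4r + 1) for r < 1, else 0."""
    r = np.asarray(r)
    return np.where(r < 1.0, (1.0 - r) ** 4 * (4.0 * r + 1.0), 0.0)

def sparse_wendland_cov(X, support_radius, variance=1.0):
    """Sparse covariance: only pairs within the support radius are ever stored."""
    tree = cKDTree(X)
    pairs = tree.query_pairs(r=support_radius, output_type="ndarray")  # (m, 2), i < j
    i, j = pairs[:, 0], pairs[:, 1]
    r = np.linalg.norm(X[i] - X[j], axis=1) / support_radius
    vals = variance * wendland_c2(r)
    n = X.shape[0]
    # Symmetrize and add the diagonal (r = 0 gives the full variance).
    rows = np.concatenate([i, j, np.arange(n)])
    cols = np.concatenate([j, i, np.arange(n)])
    data = np.concatenate([vals, vals, np.full(n, variance)])
    return coo_matrix((data, (rows, cols)), shape=(n, n)).tocsc()

# With a small support radius the matrix is highly sparse, so sparse Cholesky
# factorizations or Krylov solvers can replace dense O(N^3) covariance algebra.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 100.0, size=(5000, 2))
K = sparse_wendland_cov(X, support_radius=3.0)
print(f"stored entries: {K.nnz} of {K.shape[0] ** 2}")
```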

5. Interpretability, Diagnostics, and Model Selection

Non-stationary kernels enhance model expressivity but introduce parameterization and diagnostic challenges:

  • Interpretability: Parametric nonstationary kernels and modular local models retain interpretable structure, with explicit spatial dependence and regularization. Deep learning-based approaches and DGPs, despite offering flexibility, are generally less transparent.
  • Risk of Overfitting: As model complexity increases, particularly with flexible basis or deep architectures, rigorous cross-validation and regularization are necessary to mitigate overfitting and ensure identifiability.
  • Model Diagnostics: The appropriateness of a nonstationary approach can be evaluated via residual analysis, calibration of predictive intervals, and data-driven hyperparameter diagnostics as described in (Noack et al., 2023); a simple interval-coverage check is sketched after this list.
  • Recommendation: Begin with stationary kernels for exploratory modeling; escalate to parametric nonstationary or deep approaches when evidence of nonstationarity is strong or when uncertainty quantification is consequential.
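
As a concrete instance of the interval-calibration diagnostic mentioned above, this sketch computes the empirical coverage of nominal central predictive intervals on held-out data, assuming the fitted model provides a Gaussian predictive mean and standard deviation at each test point; the commented `model.predict` call is a hypothetical stand-in for whatever API is in use.

```python
import numpy as np
from scipy.stats import norm

def empirical_coverage(y_test, pred_mean, pred_std, levels=(0.5, 0.8, 0.9, 0.95)):
    """Fraction of held-out targets inside nominal central predictive intervals.

    A well-calibrated model gives empirical coverage close to each nominal level;
    systematic under-coverage suggests over-confident (e.g., overly stationary)
    uncertainty estimates.
    """
    coverage = {}
    for level in levels:
        z = norm.ppf(0.5 + level / 2.0)      # interval half-width in standard deviations
        inside = (y_test >= pred_mean - z * pred_std) & (y_test <= pred_mean + z * pred_std)
        coverage[level] = float(np.mean(inside))
    return coverage

# Illustrative usage with a hypothetical fitted GP exposing mean/std predictions:
# mu, sd = model.predict(X_test)            # hypothetical stand-in API
# print(empirical_coverage(y_test, mu, sd))
```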

6. Reference Implementations and Software Ecosystem

Multiple open-source libraries facilitate nonstationary GP modeling, with differing focus:

  • hetGP, tgp, laGP (R): Heteroskedastic, treed, and local approximate GPs (Booth et al., 2023).
  • deepgp, dgpsi (R, Python): Fully Bayesian DGPs, scalable via Vecchia approximation or elliptical slice sampling (Booth et al., 2023).
  • GPflux, GPyTorch (Python): Deep kernel and variational deep GP frameworks, supporting flexible kernel composition and auto-differentiation (James et al., 16 Jul 2025); a minimal GPyTorch deep-kernel sketch follows this list.
  • Software for compactly supported and sparse kernels: High-performance codes (often in C++/Python) exploiting distributed and GPU architectures for ultra-large data (Risser et al., 7 Nov 2024).
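
For the deep-kernel route noted in the GPyTorch entry above, here is a minimal sketch of an exact GP whose stationary kernel acts on a learned neural embedding. The network architecture, kernel choice, and synthetic data are illustrative assumptions; training via marginal likelihood optimization is omitted for brevity.

```python
import torch
import gpytorch

class DeepKernelGP(gpytorch.models.ExactGP):
    """Exact GP whose stationary RBF kernel is evaluated on an embedding phi(x)."""

    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        # Hypothetical feature extractor phi: R^d -> R^2; any architecture could be used.
        self.feature_extractor = torch.nn.Sequential(
            torch.nn.Linear(train_x.size(-1), 32),
            torch.nn.ReLU(),
            torch.nn.Linear(32, 2),
        )
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.RBFKernel(ard_num_dims=2)
        )

    def forward(self, x):
        z = self.feature_extractor(x)   # k_stat acts on ||phi(x_i) - phi(x_j)||
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(z), self.covar_module(z)
        )

# Illustrative construction on synthetic data; train by maximizing the exact
# marginal likelihood (gpytorch.mlls.ExactMarginalLogLikelihood) with an optimizer.
train_x = torch.randn(100, 3)
train_y = torch.sin(train_x[:, 0]) + 0.1 * torch.randn(100)
likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = DeepKernelGP(train_x, train_y, likelihood)
```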

7. Future Directions and Open Challenges

Despite considerable advances, several challenges persist:

  • Identifiability and Over-parameterization: As nonstationary models scale in flexibility, identifiability of parameterizations and interpretability of results become more difficult. Hierarchical priors and Bayesian regularization are active areas of research (Risser et al., 7 Nov 2024).
  • Efficient High-Dimensional Nonstationary Inference: Extending current scalable methods to fully nonstationary, high-dimensional settings remains demanding. Research into efficient summary statistics, multi-resolution schemes, and hybrid architectures continues (Nychka et al., 2017, Beckman et al., 2022).
  • Diagnostics and Auto-tuning: There remains a need for robust diagnostics, automatic kernel selection, and scalable hyperparameter tuning workflows for practitioners (Noack et al., 2023).
  • Integration with Active Learning and Decision-making: Formal integration of nonstationary uncertainty quantification into downstream tasks—active learning, adaptive experiment design, robotic exploration—will continue to drive application domains (Chen et al., 2023, Patel et al., 2022).

In conclusion, non-stationary Gaussian processes constitute a powerful and rapidly developing domain at the intersection of probabilistic modeling, computational mathematics, and scientific computing. Ongoing innovation in kernel design, scalable inference, and integration with deep learning will continue to expand their applicability and relevance across scientific disciplines.
