State-Space Representation of GP Priors
- State-space representation of GP priors is a method that recasts a continuous-time Gaussian process as a linear, Markovian dynamical system governed by a stochastic differential equation.
- The approach exploits finite-dimensional realizations of kernels such as the Matérn family, enabling efficient O(n) inference via Kalman filtering.
- Extensions of the method address nonstationary, spatio-temporal, and deep hierarchical models, offering scalable and interpretable frameworks for real-time applications.
A state-space representation of a Gaussian process (GP) prior is a linear Markovian dynamical system—usually formalized as a stochastic differential equation (SDE), or, after temporal/spatial discretization, a vector autoregressive process—whose output process shares the covariance structure of the original GP. For stationary GP priors with rational spectral densities, the Markovian property can be made exact in finite (typically low) state dimension, yielding a canonical route to O(n) inference via Kalman filtering. State-space representations extend beyond stationary GPs to a wide class of nonstationary priors, spatio-temporal models, and deep hierarchical constructions, often with significant benefits in interpretability and computational efficiency.
1. Fundamentals of State-Space Representations for Gaussian Processes
State-space representations of GPs recast a stochastic process prior as a latent Markov process observed through linear measurements corrupted by Gaussian noise. The canonical continuous-time linear SDE form is

$$\mathrm{d}\mathbf{x}(t) = F(t)\,\mathbf{x}(t)\,\mathrm{d}t + L(t)\,\mathrm{d}\boldsymbol{\beta}(t),$$

with state vector $\mathbf{x}(t) \in \mathbb{R}^m$, time-dependent drift $F(t)$, diffusion $L(t)$, and driving white-noise process $\boldsymbol{\beta}(t)$ of variance (spectral density) $Q_c$. The corresponding observation at time $t_k$ is

$$y_k = H\,\mathbf{x}(t_k) + \varepsilon_k,$$

where $H$ is the linear measurement matrix and $\varepsilon_k \sim \mathcal{N}(0, \sigma^2)$ (Benavoli et al., 2016).
This formulation fully characterizes the joint distribution over arbitrary measurement sets through the solution to the linear SDE, which leads directly to closed-form expressions for the predictive mean and covariance (Adam et al., 2020).
For GPs with stationary, rational spectral densities, such as Matérn kernels with half-integer smoothness parameter $\nu$, the state-space form is exact and minimal. For example, the Matérn-1/2 (OU) process is represented with a 1D state, Matérn-3/2 with a 2D state, and Matérn-5/2 with a 3D state, with explicit formulae for $F$, $L$, and $H$ as functions of the kernel hyperparameters (Sebastian et al., 24 Nov 2025, Zhao, 2021).
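As a concrete illustration, here is a minimal sketch of the explicit Matérn-3/2 matrices and their stationary covariance, using the conventional parameterization $\lambda = \sqrt{3}/\ell$ (the function name `matern32_ss` is our own):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def matern32_ss(ell, sigma):
    """State-space matrices (F, L, H, qc, Pinf) for a Matérn-3/2 GP prior."""
    lam = np.sqrt(3.0) / ell                  # inverse lengthscale parameter
    F = np.array([[0.0, 1.0],
                  [-lam**2, -2.0 * lam]])     # companion-form drift
    L = np.array([[0.0], [1.0]])              # white noise enters the last state
    H = np.array([[1.0, 0.0]])                # observe the first state component
    qc = 4.0 * sigma**2 * lam**3              # driving white-noise spectral density
    # Stationary covariance: solves F Pinf + Pinf F^T + qc L L^T = 0
    Pinf = solve_continuous_lyapunov(F, -qc * (L @ L.T))
    return F, L, H, qc, Pinf
```

The stationary covariance comes out diagonal, $P_\infty = \mathrm{diag}(\sigma^2, \lambda^2\sigma^2)$, so the observed component $H\mathbf{x}(t)$ has the kernel's marginal variance $\sigma^2$, as expected.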
2. Kernel Classes and Explicit State-Space Mappings
The key property enabling finite-dimensional state-space conversion is that the GP prior's spectral density $S(\omega)$ is rational in $\omega^2$. An explicit spectral factorization $S(\omega) = G(i\omega)\, q_c\, G(-i\omega)$ admits a transfer-function realization $G(s) = H\,(sI - F)^{-1} L$, so that the process output exactly matches the original GP covariance (Zhao et al., 2020).
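As a worked instance (a standard result, written in the conventional Matérn parameterization $\lambda = \sqrt{3}/\ell$), the Matérn-3/2 spectral density factorizes as:

```latex
S(\omega) \;=\; \frac{4\sigma^2\lambda^3}{(\lambda^2+\omega^2)^2}
\;=\; G(i\omega)\, q_c\, G(-i\omega),
\qquad
G(s) \;=\; \frac{1}{(s+\lambda)^2} \;=\; H\,(sI-F)^{-1}L,

\text{with}\quad
F = \begin{pmatrix} 0 & 1 \\ -\lambda^2 & -2\lambda \end{pmatrix},\quad
L = \begin{pmatrix} 0 \\ 1 \end{pmatrix},\quad
H = \begin{pmatrix} 1 & 0 \end{pmatrix},\quad
q_c = 4\sigma^2\lambda^3 .
```

Here $\det(sI-F) = (s+\lambda)^2$, so $G(i\omega)G(-i\omega) = (\lambda^2+\omega^2)^{-2}$ recovers $S(\omega)$ exactly.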
Key kernel classes with explicit state-space mappings include:
- Matérn kernels with half-integer $\nu$: finite-dimensional, minimal representations; see explicit matrices for $\nu = 1/2, 3/2, 5/2$ in (Adam et al., 2020, Sebastian et al., 24 Nov 2025, Zhao, 2021).
- Damped simple harmonic oscillator (DSHO) kernels: 2D state representing underdamped oscillatory GPs (Jordán et al., 2021).
- Polynomial and cubic spline kernels: as degenerate cases with deterministic SDEs or simple driving white noise (Benavoli et al., 2016).
- Periodic and neural network kernels: using block-diagonal drift and observation matrices, in either linear time-invariant (LTI) or linear time-varying (LTV) form (Benavoli et al., 2016).
Nonstationary or non-rational-spectral-density kernels (e.g., the squared-exponential) admit only approximate finite state-space realizations via truncation, Taylor expansion, or spectral projection (Benavoli et al., 2016, Svensson et al., 2015).
3. Computational Implications and Inference Algorithms
State-space GPs provide a substantial computational advantage. Discretizing the SDE over the measurement grid yields a state-evolution recursion

$$\mathbf{x}_{k+1} = A_k\,\mathbf{x}_k + \mathbf{q}_k, \qquad \mathbf{q}_k \sim \mathcal{N}(0, Q_k),$$

with $A_k = \exp(F\,\Delta t_k)$ and process noise covariance $Q_k = \int_0^{\Delta t_k} e^{Fs}\, L\, Q_c\, L^\top e^{F^\top s}\,\mathrm{d}s$ computed exactly via matrix integrals. For time-invariant $F$ and uniform spacing, $A$ and $Q$ can be precomputed (Adam et al., 2020).
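One common way to evaluate both the transition matrix and the process-noise integral exactly is a matrix-fraction (augmented matrix exponential) trick; a minimal sketch, with our own function name:

```python
import numpy as np
from scipy.linalg import expm

def discretize_lti(F, L, qc, dt):
    """Exact discretization of dx = F x dt + L dbeta (spectral density qc):
    A = exp(F dt), Q = int_0^dt exp(F s) L qc L^T exp(F^T s) ds,
    both read off a single matrix exponential of an augmented block matrix."""
    m = F.shape[0]
    M = np.zeros((2 * m, 2 * m))
    M[:m, :m] = F                 # top-left: drift
    M[:m, m:] = qc * (L @ L.T)    # top-right: diffusion intensity
    M[m:, m:] = -F.T              # bottom-right: negative transposed drift
    Phi = expm(M * dt)
    A = Phi[:m, :m]
    Q = Phi[:m, m:] @ A.T
    return A, Q
```

For the scalar OU process ($F = -\lambda$, $L = 1$) this reproduces the known closed form $A = e^{-\lambda\,\Delta t}$, $Q = \tfrac{q_c}{2\lambda}\,(1 - e^{-2\lambda\,\Delta t})$.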
Inference (filtering and smoothing) is performed via Kalman filter and Rauch-Tung-Striebel (RTS) smoother recursions, with per-step complexity $O(m^3)$ in the state dimension $m$ and total complexity $O(n\,m^3)$ (often lower for sparse operations), contrasted with the $O(n^3)$ scaling for standard GP inference (Sebastian et al., 24 Nov 2025, Zhao, 2021).
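The full filter-plus-smoother pipeline can be sketched end-to-end for a Matérn-3/2 prior on a uniform grid. This is a minimal illustration, not a library API; because the state-space form is exact for this kernel, the smoothed means coincide with batch GP regression:

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

def kalman_rts_matern32(t, y, ell, sigma, noise_var):
    """O(n) GP regression with a Matérn-3/2 prior on a uniform time grid,
    via a forward Kalman filter and a backward RTS smoother."""
    lam = np.sqrt(3.0) / ell
    F = np.array([[0.0, 1.0], [-lam**2, -2.0 * lam]])
    L = np.array([[0.0], [1.0]])
    qc = 4.0 * sigma**2 * lam**3
    Pinf = solve_continuous_lyapunov(F, -qc * (L @ L.T))
    dt = t[1] - t[0]                       # assumes uniform spacing
    A = expm(F * dt)
    Q = Pinf - A @ Pinf @ A.T              # exact for the stationary prior
    n, m = len(t), 2
    mf = np.zeros((n, m)); Pf = np.zeros((n, m, m))
    mean, P = np.zeros(m), Pinf.copy()
    for k in range(n):                     # forward Kalman filter
        if k > 0:
            mean, P = A @ mean, A @ P @ A.T + Q
        S = P[0, 0] + noise_var            # innovation variance (H = [1, 0])
        K = P[:, 0] / S                    # Kalman gain
        mean = mean + K * (y[k] - mean[0])
        P = P - np.outer(K, K) * S
        mf[k], Pf[k] = mean, P
    ms = mf.copy()
    for k in range(n - 2, -1, -1):         # backward RTS smoother
        P_pred = A @ Pf[k] @ A.T + Q
        G = Pf[k] @ A.T @ np.linalg.inv(P_pred)
        ms[k] = mf[k] + G @ (ms[k + 1] - A @ mf[k])
    return ms[:, 0]                        # posterior mean of f at each t
```

Each loop iteration touches only $m \times m$ matrices, so the cost is linear in $n$, in contrast to the cubic cost of forming and solving against the full $n \times n$ kernel matrix.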
The state-space formulation is amenable to modern scalable variational inference via block-banded precisions (enabling “doubly sparse” approximations), sparse inducing points, and efficient natural-gradient updates (Adam et al., 2020).
4. Extensions: Nonstationarity, Spatio-Temporal, and Basis Expansions
Nonstationary GP priors are addressed by embedding transient terms (effect of initial state) and explicit time-varying matrices $F(t)$, $L(t)$, $H(t)$, and allowing for nontrivial initial covariance $P_0$ (Benavoli et al., 2016). Canonical examples include polynomial regression priors, periodic kernels (via sums of cos/sin modes), and time-localized Gaussian kernels. For the squared exponential, rational-function approximation of the inverse spectral density enables approximate finite-dimensional realization, with explicit error quantification (Benavoli et al., 2016, Svensson et al., 2015).
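A standard worked instance of the cos/sin-mode construction: a single harmonic component of a periodic kernel, $k(r) = \sigma^2 \cos(\omega_0 r)$, admits an exact two-dimensional LTI realization with no process noise:

```latex
F = \begin{pmatrix} 0 & -\omega_0 \\ \omega_0 & 0 \end{pmatrix},
\qquad
H = \begin{pmatrix} 1 & 0 \end{pmatrix},
\qquad
P_0 = \sigma^2 I_2,
\qquad
\mathrm{d}\boldsymbol{\beta}(t) = 0 .
```

Since $e^{Ft}$ is a rotation by angle $\omega_0 t$, the output covariance is $H\,e^{Ft}P_0 H^\top = \sigma^2 \cos(\omega_0 t)$, as required; a full periodic kernel stacks such blocks diagonally, one per Fourier mode.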
Spatio-temporal GPs with convolution-generated or SPDE-based priors are formalized through infinite-dimensional SDEs. Galerkin projection onto finite orthonormal spatial bases yields a large but finite-dimensional linear state-space system, whose process noise statistics and operator structure match the original GP at the respective approximation level. The covariance error decays as the tail of the Karhunen-Loève spectrum of the kernel (Zhang et al., 1 Dec 2025, Svensson et al., 2015).
Basis-expansion models (e.g., truncated Mercer or Karhunen-Loève), where the GP is projected onto the span of leading eigenfunctions, provide an interpretable and flexible yet computable state-space surrogate. The induced prior on basis coefficients is Gaussian, matching the kernel's spectral density at corresponding eigenvalues (Svensson et al., 2015, Svensson et al., 2016).
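A minimal sketch of such a truncated-basis surrogate, in the style of Laplacian-eigenfunction (Hilbert-space) reduced-rank GPs on an interval $[-L_b, L_b]$; the domain half-width `Lb`, truncation level `n_basis`, and function names are our own choices:

```python
import numpy as np

def matern32_spectral_density(omega, ell, sigma):
    # S(w) = 4 sigma^2 lam^3 / (lam^2 + w^2)^2, with lam = sqrt(3)/ell
    lam = np.sqrt(3.0) / ell
    return 4.0 * sigma**2 * lam**3 / (lam**2 + omega**2)**2

def basis_kernel_approx(x1, x2, ell, sigma, Lb=4.0, n_basis=200):
    """Truncated-basis approximation of the kernel on [-Lb, Lb]:
    k(x, x') ~ sum_j S(w_j) phi_j(x) phi_j(x'), where phi_j are Dirichlet
    eigenfunctions of the Laplacian and S is the kernel's spectral density."""
    j = np.arange(1, n_basis + 1)
    wj = np.pi * j / (2.0 * Lb)                   # eigenfrequencies sqrt(lambda_j)
    phi1 = np.sin(wj * (x1 + Lb)) / np.sqrt(Lb)   # normalized sine eigenfunctions
    phi2 = np.sin(wj * (x2 + Lb)) / np.sqrt(Lb)
    return float(np.sum(matern32_spectral_density(wj, ell, sigma) * phi1 * phi2))
```

For interior points well away from the boundary, the approximation converges to the exact Matérn-3/2 covariance as `n_basis` grows, with error governed by the spectral tail, mirroring the eigenvalue-tail decay discussed above.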
| Kernel Type | State Dimension | State-Space Construction |
|---|---|---|
| Matérn-1/2 | 1 | 1D OU process, LTI SDE |
| Matérn-3/2 | 2 | 2D LTI SDE; see explicit $F$, $L$ in (Adam et al., 2020) |
| Matérn-5/2 | 3 | 3D LTI SDE, higher-order derivatives |
| SE / rational approx. | variable | Taylor-truncated LTI |
| DSHO / Periodic | 2 (per frequency) | Block-diagonal LTI |
| Infinite-dimensional (SPDE) | variable (number of basis functions) | SDE projected via Galerkin truncation |
5. Hierarchical and Deep Gaussian Processes in State Space
State-space methods extend to deep and hierarchical models by stacking GP priors whose hyperparameters (length-scales, amplitudes) are themselves outputs of other state-space GPs. This yields a nonlinear, hierarchical SDE system, where each layer depends on its parents’ realization, and the overall composite is again Markovian in a high-dimensional state (Zhao et al., 2020, Zhao, 2021).
Inference for deep state-space GPs leverages the Markovian structure, employing extended/cubature Kalman filters or particle methods as needed for nonlinear transitions. This approach allows for efficient O(n) smoothing in models intractable for classical DGP inference.
6. Practical Considerations and Limitations
The state-space approach relies on the existence or tractability of the SDE realization for the target kernel. For Matérn kernels and close rational-spectral-density variants, this yields practical finite-dimensional systems. For kernels without a rational factorization (e.g., squared exponential), only approximate state-space representations are available, often with a trade-off between approximation error and computational cost (Benavoli et al., 2016).
Truncated basis projections or Galerkin methods introduce finite-dimensional error that decays with the number of basis terms. Choice of basis (Fourier, Laplace, finite element) impacts computational efficiency, especially for complex geometries in space (Zhang et al., 1 Dec 2025, Svensson et al., 2015).
All state-space systems constructed this way are linear-Gaussian, making the approach most natural for priors and likelihoods compatible with these assumptions. Highly nonlinear or non-Gaussian extensions require additional approximation.
7. Applications and Impact
State-space representations of GP priors have substantially extended the range of GPs amenable to large-scale inference, especially for temporal, spatio-temporal, and hierarchical problems in fields such as time series analysis, control, signal processing, and spatiotemporal environmental modelling (Jordán et al., 2021, Zhang et al., 1 Dec 2025). The ability to exploit the Markov property and block-banded precision structures yields scalable algorithms for streaming, real-time, and massive-data scenarios, while offering a link to physically meaningful dynamical models (Adam et al., 2020). These techniques underpin a new generation of efficient, flexible, and interpretable probabilistic modelling frameworks.