
Skew Gaussian Processes

Updated 16 April 2026
  • Skew Gaussian Processes are statistical models that extend standard Gaussian processes by incorporating explicit transformation maps to capture skewed, non-Gaussian behavior.
  • They utilize layered transport mechanisms such as marginal warping, covariance mixing, and normalizing flows to achieve flexible dependency structures and heavy-tailed distributions.
  • These models are applied in regression, spatiotemporal dynamics, and physical advection, offering scalable inference, calibrated uncertainty, and improved expressiveness.

A Transport Gaussian Process (TGP) is any stochastic process constructed by pushing forward a "base" law—typically a standard Gaussian process or Gaussian white-noise process—through a sequence of explicit, often invertible transformations ("transport maps") acting on the finite-dimensional marginals or trajectories. This paradigm unifies standard GPs, warped GPs, Student-t processes, and a broad family of models with non-Gaussian marginals and complex dependency structures. In parallel, "transport" in TGPs also refers to the modeling of physical transport phenomena, such as advection by a latent velocity field, by parametrizing nonstationary or SPDE-constrained covariance structures. The methodologies reviewed below encompass both probabilistic transport in function space and explicit modeling of advection or optimal transport in probability space.

1. Construction Frameworks for Transport Gaussian Processes

Two mathematically rigorous mechanisms have been proposed for defining TGPs:

  • Push-forward framework for stochastic processes: Given a white-noise process $\xi = \{\xi(t)\}_{t\in T}$ with finite-dimensional laws $\eta_n = N_n(0, I_n)$, and an $\eta$-consistent collection of measurable transport maps $T_n : \mathbb{R}^n\to\mathbb{R}^n$ (where $n$ is the number of input points), define $f = T(\xi)$ with finite-dimensional laws $\pi_n = T_n\#\eta_n$. Kolmogorov consistency ensures the resulting $f$ is a well-defined stochastic process. The maps $T_n$ can be composed to impart desired marginal, copula, or dependence structures, e.g., through marginal warping, covariance mixing, radial / copula layers (elliptical, Archimedean), or their combinations (Rios, 2020).
  • Normalizing flow formulation: A base GP $g(x)\sim GP(\mu(x), k(x,x'))$ is "transported" through an invertible flow (possibly input-dependent, e.g., parametrized by a neural network), yielding the transformed process. The marginal prior of the transformed process is computed by the change-of-variables formula: the base GP density evaluated at the inverse flow, multiplied by the absolute Jacobian determinant of that inverse (2011.01596). This approach permits imposing constraints (boundedness, monotonicity, nonstationarity) while maintaining differentiability and tractability.

Both paradigms allow the induced process to retain key invariance, closure, or support properties provided by the elementary layers. Each layer can be explicitly characterized: marginal transformations (e.g., Box–Cox), linear mixing (i.e., Cholesky or covariance layer), radial transformations for elliptical or Archimedean copulas, and complex flows for normalizing-flow GP variants.
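
The layered push-forward construction can be illustrated with a minimal sketch. It assumes a squared-exponential kernel for the covariance layer and an elementwise exponential as the marginal warp; both choices, and the helper names `rbf_kernel` and `sample_transport_gp`, are illustrative and not taken from the cited papers.

```python
import numpy as np

def rbf_kernel(x, lengthscale=0.2, variance=1.0):
    """Squared-exponential kernel matrix on 1-D inputs x (illustrative choice)."""
    d = x[:, None] - x[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def sample_transport_gp(x, rng, jitter=1e-6):
    """Push white noise through a covariance layer, then a marginal warp."""
    n = x.shape[0]
    xi = rng.standard_normal(n)                      # base law: N_n(0, I_n)
    L = np.linalg.cholesky(rbf_kernel(x) + jitter * np.eye(n))
    g = L @ xi                                       # covariance layer: g ~ N(0, K)
    f = np.exp(g)                                    # marginal layer: positive, skewed
    return xi, g, f

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
xi, g, f = sample_transport_gp(x, rng)
print(f.min(), f.max())
```

Swapping the final `np.exp` for another monotone map (for example an inverse Box–Cox transform) changes only the marginal layer; the covariance layer is untouched.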

2. Transport GPs for Regression: Expressiveness and Inference

Transport Gaussian Processes for Regression define a modular architecture where each layer in the transport composition modulates a distinct statistical property:

  • Marginal layers introduce location shift, monotonic warping, or non-Gaussianity for the individual marginals.
  • Covariance layers (linear mixing) effect arbitrary covariance kernel structures.
  • Elliptical/radial and Archimedean layers introduce heavy tails (Student-t, inverse Gamma) or extreme-value dependence (Clayton, Gumbel copulas), controlling the copula structure of joint laws (Rios, 2020).

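The induced finite-dimensional law of $f$ at locations $x_1, \dots, x_n$ has, for an invertible and differentiable $T_n$, the density

$$\pi_n(\mathbf{y}) = \eta_n\big(T_n^{-1}(\mathbf{y})\big)\,\big|\det \nabla T_n^{-1}(\mathbf{y})\big|$$

where $T_n^{-1}$ is the inverse of the composed transport map and $\eta_n$ is the standard normal density.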

Learning is performed by maximizing the marginal likelihood or its penalized version, with gradients accumulated layerwise (each invertible, tractable). For fully triangular transports, posterior predictive draws decompose as transformations of GP conditional samples; explicit posterior mean and credible intervals are available for warping/covariance layers, while general compositions allow for straightforward MCMC due to explicit densities (Rios, 2020). Empirical results confirm substantial improvements in handling boundedness, heavy tails, and tail dependence over classical or warped GPs.
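
To make the explicit density concrete, the following sketch evaluates the change-of-variables log-density for one particular two-layer transport: a Cholesky covariance layer followed by an elementwise exponential. The function name `transport_log_density` and the choice of layers are assumptions made for illustration, not the construction used in the cited work.

```python
import numpy as np

def transport_log_density(y, L):
    """log-density of y = exp(L @ xi), xi ~ N(0, I).

    L is assumed lower-triangular with a positive diagonal (a Cholesky factor).
    """
    n = y.shape[0]
    g = np.log(y)                          # invert the marginal (exp) layer
    xi = np.linalg.solve(L, g)             # invert the covariance layer
    log_base = -0.5 * xi @ xi - 0.5 * n * np.log(2.0 * np.pi)
    # log|det| of the inverse map: -sum(log diag L) from the linear layer,
    # -sum(log y) from the elementwise log.
    log_det_inv = -np.sum(np.log(np.diag(L))) - np.sum(np.log(y))
    return log_base + log_det_inv

L = np.linalg.cholesky(np.array([[1.0, 0.3], [0.3, 0.5]]))
y = np.array([0.8, 1.7])
print(transport_log_density(y, L))
```

Because each layer contributes an additive log-determinant term, gradients with respect to layer parameters can be accumulated layer by layer, which is what makes marginal-likelihood training tractable.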

3. Transport GPs in Spatiotemporal Modeling and Physical Advection

TGPs have been advanced as a rigorous framework for modeling the movement of scalar (and vector) fields undergoing advection:

  • Physics-constrained TGPs: Model a scalar field observed under the action of a time-dependent velocity field, with the transport (advection) equation relaxed to allow for intrinsic temporal evolution. The covariance structure is defined through a learned invertible "backward flow" parameterized via residual neural networks, ensuring bijectivity and tractable differentials. The mean and hyperparameters are fitted via joint maximum likelihood, and the velocity is recovered analytically from the learned flow (Fahmy et al., 16 May 2025).

This formulation delivers physically meaningful, coherent velocity fields for large-scale satellite data, outperforming conventional local tracking in terms of smoothness, spatial coverage, and empirical error (Fahmy et al., 16 May 2025).
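
One way to see how a velocity can be read off a backward flow is the following sketch. It assumes a map $\psi(x, t)$ that sends a point at time $t$ back to its starting position, so that $\psi$ is constant along trajectories and $v = -(\nabla_x \psi)^{-1}\,\partial_t \psi$. The flow used here (undoing a rigid rotation) and the finite-difference derivatives are illustrative stand-ins; the cited work uses a learned residual network, for which automatic differentiation would replace the finite differences.

```python
import numpy as np

def backward_flow(x, t, omega=0.3):
    """Hypothetical backward flow: undo a rigid rotation with angular speed omega."""
    c, s = np.cos(-omega * t), np.sin(-omega * t)
    R = np.array([[c, -s], [s, c]])
    return R @ x

def velocity_from_flow(psi, x, t, h=1e-5):
    """v = -(d psi / dx)^{-1} d psi / dt, using central finite differences."""
    d = x.shape[0]
    J = np.empty((d, d))
    for j in range(d):
        e = np.zeros(d)
        e[j] = h
        J[:, j] = (psi(x + e, t) - psi(x - e, t)) / (2.0 * h)
    dpsi_dt = (psi(x, t + h) - psi(x, t - h)) / (2.0 * h)
    return -np.linalg.solve(J, dpsi_dt)

x = np.array([1.0, 2.0])
v = velocity_from_flow(backward_flow, x, t=0.7)
print(v)  # rigid rotation: expected velocity is omega * (-x2, x1)
```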

4. Optimal Transport, Barycenters, and Distributional Inputs

Optimal transport theory supports several TGP-related constructions:

  • Gaussian process barycenters (ensemble aggregation): By interpreting each local GP expert's predictive distribution as a measure, product-of-expert models can be improved using the 2-Wasserstein barycenter. For 1D Gaussians, the barycenter of $M$ experts (means $m_i$, variances $\sigma_i^2$) is the unique minimizer over $\nu$ of

$$\sum_{i=1}^{M} w_i\, W_2^2\big(\nu,\, N(m_i, \sigma_i^2)\big)$$

with closed-form mean $\bar m = \sum_{i} w_i m_i$, standard deviation $\bar\sigma = \sum_{i} w_i \sigma_i$, and softmax-based confidence weights $w_i$ calibrating contributions (Cohen et al., 2021).

  • Kernels on probability measures via entropic OT: Entropic (Sinkhorn-regularized) transport costs define Hilbertian embeddings of the input distributions in a Hilbert space built on a reference probability space, leading to positive definite radial kernels for GPs indexed by (empirical or true) distributions (Bachoc et al., 2022). Such kernels are universal on the weak topology, strictly positive definite under mild conditions, and admit efficient computation via Sinkhorn iterations and automatic differentiation.
  • Multivariate distributional input kernels: For Gaussian distributions with known covariances, composition with the Wasserstein barycenter and explicit transport maps produces strictly positive definite GP kernels on the space of input distributions via Hilbert space embeddings (Bachoc et al., 2018). These kernels admit microergodicity of all covariance parameters in infinite dimension, permitting consistent model selection.
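
The closed-form 1D Gaussian barycenter described in the first item above is short enough to sketch directly. The confidence scores fed to the softmax are hypothetical placeholders; how the cited work derives them is not specified here.

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax, used here to turn scores into weights."""
    z = np.asarray(scores, dtype=float)
    e = np.exp(z - np.max(z))
    return e / e.sum()

def gaussian_barycenter_1d(means, variances, weights):
    """2-Wasserstein barycenter of 1-D Gaussians; returns (mean, variance)."""
    m = np.asarray(means, dtype=float)
    s = np.sqrt(np.asarray(variances, dtype=float))
    w = np.asarray(weights, dtype=float)
    # The barycenter averages means and standard deviations (not variances).
    return float(w @ m), float((w @ s) ** 2)

# Three experts with hypothetical confidence scores.
weights = softmax([1.0, 0.0, -1.0])
mean, var = gaussian_barycenter_1d([0.1, 0.4, -0.2], [0.5, 1.0, 2.0], weights)
print(mean, var)
```

Averaging standard deviations rather than precisions is what distinguishes this aggregation from a product of experts, whose variance shrinks as more experts are added.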

5. Optimal Transport Distances Between Gaussian Processes

Computation and geometry of optimal transport distances between Gaussian (process) laws are central in model-based comparison, calibration, and barycenter construction:

  • Finite/infinite-dimensional Bures-Wasserstein distance: For $d$-dimensional (possibly degenerate) Gaussian laws $N(m_1, \Sigma_1)$, $N(m_2, \Sigma_2)$, the squared $2$-Wasserstein (Bures) distance is

$$W_2^2 = \|m_1 - m_2\|^2 + \operatorname{tr}\Big(\Sigma_1 + \Sigma_2 - 2\big(\Sigma_1^{1/2}\,\Sigma_2\,\Sigma_1^{1/2}\big)^{1/2}\Big)$$

with optimal transport maps characterized via operator means even in infinite-dimensional / degenerate settings (Yun et al., 25 Dec 2025). Operator-theoretic factorization (Green’s operator, Schur complement) yields closed-form generalized Monge couplings, and interpolated barycenters correspond to convex hulls in operator space.

  • Adapted (bicausal) Wasserstein distance: For Gaussian processes, the adapted distance admits a closed form expressed through the Cholesky factors of the two covariance matrices. This metric explicitly enforces time-causal coupling restrictions and can be computed efficiently (Gunasingam et al., 2024). The construction elucidates differences with the classical $W_2$ distance and provides closed-form bicausal OT maps.

  • Entropic OT and Sinkhorn divergences for GPs: In Hilbert spaces, the regularized OT between Gaussian measures admits closed-form optimal couplings and costs involving trace and Fredholm determinant terms. Differentiability and unique barycenters hold under broad conditions, with limiting behaviors interpolating between the Wasserstein distance and maximum mean discrepancy (MMD) (Quang, 2020).
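
For finite-dimensional Gaussians, the squared 2-Wasserstein (Bures) distance can be computed with two symmetric matrix square roots. The sketch below uses an eigendecomposition-based square root with eigenvalues clipped at zero, so it also accepts degenerate (rank-deficient) covariances; the helper names are illustrative.

```python
import numpy as np

def psd_sqrt(A):
    """Symmetric PSD square root via eigendecomposition (clips tiny negatives)."""
    w, V = np.linalg.eigh((A + A.T) / 2.0)
    w = np.clip(w, 0.0, None)
    return (V * np.sqrt(w)) @ V.T

def bures_wasserstein_sq(m1, S1, m2, S2):
    """Squared 2-Wasserstein distance between N(m1, S1) and N(m2, S2)."""
    r1 = psd_sqrt(S1)
    cross = psd_sqrt(r1 @ S2 @ r1)
    return float(np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2.0 * cross))

m1, S1 = np.zeros(2), np.diag([1.0, 4.0])
m2, S2 = np.zeros(2), np.diag([9.0, 16.0])
print(bures_wasserstein_sq(m1, S1, m2, S2))
```

When the covariances commute, as in this diagonal example, the covariance term reduces to the sum of squared differences of the standard deviations.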

6. Computational and Statistical Implications

Transport GP frameworks maintain, or in some cases reduce, computational complexity compared to standard GPs when leveraging structure:

  • Scalability: Sparse/inducing-point approximations, low-rank kernel expansions, and stochastic mini-batch training techniques are fully compatible with most transport and flow-based TGP formulations (2011.01596, Fahmy et al., 16 May 2025).
  • Optimization: Layerwise or fully automatic differentiation is available for both flow parameters and GP (hyper)parameters due to explicit density computations, facilitating gradient-based optimization even through Sinkhorn iterations or neural ODEs (Bachoc et al., 2022, Fahmy et al., 16 May 2025).
  • Uncertainty Quantification: Transport GPs preserve or enhance calibrated uncertainty, especially when using barycenter or copula-based models. In distributed settings, transport-based aggregation provides smoother and more reliable predictive variances than classical PoE-type models (Cohen et al., 2021).
  • Statistical consistency: Microergodicity of hyperparameters in Hilbert-space-embedded OT kernels ensures parameter identifiability and consistency (Bachoc et al., 2018).
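
The Sinkhorn iterations referred to above consist only of matrix-vector products and elementwise divisions, which is why they can be differentiated through. A minimal NumPy sketch follows, assuming discrete histograms, a squared-distance cost scaled to $[0, 1]$, and a moderate regularization `eps`; a small `eps` would call for a log-domain implementation, which is not shown.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.5, n_iter=500):
    """Entropic OT between histograms a and b with cost matrix C.

    Returns the coupling matrix P whose marginals approximate a and b.
    """
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)                  # match row marginals
        v = b / (K.T @ u)                # match column marginals
    return u[:, None] * K * v[None, :]

x = np.linspace(0.0, 1.0, 5)
y = np.linspace(0.0, 1.0, 6)
a = np.full(5, 1.0 / 5)
b = np.full(6, 1.0 / 6)
C = (x[:, None] - y[None, :]) ** 2       # squared-distance cost in [0, 1]
P = sinkhorn(a, b, C)
print(np.sum(P * C))                     # transport cost under the entropic plan
```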

7. Applications and Theoretical Significance

Transport GPs have achieved state-of-the-art results in several domains:

  • Physical field interpolation and dynamics: Wind field retrieval from satellite imagery, where TGPs offer smooth, physically plausible vector fields even under poor feature contrast (Fahmy et al., 16 May 2025).
  • Machine learning on distributions: Classification, regression, and structured prediction where inputs are distributions, sets, or textures, leveraging OT-based kernels (Bachoc et al., 2022, Bachoc et al., 2018).
  • Heavy-tailed and bounded regression: Financial series, environmental phenomena, and biological signals requiring processes that depart from Gaussian assumptions in marginal or dependency structure (Rios, 2020).
  • Ensemble modeling and federated learning: Efficient, robust product-of-expert models via Wasserstein barycenter aggregation (Cohen et al., 2021).

The general strategy of transporting GPs via analytically or algorithmically tractable maps synthesizes advances in kernel methods, normalizing flows, optimal transport, and machine learning for distributions, achieving models with interpretable structure, improved expressiveness, and scalable inference.

