Transport Gaussian Processes Overview

Updated 16 April 2026

Transport Gaussian Processes are stochastic process models that extend classical GPs using transport and push-forward operations to capture complex dependencies.
They integrate optimal transport, normalizing flows, and measure-theoretic methods to enforce non-Gaussian marginals and physical constraints.
TGPs are applied in regression, spatiotemporal modeling, and PDE-constrained systems, offering scalable and interpretable inference frameworks.

Transport Gaussian Processes (TGPs) constitute a family of stochastic process models that generalize classical Gaussian processes (GPs) by incorporating transport, optimal transport, or push-forward operations. These methodologies exploit measure-theoretic and functional-analytic structures to model complex data dependencies, encode physical constraints (such as advection or mass conservation), address non-Gaussian marginal and copula structures, and enable flexible kernel constructions over distributional, functional, or spatiotemporal inputs. TGPs subsume and extend models such as warped GPs, Student-t processes, GPs on distributions via optimal transport, and stochastic processes built by normalizing flows.

1. Core Frameworks for Transport of Gaussian Processes

1.1 Push-Forward Stochastic Processes

Transport or push-forward stochastic process constructions take a base process, typically a Gaussian white-noise process $\xi=\{\xi(t)\}$, and apply a measurable, often invertible, transformation $T$. On finite collections of indices, $T_n:\mathbb{R}^n\to\mathbb{R}^n$ are consistent maps such that the induced process $f=T(\xi)$ has finite-dimensional laws $\pi_n = T_n\#\eta_n$, where $\eta_n$ is the law of the standard $n$-variate Gaussian. If the family $\{\pi_n\}$ satisfies Kolmogorov consistency, the push-forward defines a valid stochastic process. This framework admits modular layerwise constructions, in which each layer transforms a specific distributional property, such as the marginals or the dependence structure (Rios, 2020). The resulting "Transport Process" (TP) can represent a broad family of non-Gaussian, copula-rich priors, including but not limited to GPs, warped GPs, Student-t processes, and processes with Archimedean copulas.
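A minimal sketch of a two-layer push-forward $T_n$, assuming a squared-exponential covariance layer followed by an exp marginal warp. These are illustrative choices, the helper names are hypothetical, and this is not the implementation of Rios (2020). Because the Cholesky factor is lower-triangular, appending an index leaves the earlier outputs unchanged for the same noise prefix, which is one simple way to meet the marginalization part of Kolmogorov consistency.

```python
import math
import random

def se_kernel(s, t, variance=1.0, lengthscale=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return variance * math.exp(-0.5 * ((s - t) / lengthscale) ** 2)

def cholesky(a):
    """Lower-triangular Cholesky factor L of a symmetric positive-definite matrix a = L L^T."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def covariance_layer(z, ts, jitter=1e-6):
    """Map i.i.d. N(0, 1) noise z to a zero-mean GP sample at inputs ts via a triangular map."""
    n = len(ts)
    K = [[se_kernel(ts[i], ts[j]) + (jitter if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    return [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)]

def marginal_layer(g):
    """Elementwise exp warp: Gaussian marginals become log-normal (positive support)."""
    return [math.exp(v) for v in g]

def transport_process(z, ts):
    """Composite map T_n = marginal_layer o covariance_layer applied to z ~ eta_n."""
    return marginal_layer(covariance_layer(z, ts))

# Usage: one draw of the transported process on a grid.
rng = random.Random(0)
grid = [0.25 * i for i in range(8)]
sample = transport_process([rng.gauss(0.0, 1.0) for _ in grid], grid)
```

Swapping the exp warp for another monotone map changes only the marginals, while the covariance layer alone controls the dependence; this separation is what makes the layerwise construction modular.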

1.2 Transport via Normalizing Flows

Transported or transformed GPs built with normalizing flows equip the GP prior with expressive, invertible transformations $T:\mathbb{R}\to\mathbb{R}$ (possibly input-dependent, e.g., with parameters produced by a neural network), yielding processes $f(x) = T(g(x);\phi)$ with $g\sim\text{GP}$ (2011.01596). Because $T$ is invertible, exact densities are available via the change-of-variables formula, and inference remains tractable with stochastic variational methods and sparse inducing points. Interpretable structure (e.g., monotonicity, boundedness) can be encoded by a suitable design of $T$, and the construction generalizes hierarchical non-Gaussian process models (2011.01596, Rios, 2020).
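The change-of-variables density of a single marginal $f(x)$ can be sketched as follows, assuming a sinh-arcsinh elementwise flow $T(g) = a + b\sinh(d\,\mathrm{asinh}(g) - c)$ whose skewness parameter $c$ depends on the input through a hypothetical flow_params(x). This flow family is picked for illustration and is not necessarily the one used in 2011.01596; all function names are hypothetical.

```python
import math

def sal_flow(g, a, b, c, d):
    """Sinh-arcsinh warp T(g) = a + b*sinh(d*asinh(g) - c); strictly increasing for b, d > 0."""
    return a + b * math.sinh(d * math.asinh(g) - c)

def sal_flow_inverse(f, a, b, c, d):
    """Closed-form inverse T^{-1}(f)."""
    return math.sinh((math.asinh((f - a) / b) + c) / d)

def sal_flow_derivative(g, a, b, c, d):
    """dT/dg, the Jacobian factor in the change-of-variables formula."""
    return b * d * math.cosh(d * math.asinh(g) - c) / math.sqrt(1.0 + g * g)

def flow_params(x):
    """Hypothetical input-dependent parameters phi(x) = (a, b, c, d): skewness grows with x."""
    return 0.0, 1.0, 0.5 * x, 1.0

def marginal_density(f, x, gp_var=1.0):
    """Density of f(x) = T(g(x); phi(x)) when g(x) ~ N(0, gp_var):
    p_f(f) = p_g(T^{-1}(f)) / T'(T^{-1}(f))."""
    a, b, c, d = flow_params(x)
    g = sal_flow_inverse(f, a, b, c, d)
    p_g = math.exp(-0.5 * g * g / gp_var) / math.sqrt(2.0 * math.pi * gp_var)
    return p_g / sal_flow_derivative(g, a, b, c, d)
```

At $x = 0$ the hypothetical parameters reduce the flow to the identity, so the marginal is the base Gaussian; for larger $x$ the same GP draw is pushed into an increasingly skewed marginal.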

2. Optimal Transport and Gaussian Processes on Distributions

2.1 Kernels on Distribution Space via OT

In tasks with distribution-valued or measure-valued inputs, kernels based on optimal transport provide a principled framework. A key approach is to select a reference measure $\bar\mu$ (typically a Wasserstein barycenter of the inputs), construct Monge maps $T_\mu$ between $\bar\mu$ and each input measure $\mu$, and define the Hilbertian embedding $\mu\mapsto T_\mu\in L^2(\bar\mu)$. Any radial positive definite kernel $K(\mu,\nu)=F(\|T_\mu-T_\nu\|_{L^2(\bar\mu)})$, where $F$ makes $(u,v)\mapsto F(\|u-v\|)$ positive definite on the Hilbert space $L^2(\bar\mu)$, yields a valid covariance kernel on the space of input measures (Bachoc et al., 2018). Schoenberg’s theorem identifies such functions: $F(r)=\varphi(r^2)$ with $\varphi$ completely monotone and non-constant yields a strictly positive definite kernel on any Hilbert space, which covers the RBF, Matérn, and power-exponential (exponent in $(0,2]$) families. For multivariate Gaussian measures, Monge maps have explicit forms via principal matrix square roots, making the induced kernel a function of Frobenius norms of transforms of covariance matrices (Bachoc et al., 2018).
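For univariate Gaussian inputs the construction can be checked by hand: the Monge map pushing the reference $N(m_0, s_0^2)$ to $N(m, s^2)$ is affine, $T(x) = m + (s/s_0)(x - m_0)$, so $\|T_\mu - T_\nu\|_{L^2(\bar\mu)}^2 = (m_\mu - m_\nu)^2 + (s_\mu - s_\nu)^2$, which equals $W_2^2(\mu,\nu)$ for any Gaussian reference. The sketch below assumes an RBF choice of $F$, and the function names are hypothetical. In higher dimensions the embedding distance is in general only an upper bound on $W_2$, because $(T_\mu, T_\nu)\#\bar\mu$ is one particular coupling of $\mu$ and $\nu$.

```python
import math

def monge_coefficients(ref, target):
    """Affine Monge map from ref = (m0, s0) to target = (m, s), i.e. N(m0, s0^2) -> N(m, s^2):
    T(x) = m + (s / s0) * (x - m0). Returns (intercept m, slope s / s0)."""
    (_, s0), (m, s) = ref, target
    return m, s / s0

def embedding_sq_distance(mu, nu, ref):
    """Squared L^2(ref) distance between T_mu and T_nu.
    For affine maps, E_ref[(da + db * (x - m0))^2] = da^2 + db^2 * s0^2 because E[x - m0] = 0."""
    a_mu, b_mu = monge_coefficients(ref, mu)
    a_nu, b_nu = monge_coefficients(ref, nu)
    s0 = ref[1]
    return (a_mu - a_nu) ** 2 + (b_mu - b_nu) ** 2 * s0 ** 2

def w2_sq_1d(mu, nu):
    """Closed-form squared 2-Wasserstein distance between two univariate Gaussians."""
    return (mu[0] - nu[0]) ** 2 + (mu[1] - nu[1]) ** 2

def rbf_kernel_on_measures(mu, nu, ref, variance=1.0, lengthscale=1.0):
    """Radial kernel F(||T_mu - T_nu||) with a squared-exponential F."""
    return variance * math.exp(-0.5 * embedding_sq_distance(mu, nu, ref) / lengthscale ** 2)

# Usage: Gram matrix over three Gaussian inputs given as (mean, std).
inputs = [(0.0, 1.0), (1.5, 0.5), (-2.0, 3.0)]
gram = [[rbf_kernel_on_measures(p, q, (0.0, 1.0)) for q in inputs] for p in inputs]
```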

2.2 Sinkhorn Regularized Gaussian Process Kernels

Regularized OT kernels over probability measures use Sinkhorn potentials as Hilbertian embeddings. Given the entropic-regularized OT cost $\mathrm{OT}_\varepsilon(\mu,\mathcal{U})$ between an input measure $\mu$ and a fixed reference measure $\mathcal{U}$, the optimal dual potential $g_\mu$ (defined on the support of $\mathcal{U}$ and unique only up to an additive constant) yields the centered embedding $\bar g_\mu = g_\mu - \int g_\mu\,d\mathcal{U}$, from which kernels $K(\mu,\nu)=F(\|\bar g_\mu-\bar g_\nu\|_{L^2(\mathcal{U})})$ are defined. The induced GPs are indexed by distributions, and the kernels are universal and strictly positive definite under mild assumptions; computationally, Sinkhorn iterations admit automatic differentiation and scale to large datasets (Bachoc et al., 2022).
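A minimal sketch of the Sinkhorn embedding for discrete measures on the real line, assuming a fixed discrete reference measure, a squared-distance cost, a plain log-domain Sinkhorn loop, and an RBF choice of $F$. The function names are hypothetical and this is not the implementation of Bachoc et al. (2022). The embedding keeps the potential defined on the reference support and centers it, because dual potentials are only determined up to an additive constant; the parameter eps plays the role of $\varepsilon$, and larger values converge faster but give a more diffuse transport plan.

```python
import math

def logsumexp(vals):
    """Numerically stable log(sum(exp(v)))."""
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def sinkhorn_potentials(x, a, y, b, eps=0.5, n_iter=200):
    """Log-domain Sinkhorn between sum_i a_i delta(x_i) and sum_j b_j delta(y_j)
    with squared-distance cost. Returns dual potentials (f, g) and the cost matrix C."""
    C = [[(xi - yj) ** 2 for yj in y] for xi in x]
    f, g = [0.0] * len(x), [0.0] * len(y)
    for _ in range(n_iter):
        f = [-eps * logsumexp([math.log(b[j]) + (g[j] - C[i][j]) / eps for j in range(len(y))])
             for i in range(len(x))]
        g = [-eps * logsumexp([math.log(a[i]) + (f[i] - C[i][j]) / eps for i in range(len(x))])
             for j in range(len(y))]
    return f, g, C

def sinkhorn_embedding(x, a, ref_y, ref_b, eps=0.5):
    """Centered potential on the reference support: g - sum_j b_j g_j."""
    _, g, _ = sinkhorn_potentials(x, a, ref_y, ref_b, eps)
    mean_g = sum(bj * gj for bj, gj in zip(ref_b, g))
    return [gj - mean_g for gj in g]

def sinkhorn_kernel(mu, nu, ref_y, ref_b, lengthscale=1.0, eps=0.5):
    """Squared-exponential kernel of the L^2(reference) distance between centered embeddings.
    mu and nu are (support, weights) pairs."""
    e_mu = sinkhorn_embedding(*mu, ref_y, ref_b, eps)
    e_nu = sinkhorn_embedding(*nu, ref_y, ref_b, eps)
    d2 = sum(bj * (u - v) ** 2 for bj, u, v in zip(ref_b, e_mu, e_nu))
    return math.exp(-0.5 * d2 / lengthscale ** 2)
```

Since every input is embedded against the same reference, the embeddings can be computed once per input and reused for all kernel evaluations.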

2.3 Barycenters and Wasserstein Geometry of GPs

Wasserstein barycenters and optimal transport geometry facilitate aggregation of predictive distributions and analysis of process interpolants. In particular, weighted Wasserstein barycenters of Gaussian experts enable robust aggregation in product-of-experts frameworks: for univariate Gaussian predictions, the barycenter's mean and standard deviation are the weighted sums of the experts' means and standard deviations (Cohen et al., 2021). Operator-theoretic frameworks generalize the construction of optimal Monge maps, barycenters, and geodesics to degenerate and infinite-dimensional Gaussian measures, leveraging Green’s functions and the Bures–Wasserstein metric (Yun et al., 25 Dec 2025).
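For univariate Gaussian predictions at a single test point, barycentric aggregation can be sketched as below and contrasted with a standard precision-weighted product of experts. The expert weights are left as inputs (how Cohen et al. (2021) choose them is not reproduced), and the function names are hypothetical.

```python
import math

def wasserstein_barycenter_1d(means, stds, weights):
    """W2 barycenter of univariate Gaussians: weighted average of means and of standard deviations."""
    total = sum(weights)
    w = [wi / total for wi in weights]
    mean = sum(wi * m for wi, m in zip(w, means))
    std = sum(wi * s for wi, s in zip(w, stds))
    return mean, std

def product_of_experts(means, stds):
    """Standard product of Gaussian experts: precisions add, mean is precision-weighted."""
    precisions = [1.0 / s ** 2 for s in stds]
    prec = sum(precisions)
    mean = sum(p * m for p, m in zip(precisions, means)) / prec
    return mean, math.sqrt(1.0 / prec)
```

With $K$ identical experts the barycenter returns that expert unchanged, whereas the product of experts shrinks the standard deviation by a factor of $\sqrt{K}$, which illustrates the overconfidence that barycentric aggregation avoids.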

3. Transport GPs for Regression, Non-Gaussianity, and Physical Modeling

3.1 Expressiveness beyond Gaussianity

The layered transport framework enables the explicit modeling of non-Gaussian marginals, copulas, heavy tails, boundedness constraints, and tail dependence (Rios, 2020). The core layers include:

Marginal warping (Box-Cox, log, affine): enforces support constraints and shapes marginals.
Covariance (kernel) transformation: imposes the kernel-specified correlation structure.
Elliptical layers (radial mixtures): control tail heaviness, e.g., Student-t process models.
Archimedean (ℓ1-radial) layers: construct copulas with arbitrary upper/lower tail dependence.

Inference exploits the invertibility and triangularity of the transport layers for exact or efficient posterior sampling, leveraging the structure of the Gaussian base process.
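A minimal sketch of an elliptical (radial-mixture) layer, assuming a chi-square mixing variable, which turns a Gaussian vector into a multivariate Student-t vector. The function name is hypothetical and the layer parameterization in Rios (2020) may differ. All coordinates share one random scale, so the layer changes tail heaviness without changing the correlation structure (for $\nu > 2$, where it is defined).

```python
import math
import random

def elliptical_layer(g, nu, rng):
    """Scale the whole Gaussian vector g by one shared factor sqrt(nu / W), W ~ chi^2_nu.
    If g ~ N(0, K), the output is multivariate Student-t with nu degrees of freedom."""
    w = rng.gammavariate(nu / 2.0, 2.0)  # chi-square with nu degrees of freedom
    scale = math.sqrt(nu / w)
    return [scale * v for v in g]

# Usage: compare tail frequencies of the Gaussian base and the transported marginal.
rng = random.Random(1)
n = 20000
gauss_tail = sum(abs(rng.gauss(0.0, 1.0)) > 3 for _ in range(n)) / n
t_tail = sum(abs(elliptical_layer([rng.gauss(0.0, 1.0)], 3.0, rng)[0]) > 3 for _ in range(n)) / n
```

For $\nu = 3$ the probability of exceeding 3 in absolute value is roughly twenty times larger than for a standard Gaussian, so t_tail should be well above gauss_tail.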

3.2 Transport GPs for Trajectory and Spatiotemporal Modeling

Classical GP models for observed trajectories—such as positions over time or spatiotemporal fields—can exploit transport-inspired kernels to encode dynamics, e.g., advection, drift, and periodicity (Nguyen et al., 2021). The choice of mean functions and composite spatial-temporal kernels is critical in transport applications, and sparse approximations (inducing points, low-rank structure) enable scalability (Nguyen et al., 2021).
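One way to assemble such a composite kernel is sketched below, assuming a squared-exponential kernel in space multiplied by a quasi-periodic (squared-exponential times periodic) kernel in time, plus a linear-drift mean function. Every kernel choice, parameter value, and function name here is an illustrative assumption rather than the setup of Nguyen et al. (2021). Products of valid kernels are valid kernels, so the composition remains positive semi-definite.

```python
import math

def se(d, lengthscale):
    """Squared-exponential correlation as a function of a difference d."""
    return math.exp(-0.5 * (d / lengthscale) ** 2)

def periodic_kernel(t1, t2, period, lengthscale):
    """Standard periodic (exp-sine-squared) kernel on time."""
    return math.exp(-2.0 * math.sin(math.pi * abs(t1 - t2) / period) ** 2 / lengthscale ** 2)

def spatiotemporal_kernel(p, q, variance=1.0, ls_space=1.0, ls_time=10.0, period=1.0, ls_per=1.0):
    """Separable composite kernel on (x, t): SE in space times quasi-periodic in time."""
    (x1, t1), (x2, t2) = p, q
    return (variance * se(x1 - x2, ls_space) * se(t1 - t2, ls_time)
            * periodic_kernel(t1, t2, period, ls_per))

def drift_mean(p, velocity=0.3, offset=0.0):
    """Hypothetical linear-drift mean function m(x, t) = offset + velocity * t."""
    return offset + velocity * p[1]
```

The slow SE factor in time lets the periodic pattern evolve gradually, while the drift mean absorbs a steady trend so that the kernel only has to model deviations from it.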

3.3 PDE-Constrained TGPs and Flow Learning

Transport GPs parametrize physical flows in systems governed by conservation laws. For example, to model a scalar field $u(x,t)$ advected by an unknown velocity field $v(x,t)$, one places the covariance on the transported field $u(x,t)=u_0(\Phi_t^{-1}(x))$ through backward flow maps $\Phi_t^{-1}$, so that the covariance between $u(x,t)$ and $u(x',t')$ is expressed through $\Phi_t^{-1}(x)$ and $\Phi_{t'}^{-1}(x')$. Neural-parametric flow maps $\Phi_t$ (e.g., residual networks) admit end-to-end likelihood maximization for flow inference. The resulting framework jointly learns kernel hyperparameters and latent velocities, enforces physical plausibility, and scales efficiently to geophysical remote-sensing settings (Fahmy et al., 16 May 2025).
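A minimal sketch of such a transported covariance, assuming a one-dimensional domain, a time-independent velocity field, explicit Euler integration of the backward flow, and a squared-exponential base kernel $k_0$. The function names are hypothetical, and the source describes neural-parametric flow maps (e.g., residual networks) rather than this integrator.

```python
import math

def backward_flow(x, t, velocity, n_steps=1000):
    """Approximate Phi_t^{-1}(x) by integrating dx/ds = -v(x) for a duration t with explicit Euler.
    The velocity is assumed time-independent in this sketch."""
    h = t / n_steps
    for _ in range(n_steps):
        x = x - h * velocity(x)
    return x

def transported_kernel(p, q, velocity, variance=1.0, lengthscale=1.0):
    """Covariance of u(x, t) = u0(Phi_t^{-1}(x)) with u0 a stationary SE GP:
    k((x, t), (x', t')) = k0(Phi_t^{-1}(x), Phi_t'^{-1}(x'))."""
    (x1, t1), (x2, t2) = p, q
    z1 = backward_flow(x1, t1, velocity)
    z2 = backward_flow(x2, t2, velocity)
    return variance * math.exp(-0.5 * ((z1 - z2) / lengthscale) ** 2)
```

Each Euler step $x \leftarrow x - h\,v(x)$ has the form of a residual update, which suggests why residual networks are a natural way to parameterize learnable flow maps.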

Approach	Key Properties	Reference
Layered push-forward (TP)	Warped, elliptical, Archimedean, etc.	(Rios, 2020)
GP + Normalizing Flow	Input-dependent invertible map $T(\cdot;\phi)$	(2011.01596)
OT kernel on measures	Hilbertian ($L^2$) embedding via Monge/Sinkhorn	(Bachoc et al., 2018, Bachoc et al., 2022)
PDE-inspired TGP	GP on advected/composed domain	(Fahmy et al., 16 May 2025)
OT barycenter aggregation	Wasserstein barycenter of GP experts	(Cohen et al., 2021)

4. Optimal Transport between Gaussian Processes and Covariances

4.1 Bures-Wasserstein Distance and Monge Maps

The 2-Wasserstein (Bures–Wasserstein) distance between (possibly infinite-dimensional, degenerate) Gaussians admits explicit closed forms in terms of means and covariances:

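$$W_2^2\big(\mathcal{N}(m_1,\Sigma_1),\mathcal{N}(m_2,\Sigma_2)\big) = \|m_1-m_2\|^2 + \operatorname{tr}(\Sigma_1) + \operatorname{tr}(\Sigma_2) - 2\operatorname{tr}\Big(\big(\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2}\big)^{1/2}\Big),$$

where, in infinite dimensions, $\Sigma_1$ and $\Sigma_2$ are trace-class covariance operators.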

Recent advances yield existence and explicit characterizations of optimal Monge pushforwards even in singular settings by operator-theoretic factorization, Green's operators, and Schur complements (Yun et al., 25 Dec 2025). Interpolants (McCann geodesics) and barycenters admit explicit forms via these constructions.
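The squared 2-Wasserstein distance $\|m_1-m_2\|^2 + \operatorname{tr}\Sigma_1 + \operatorname{tr}\Sigma_2 - 2\operatorname{tr}(\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2})^{1/2}$ can be evaluated directly for Gaussians on $\mathbb{R}^2$. The sketch below restricts to 2×2 covariances so that the principal square root has the explicit form $\sqrt{M} = (M + \sqrt{\det M}\,I)/\sqrt{\operatorname{tr} M + 2\sqrt{\det M}}$, and it also builds the linear part $A = \Sigma_1^{-1/2}(\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2})^{1/2}\Sigma_1^{-1/2}$ of the Monge map $x \mapsto m_2 + A(x - m_1)$. It covers only the non-degenerate, finite-dimensional case, not the singular or operator settings of Yun et al.; the function names are hypothetical.

```python
import math

def mat_mul(A, B):
    """2x2 matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def inv_2x2(M):
    """Inverse of a nonsingular 2x2 matrix."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]

def sqrtm_2x2(M):
    """Principal square root of a 2x2 symmetric positive-definite matrix."""
    s = math.sqrt(M[0][0] * M[1][1] - M[0][1] * M[1][0])
    t = math.sqrt(M[0][0] + M[1][1] + 2.0 * s)
    return [[(M[0][0] + s) / t, M[0][1] / t], [M[1][0] / t, (M[1][1] + s) / t]]

def bures_wasserstein_sq(m1, S1, m2, S2):
    """Squared 2-Wasserstein distance between N(m1, S1) and N(m2, S2) in R^2."""
    R1 = sqrtm_2x2(S1)
    cross = sqrtm_2x2(mat_mul(mat_mul(R1, S2), R1))
    mean_term = sum((a - b) ** 2 for a, b in zip(m1, m2))
    return (mean_term + S1[0][0] + S1[1][1] + S2[0][0] + S2[1][1]
            - 2.0 * (cross[0][0] + cross[1][1]))

def monge_matrix(S1, S2):
    """Linear part A of the Monge map x -> m2 + A (x - m1) from N(m1, S1) to N(m2, S2)."""
    R1 = sqrtm_2x2(S1)
    R1_inv = inv_2x2(R1)
    cross = sqrtm_2x2(mat_mul(mat_mul(R1, S2), R1))
    return mat_mul(mat_mul(R1_inv, cross), R1_inv)
```

For commuting (e.g., diagonal) covariances the cross term reduces to $\operatorname{tr}(\Sigma_1^{1/2}\Sigma_2^{1/2})$, so $\Sigma_1 = \mathrm{diag}(1,4)$ and $\Sigma_2 = \mathrm{diag}(9,16)$ with equal means give $(1-3)^2 + (2-4)^2 = 8$.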

4.2 Adapted and Entropic OT Distances for Processes

For discrete-time multivariate processes, adapted (causal) transport distances ($\mathcal{AW}_2$) impose bicausality on couplings, yielding closed-form adapted Bures–Wasserstein distances involving Cholesky factors and their diagonals (Gunasingam et al., 2024). Entropic regularizations, e.g., Sinkhorn divergences, interpolate between the 2-Wasserstein distance and MMD, yielding twice Fréchet-differentiable functionals for use in infinite-dimensional settings and kernel learning (Quang, 2020).

4.3 Spectral OT for Stationary Processes

For stationary vector-valued processes, optimal transport admits a spectral formulation: the cost is the infimum of the variance of a filtered discrepancy process, leading to a weighted Hellinger distance between power spectral densities (Zorzi, 2020). Explicit formulas and spectral estimators extend this distance to indirect observations under BIBO-stable linear filtering.
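For scalar stationary processes, the kind of quantity involved can be illustrated with the unweighted Hellinger-type distance $\frac{1}{2\pi}\int_{-\pi}^{\pi}\big(\sqrt{\Phi_1(\theta)}-\sqrt{\Phi_2(\theta)}\big)^2\,d\theta$ between two AR(1) power spectral densities, computed with a rectangle rule. This is a sketch under that simplification: the weighting used by Zorzi (2020) for vector-valued processes is not reproduced, and the function names are hypothetical.

```python
import math

def ar1_psd(theta, a, sigma2=1.0):
    """Power spectral density of x_t = a * x_{t-1} + e_t with Var(e_t) = sigma2."""
    return sigma2 / (1.0 - 2.0 * a * math.cos(theta) + a * a)

def spectral_integral(func, n=4096):
    """(1 / 2pi) * integral of func over [-pi, pi) by the rectangle rule,
    which is very accurate for smooth periodic integrands."""
    h = 2.0 * math.pi / n
    return sum(func(-math.pi + k * h) for k in range(n)) * h / (2.0 * math.pi)

def hellinger_sq(psd1, psd2, n=4096):
    """Unweighted Hellinger-type distance between two scalar PSDs."""
    return spectral_integral(lambda th: (math.sqrt(psd1(th)) - math.sqrt(psd2(th))) ** 2, n)

# Usage: distance between two AR(1) spectra.
phi_a = lambda th: ar1_psd(th, 0.5)
phi_b = lambda th: ar1_psd(th, -0.3, 2.0)
d2 = hellinger_sq(phi_a, phi_b)
```

For two white noises with variances $\sigma_1^2$ and $\sigma_2^2$ the integral reduces to $(\sigma_1 - \sigma_2)^2$, the squared 2-Wasserstein distance between $N(0,\sigma_1^2)$ and $N(0,\sigma_2^2)$.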

5. Computation, Scalability, and Statistical Properties

5.1 Inference and Algorithmic Pipeline

Training TGPs generally proceeds by maximizing the marginal log-likelihood (or a variational lower bound on it), which involves evaluating densities under the transported process together with log-determinant Jacobian terms for invertible layers or flows. For deep or parametric flows, variational inference with inducing points and stochastic gradients is effective and retains the usual sparse variational GP cost of $\mathcal{O}(NM^2 + M^3)$ for $N$ observations and $M \ll N$ inducing points (2011.01596). For distribution-valued inputs, one precomputes the Hilbertian embeddings (e.g., Sinkhorn potentials or Monge maps to a barycenter), which accelerates kernel evaluations and allows batching and parallelization (Bachoc et al., 2022, Bachoc et al., 2018).
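The likelihood-plus-Jacobian computation can be sketched for a warped GP with an exp output warp, assuming exact (non-sparse) inference so that every term is visible: the Gaussian log-density is evaluated at $g = \log y$ and the log-Jacobian $-\sum_i \log y_i$ is added. This exact version costs $\mathcal{O}(N^3)$ and omits the inducing-point and variational machinery; the function names are hypothetical.

```python
import math

def cholesky(a):
    """Lower-triangular Cholesky factor L of a symmetric positive-definite matrix a = L L^T."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def gaussian_logpdf(z, K):
    """log N(z; 0, K) using K = L L^T: quadratic term ||L^{-1} z||^2 and log det K = 2 sum log L_ii."""
    L = cholesky(K)
    n = len(z)
    alpha = []  # forward substitution: solve L alpha = z
    for i in range(n):
        alpha.append((z[i] - sum(L[i][k] * alpha[k] for k in range(i))) / L[i][i])
    quad = sum(v * v for v in alpha)
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (quad + logdet + n * math.log(2.0 * math.pi))

def warped_log_likelihood(y, K):
    """log p(y) for y = exp(g), g ~ N(0, K): Gaussian term at g = log y
    plus the log-Jacobian sum_i log|dg_i/dy_i| = -sum_i log y_i."""
    g = [math.log(v) for v in y]
    return gaussian_logpdf(g, K) - sum(g)
```

With a diagonal $K$ the result factorizes into a sum of univariate log-normal log-densities, which gives a quick sanity check on the Jacobian term.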

5.2 Statistical Consistency and Microergodicity

Radial kernels over Wasserstein/Hilbertian embeddings preserve microergodicity in infinite-dimensional settings, enabling identification of covariance hyperparameters under fixed-domain asymptotics (Bachoc et al., 2018). Consistency results also extend to kernels built from estimated barycenters and transport maps, under mild conditions on the convergence of these estimates (Bachoc et al., 2018). TGPs constructed via transport can also inherit properties such as identifiability, modular interpretability, and physically plausible extrapolation, provided the transformations are invertible and regular.

5.3 Empirical Results and Application Domains

TGPs outperform or complement classical methods in settings requiring nonlinear marginals, robustness to heavy tails, boundedness, or complex dependence—exemplified by wind field estimation from satellite imagery, robust time series regression, and classification on distribution-valued inputs (Fahmy et al., 16 May 2025, Bachoc et al., 2022, Cohen et al., 2021). The barycentric aggregation of expert GPs reduces uncertainty miscalibration relative to precision-based product-of-experts, especially near regime boundaries (Cohen et al., 2021).

6. Extensions, Limitations, and Open Questions

Extensions of TGPs include:

Multivariate and multitask generalizations (multiplexed transported processes, vector-valued barycenters).
Integration with explicit physical constraints (e.g., divergence-free or PDE-constrained layers).
Online and streaming variants for real-time data assimilation.
Theoretical analysis of identifiability, sample complexity, and asymptotics under increasing data volume and layer depth.

Current limitations are dictated by invertibility and smoothness constraints for flows, computational cost of large matrix or operator computations, and the challenge of quantifying uncertainty propagation through deep transformations.

A plausible implication is that further advances in scalable computation of high-dimensional OT, advances in flow parameterization (e.g., from deep learning), and detailed study of identifiability in physically constrained settings will drive the next generation of transport-centric stochastic process models.

References

1. Transport Gaussian Processes for Regression (2020)

2. Transforming Gaussian Processes With Normalizing Flows (2020)

3. Gaussian processes with multidimensional distribution inputs via optimal transport and Hilbertian embedding (2018)

4. Gaussian Processes on Distributions based on Regularized Optimal Transport (2022)

5. Healing Products of Gaussian Processes (2021)

6. Gaussian Optimal Transport Beyond Brenier's Theorem (2025)

7. Gaussian Process for Trajectories (2021)

8. Estimating Velocity Vector Fields of Atmospheric Winds using Transport Gaussian Processes (2025)

9. Adapted optimal transport between Gaussian processes in discrete time (2024)

10. Entropic regularization of Wasserstein distance between infinite-dimensional Gaussian measures and Gaussian processes (2020)

11. Optimal Transport between Gaussian Stationary Processes (2020)
