Transport Gaussian Processes Overview
- Transport Gaussian Processes are stochastic process models that extend classical GPs using transport and push-forward operations to capture complex dependencies.
- They integrate optimal transport, normalizing flows, and measure-theoretic methods to enforce non-Gaussian marginals and physical constraints.
- TGPs are applied in regression, spatiotemporal modeling, and PDE-constrained systems, offering scalable and interpretable inference frameworks.
Transport Gaussian Processes (TGPs) constitute a family of stochastic process models that generalize classical Gaussian processes (GPs) by incorporating transport, optimal transport, or push-forward operations. These methodologies exploit measure-theoretic and functional-analytic structures to model complex data dependencies, encode physical constraints (such as advection or mass-conservation), address non-Gaussian marginal and copula structures, and enable flexible kernel constructions over distributional, functional, or spatiotemporal inputs. TGPs subsume and extend models such as warped GPs, Student-t processes, GPs on distributions via optimal transport, and stochastic processes built by normalizing flows.
1. Core Frameworks for Transport of Gaussian Processes
1.1 Push-Forward Stochastic Processes
Transport (push-forward) constructions take a base process, typically a white-noise Gaussian process $z$, and apply a measurable, often invertible, transformation $T$. For every finite collection of inputs $\mathbf{t} = (t_1, \dots, t_n)$, the finite-dimensional maps $T_{\mathbf{t}}$ must be consistent, so that the induced process has finite-dimensional laws $\pi_{\mathbf{t}} = T_{\mathbf{t}} \# \eta_n$, where $\eta_n$ is the law of the standard $n$-variate Gaussian. If Kolmogorov consistency is satisfied, the push-forward defines a valid process. This framework admits modular layerwise constructions, in which each layer transforms a specific distributional property, such as the marginals or the dependency structure (Rios, 2020). The resulting "Transport Process" (TP) can represent a broad family of non-Gaussian, copula-rich priors, including but not limited to GPs, warped GPs, Student-t processes, and processes with Archimedean copulas.
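The layered construction can be illustrated with a minimal sketch. Everything specific here is an assumption chosen for illustration rather than the cited construction: a squared-exponential kernel for the covariance layer, an exponential map for the marginal layer, and the hypothetical names `forward_transport` and `inverse_transport`.

```python
import numpy as np

JITTER = 1e-6  # small diagonal term for numerical stability (assumed value)

def rbf_kernel(t, lengthscale=0.5, variance=1.0):
    """Squared-exponential covariance matrix on 1-D inputs t."""
    d = t[:, None] - t[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def forward_transport(z, t):
    """Push white noise z of shape (n_samples, n) through two layers."""
    L = np.linalg.cholesky(rbf_kernel(t) + JITTER * np.eye(t.shape[0]))
    f = z @ L.T          # covariance layer: rows of f are N(0, K) draws
    return np.exp(f)     # marginal layer: positive, log-normal marginals

def inverse_transport(y, t):
    """Invert both layers, recovering the white-noise representation of y."""
    L = np.linalg.cholesky(rbf_kernel(t) + JITTER * np.eye(t.shape[0]))
    f = np.log(y)                       # undo the marginal layer
    return np.linalg.solve(L, f.T).T    # undo the covariance layer

rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0, 6)
z = rng.standard_normal((4, t.shape[0]))   # base white-noise samples
y = forward_transport(z, t)                # samples of the transported process
z_back = inverse_transport(y, t)           # matches z up to round-off
```

Because each layer is invertible, observations can be mapped back to the Gaussian base, which is what makes density evaluation and posterior sampling tractable in this family.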
1.2 Transport via Normalizing Flows
Transforming a GP prior with a normalizing flow equips it with expressive, invertible transformations (possibly input-dependent, e.g., parameterized by neural networks), yielding processes $f_K(x) = G_{\theta(x)}(f_0(x))$, with $f_0 \sim \mathcal{GP}(\mu, k)$ (2011.01596). Because the push-forward is invertible, the density of the transformed process is available in closed form via the change-of-variables formula, and inference remains tractable via stochastic variational methods with sparse inducing points. Suitable design of $G_\theta$ encodes interpretable structure (e.g., monotonicity, boundedness), and the construction generalizes hierarchical non-Gaussian process models (2011.01596, Rios, 2020).
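As a worked instance of the change-of-variables step, consider a flow applied pointwise at $n$ inputs. The notation is assumed here for illustration: $G_\theta$ denotes the invertible flow, $\mathbf{f}_0 \sim \mathcal{N}(\boldsymbol{\mu}, K)$ the base GP values, and $\mathbf{f}_K = G_\theta(\mathbf{f}_0)$ the transformed values.

$$
\log p(\mathbf{f}_K) = \log \mathcal{N}\big(G_\theta^{-1}(\mathbf{f}_K);\, \boldsymbol{\mu}, K\big) + \sum_{i=1}^{n} \log \left| \frac{\partial G_\theta^{-1}(f_{K,i})}{\partial f_{K,i}} \right|
$$

For example, with $G_\theta = \exp$ the inverse is $\log$, and the Jacobian term reduces to $-\sum_{i} \log f_{K,i}$.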
2. Optimal Transport and Gaussian Processes on Distributions
2.1 Kernels on Distribution Space via OT
For tasks with distribution-valued (measure-valued) inputs, kernels based on optimal transport provide a principled framework. A key approach is to select a reference measure $\bar\mu$ (typically a Wasserstein barycenter), construct Monge maps $T_i$ between $\bar\mu$ and the input measures $\mu_i$, and define Hilbertian embeddings $\mu_i \mapsto T_i \in L^2(\bar\mu)$. Any radial positive definite kernel $K(\mu_i, \mu_j) = F(\|T_i - T_j\|_{L^2(\bar\mu)})$, where $F$ is a radial function that is positive definite on Hilbert space, yields a valid covariance kernel on the space of input measures (Bachoc et al., 2018). By Schoenberg's theorem, functions that are completely monotone in the squared distance yield strictly p.d. kernels on Hilbert space; these include the RBF, Matérn, and power-exponential families. For multivariate Gaussian measures, Monge maps have explicit forms via principal matrix square roots, making the induced kernel a function of Frobenius norms of transforms of covariance matrices (Bachoc et al., 2018).
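For Gaussian inputs the construction can be written out directly. The sketch below assumes a Gaussian reference measure with invertible covariance `S_ref`, takes the embedding of each input to be the optimal map from the reference to that input, and uses an RBF radial function; the function names are hypothetical.

```python
import numpy as np

def psd_sqrt(A):
    """Principal square root of a symmetric PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def gaussian_monge_matrix(S_ref, S):
    """Matrix A of the optimal map x -> m + A (x - m_ref) from N(m_ref, S_ref) to N(m, S)."""
    R = psd_sqrt(S_ref)
    R_inv = np.linalg.inv(R)
    return R_inv @ psd_sqrt(R @ S @ R) @ R_inv

def embedding_sq_dist(m1, S1, m2, S2, S_ref):
    """Squared L2(reference) distance between the two transport maps."""
    A1 = gaussian_monge_matrix(S_ref, S1)
    A2 = gaussian_monge_matrix(S_ref, S2)
    D = (A1 - A2) @ psd_sqrt(S_ref)          # Frobenius norm of this gives the covariance part
    return float(np.sum((m1 - m2) ** 2) + np.sum(D ** 2))

def ot_rbf_kernel(m1, S1, m2, S2, S_ref, lengthscale=1.0):
    """RBF kernel evaluated on the transport-map embedding."""
    return float(np.exp(-0.5 * embedding_sq_dist(m1, S1, m2, S2, S_ref) / lengthscale ** 2))

# Usage: two Gaussian inputs compared through an identity-covariance reference.
m = np.zeros(2)
k = ot_rbf_kernel(m, np.eye(2), m, 4.0 * np.eye(2), np.eye(2))
```

With an identity reference covariance the embedding distance reduces to the distance between means plus the Frobenius distance between covariance square roots.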
2.2 Sinkhorn Regularized Gaussian Process Kernels
Regularized OT kernels over probability measures use Sinkhorn potentials as Hilbertian embeddings. Given an entropic-regularized OT cost $\mathrm{OT}_\varepsilon(\mu, \mathcal{U})$ with reference measure $\mathcal{U}$, the optimal dual potentials $g^{\mu}$ yield a centered embedding $\mu \mapsto \tilde g^{\mu} = g^{\mu} - \int g^{\mu}\, d\mathcal{U} \in L^2(\mathcal{U})$, from which kernels $K(\mu, \nu) = F(\|\tilde g^{\mu} - \tilde g^{\nu}\|_{L^2(\mathcal{U})})$ are defined. The induced GPs are indexed by distributions, and the kernels are universal and strictly positive definite under mild assumptions; computationally, Sinkhorn iterations admit automatic differentiation and scale to large datasets (Bachoc et al., 2022).
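A minimal sketch of this pipeline for discrete measures follows. It assumes a squared-Euclidean cost, takes the embedding to be the dual potential evaluated on the reference support, and uses an RBF radial function; the regularization strength, iteration count, and function names are illustrative choices, not values from the cited work.

```python
import numpy as np

def _logsumexp(M, axis):
    """Numerically stable log-sum-exp along one axis."""
    mx = np.max(M, axis=axis, keepdims=True)
    return np.squeeze(mx, axis=axis) + np.log(np.sum(np.exp(M - mx), axis=axis))

def sinkhorn_potential(X, a, U, u, eps=0.5, n_iter=300):
    """Centered Sinkhorn potential on the reference support U (weights u)
    for the input measure with support X and weights a."""
    C = 0.5 * np.sum((X[:, None, :] - U[None, :, :]) ** 2, axis=-1)  # cost, shape (n, m)
    f = np.zeros(len(a))
    g = np.zeros(len(u))
    for _ in range(n_iter):
        # Alternating log-domain updates of the two dual potentials.
        f = -eps * _logsumexp((g[None, :] - C) / eps + np.log(u)[None, :], axis=1)
        g = -eps * _logsumexp((f[:, None] - C) / eps + np.log(a)[:, None], axis=0)
    return g - np.dot(u, g)  # remove the additive constant

def sinkhorn_kernel(g1, g2, u, lengthscale=1.0):
    """RBF kernel on the L2(reference) distance between two potentials."""
    sq = np.sum(u * (g1 - g2) ** 2)
    return float(np.exp(-0.5 * sq / lengthscale ** 2))

rng = np.random.default_rng(0)
U = rng.standard_normal((20, 2)); u = np.full(20, 1.0 / 20)        # reference measure
X = rng.standard_normal((15, 2)); a = np.full(15, 1.0 / 15)        # first input
Y = rng.standard_normal((10, 2)) + 1.0; b = np.full(10, 1.0 / 10)  # second input
gX = sinkhorn_potential(X, a, U, u)
gY = sinkhorn_potential(Y, b, U, u)
k_xy = sinkhorn_kernel(gX, gY, u)
```

Since every input is embedded on the same reference support, the potentials can be precomputed once per input and reused for all kernel evaluations.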
2.3 Barycenters and Wasserstein Geometry of GPs
Wasserstein barycenters and optimal transport geometry facilitate aggregation of predictive distributions and analysis of process interpolants. In particular, weighted Wasserstein barycenters of Gaussian experts enable robust aggregation in product-of-expert frameworks, modeling the barycenter variance and mean as weighted sums (Cohen et al., 2021). Operator-theoretic frameworks generalize the construction of optimal Monge maps, barycenters, and geodesics for degenerate and infinite-dimensional Gaussian measures, leveraging Green’s functions and the Bures–Wasserstein metric (Yun et al., 25 Dec 2025).
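To make the aggregation step concrete, the sketch below compares two ways of combining univariate Gaussian expert predictions. For univariate Gaussians the exact 2-Wasserstein barycenter is Gaussian, with mean and standard deviation given by weighted sums of the experts' means and standard deviations; the cited work may parameterize the aggregated variance differently, so this is an illustration of the geometry rather than of that method. Function names are hypothetical.

```python
import numpy as np

def wasserstein_barycenter_1d(means, stds, weights):
    """Mean and std of the 2-Wasserstein barycenter of univariate Gaussians."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return float(w @ np.asarray(means)), float(w @ np.asarray(stds))

def product_of_experts_1d(means, stds):
    """Mean and std of the precision-weighted product of Gaussian experts."""
    prec = 1.0 / np.asarray(stds, dtype=float) ** 2
    var = 1.0 / prec.sum()
    return float(var * (prec @ np.asarray(means))), float(np.sqrt(var))

means, stds = [0.0, 2.0], [1.0, 3.0]
bary = wasserstein_barycenter_1d(means, stds, [0.5, 0.5])
poe = product_of_experts_1d(means, stds)
```

The product of experts always reports a standard deviation below that of its most confident expert, whereas the barycenter's standard deviation stays within the range spanned by the experts.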
3. Transport GPs for Regression, Non-Gaussianity, and Physical Modeling
3.1 Expressiveness beyond Gaussianity
The layered transport framework enables the explicit modeling of non-Gaussian marginals, copulas, heavy tails, boundedness constraints, and tail dependence (Rios, 2020). The core layers include:
- Marginal warping (Box-Cox, log, affine): enforces support constraints and shapes marginals.
- Covariance (kernel) transformation: imposes arbitrary correlation structure.
- Elliptical layers (radial mixture): model thickness of tails, e.g., Student-t process models.
- Archimedean (ℓ1-radial) layers: construct copulas with arbitrary upper/lower tail dependence.

Inference exploits the invertibility and triangularity of transport layers for exact or efficient posterior sampling, leveraging the structure of the Gaussian base process.
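As one illustration of the elliptical layer, a Gaussian sample path can be rescaled by a single random radial factor to obtain multivariate Student-t finite-dimensional laws. This is the standard scale-mixture representation, used here as an assumed stand-in for the layer in the cited work; `gaussian_paths` and `elliptical_layer` are hypothetical names.

```python
import numpy as np

def gaussian_paths(K, n_samples, rng):
    """Draw zero-mean Gaussian vectors with covariance K (one per row)."""
    L = np.linalg.cholesky(K + 1e-8 * np.eye(K.shape[0]))
    return rng.standard_normal((n_samples, K.shape[0])) @ L.T

def elliptical_layer(f, nu, rng):
    """Rescale each row by sqrt(nu / chi2_nu), shared across the whole path.

    The result is multivariate Student-t with nu degrees of freedom,
    so smaller nu gives heavier tails while correlations are preserved.
    """
    g = rng.chisquare(nu, size=(f.shape[0], 1))
    return f * np.sqrt(nu / g)

rng = np.random.default_rng(0)
K = np.array([[1.0, 0.5], [0.5, 1.0]])
f = gaussian_paths(K, 200_000, rng)
s = elliptical_layer(f, 8.0, rng)
```

For `nu > 2` the covariance of the output is `nu / (nu - 2)` times `K`, which provides a simple sanity check on samples.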
3.2 Transport GPs for Trajectory and Spatiotemporal Modeling
Classical GP models for observed trajectories—such as positions over time or spatiotemporal fields—can exploit transport-inspired kernels to encode dynamics, e.g., advection, drift, and periodicity (Nguyen et al., 2021). The choice of mean functions and composite spatial-temporal kernels is critical in transport applications, and sparse approximations (inducing points, low-rank structure) enable scalability (Nguyen et al., 2021).
3.3 PDE-Constrained TGPs and Flow Learning
Transport GPs parametrize physical flows in systems governed by conservation laws. For example, to model a scalar field $f(x, t)$ advected by an unknown velocity field $v$, one imposes a covariance on the transported field $f(x, t) = f_0(\Phi_t^{-1}(x))$ via backward flow maps $\Phi_t^{-1}$, such that the covariance between $f(x, t)$ and $f(x', t')$ is expressed through $\Phi_t^{-1}(x)$ and $\Phi_{t'}^{-1}(x')$. Neural parametrizations of $\Phi$ (e.g., residual networks) admit end-to-end likelihood maximization for flow inference. The resulting framework simultaneously learns hyperparameters and latent velocities, enforcing physical plausibility and scaling efficiently to geophysical remote sensing settings (Fahmy et al., 16 May 2025).
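The idea of placing the covariance on back-transported coordinates can be sketched in its simplest case. The example assumes a constant, known velocity (so the backward flow map is a shift) and an RBF base kernel; a learned or neural flow map would replace `backward_flow`, and all names are hypothetical.

```python
import numpy as np

def backward_flow(x, t, velocity):
    """Backward flow map for constant-velocity advection: x0 = x - v * t."""
    return x - t[:, None] * velocity[None, :]

def advected_kernel(x1, t1, x2, t2, velocity, lengthscale=1.0):
    """RBF covariance between field values, evaluated at initial positions."""
    a = backward_flow(x1, t1, velocity)
    b = backward_flow(x2, t2, velocity)
    sq = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq / lengthscale ** 2)

v = np.array([1.0, 0.5])
x1 = np.array([[0.0, 0.0]]); t1 = np.array([0.0])
x2 = x1 + 2.0 * v;           t2 = np.array([2.0])   # same parcel, two time units later
k_advected = advected_kernel(x1, t1, x2, t2, v)[0, 0]
k_static = advected_kernel(x1, t1, x2, t2, np.zeros(2))[0, 0]
```

Two observations of the same fluid parcel are perfectly correlated under the advected kernel, while a kernel that ignores the flow treats them as distant points.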
| Approach | Key Properties | Reference |
|---|---|---|
| Layered push-forward (TP) | Warped, elliptical, Archimedean, etc. | (Rios, 2020) |
| GP + Normalizing Flow | Input-dependent invertible flow map | (2011.01596) |
| OT kernel on measures | Hilbertian ($L^2$) embedding via Monge/Sinkhorn | (Bachoc et al., 2018, Bachoc et al., 2022) |
| PDE-inspired TGP | GP on advected/composed domain | (Fahmy et al., 16 May 2025) |
| OT barycenter aggregation | Wasserstein barycenter of GP experts | (Cohen et al., 2021) |
4. Optimal Transport between Gaussian Processes and Covariances
4.1 Bures-Wasserstein Distance and Monge Maps
The 2-Wasserstein (Bures–Wasserstein) distance between (possibly infinite-dimensional, degenerate) Gaussians admits explicit closed forms in terms of means and covariances:
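$$
W_2^2\big(\mathcal{N}(m_1, \Sigma_1), \mathcal{N}(m_2, \Sigma_2)\big) = \|m_1 - m_2\|^2 + \operatorname{tr}\!\Big(\Sigma_1 + \Sigma_2 - 2\big(\Sigma_1^{1/2} \Sigma_2 \Sigma_1^{1/2}\big)^{1/2}\Big)
$$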
Recent advances yield existence and explicit characterizations of optimal Monge pushforwards even in singular settings by operator-theoretic factorization, Green's operators, and Schur complements (Yun et al., 25 Dec 2025). Interpolants (McCann geodesics) and barycenters admit explicit forms via these constructions.
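In finite dimensions these quantities are straightforward to evaluate. The sketch below computes the squared 2-Wasserstein distance between two Gaussians (squared mean difference plus the Bures term on the covariances) and the McCann interpolant between them; it assumes the first covariance is invertible, so it does not cover the degenerate cases treated in the cited work. Function names are hypothetical.

```python
import numpy as np

def psd_sqrt(A):
    """Principal square root of a symmetric PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def bures_wasserstein_sq(m1, S1, m2, S2):
    """Squared 2-Wasserstein distance between N(m1, S1) and N(m2, S2)."""
    R = psd_sqrt(S1)
    cross = psd_sqrt(R @ S2 @ R)
    return float(np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2.0 * cross))

def mccann_interpolant(m1, S1, m2, S2, s):
    """Mean and covariance at time s in [0, 1] along the geodesic (S1 invertible)."""
    R = psd_sqrt(S1)
    R_inv = np.linalg.inv(R)
    A = R_inv @ psd_sqrt(R @ S2 @ R) @ R_inv      # linear part of the Monge map
    M = (1.0 - s) * np.eye(len(m1)) + s * A
    return (1.0 - s) * m1 + s * m2, M @ S1 @ M.T

m1, S1 = np.zeros(2), np.array([[2.0, 0.3], [0.3, 1.0]])
m2, S2 = np.array([1.0, 2.0]), np.array([[1.0, -0.2], [-0.2, 3.0]])
d2 = bures_wasserstein_sq(m1, S1, m2, S2)
ms, Ss = mccann_interpolant(m1, S1, m2, S2, 0.3)
```

A useful check is the constant-speed property: the squared distance from the start to the interpolant at time `s` equals `s**2` times the full squared distance.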
4.2 Adapted and Entropic OT Distances for Processes
For discrete-time multivariate Gaussian processes, adapted (causal) transport distances ($\mathcal{AW}_2$) impose bicausality on couplings, yielding closed-form adapted Bures–Wasserstein distances involving Cholesky factors and their diagonals (Gunasingam et al., 2024). Entropic regularizations (e.g., Sinkhorn divergences) interpolate between the 2-Wasserstein distance and MMD, yielding twice Fréchet-differentiable functionals for use in infinite-dimensional settings and kernel learning (Quang, 2020).
4.3 Spectral OT for Stationary Processes
For stationary vector-valued processes, optimal transport admits a spectral formulation: the cost is the infimum of the variance of a filtered discrepancy process, leading to a weighted Hellinger distance between power spectral densities (Zorzi, 2020). Explicit formulas and spectral estimators extend this distance to indirect observations under BIBO-stable linear filtering.
5. Computation, Scalability, and Statistical Properties
5.1 Inference and Algorithmic Pipeline
Training TGPs generally proceeds by maximizing the marginal log-likelihood, which requires evaluating densities under the transported process together with log-determinant Jacobian terms for invertible layers or flows. For deep or parametric flows, variational inference with inducing points and stochastic gradients is effective, retaining the scaling of sparse variational GPs, $\mathcal{O}(NM^2)$ for $N$ observations and $M$ inducing points (2011.01596). For distribution-valued inputs, one precomputes Hilbertian embeddings (e.g., Sinkhorn potentials or Monge maps to a barycenter), which accelerates kernel evaluations and allows batching and parallelization (Bachoc et al., 2022, Bachoc et al., 2018).
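The objective can be written out for the simplest transported model. The sketch below assumes a fixed logarithmic warping (so positive observations `y` satisfy `log(y) ~ GP`), an RBF kernel, and Gaussian noise in the warped space; it is a toy instance of the likelihood with its Jacobian term, not the variational objective of the cited work, and `warped_gp_nll` is a hypothetical name.

```python
import numpy as np

def warped_gp_nll(y, X, lengthscale, variance, noise):
    """Negative log marginal likelihood of y > 0 when log(y) follows a GP."""
    z = np.log(y)                                   # inverse transport of the data
    n = len(y)
    d = X[:, None] - X[None, :]
    K = variance * np.exp(-0.5 * (d / lengthscale) ** 2) + noise * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L, z)
    gauss_nll = (0.5 * alpha @ alpha
                 + np.sum(np.log(np.diag(L)))       # equals 0.5 * log det K
                 + 0.5 * n * np.log(2.0 * np.pi))
    log_jac = -np.sum(z)                            # sum of log |d log(y_i) / d y_i|
    return float(gauss_nll - log_jac)

X = np.linspace(0.0, 1.0, 5)
y = np.exp(np.sin(3.0 * X))
nll = warped_gp_nll(y, X, lengthscale=0.3, variance=1.0, noise=0.1)
```

In practice this scalar would be minimized over the kernel and warping parameters with a gradient-based optimizer; dropping the Jacobian term would make likelihoods under different warpings incomparable.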
5.2 Statistical Consistency and Microergodicity
Radial kernels over Wasserstein/Hilbertian embeddings preserve microergodicity in infinite-dimensional settings, enabling identification of covariance hyperparameters with fixed-domain asymptotics (Bachoc et al., 2018). Empirical consistency under kernel estimation is established under mild convergence of barycenter and transport transforms (Bachoc et al., 2018). TGPs constructed via transport also inherit properties such as identifiability, modular interpretability, and physically plausible extrapolation, provided the transformations are invertible and regular.
5.3 Empirical Results and Application Domains
TGPs outperform or complement classical methods in settings requiring nonlinear marginals, robustness to heavy tails, boundedness, or complex dependence—exemplified by wind field estimation from satellite imagery, robust time series regression, and classification on distribution-valued inputs (Fahmy et al., 16 May 2025, Bachoc et al., 2022, Cohen et al., 2021). The barycentric aggregation of expert GPs reduces uncertainty miscalibration relative to precision-based product-of-experts, especially near regime boundaries (Cohen et al., 2021).
6. Extensions, Limitations, and Open Questions
Extensions of TGPs include:
- Multivariate and multitask generalizations (multiplexed transported processes, vector-valued barycenters).
- Integration with explicit physical constraints (e.g., divergence-free or PDE-constrained layers).
- Online and streaming variants for real-time data assimilation.
- Theoretical analysis of identifiability, sample complexity, and asymptotics under increasing data volume and layer depth.
Current limitations are dictated by invertibility and smoothness constraints for flows, computational cost of large matrix or operator computations, and the challenge of quantifying uncertainty propagation through deep transformations.
A plausible implication is that progress in scalable high-dimensional OT computation, in flow parameterization (e.g., from deep learning), and in the identifiability theory of physically constrained models will drive the next generation of transport-centric stochastic process models.
References:
- (Rios, 2020)
- (2011.01596)
- (Bachoc et al., 2018)
- (Bachoc et al., 2022)
- (Yun et al., 25 Dec 2025)
- (Gunasingam et al., 2024)
- (Quang, 2020)
- (Cohen et al., 2021)
- (Zorzi, 2020)
- (Fahmy et al., 16 May 2025)
- (Nguyen et al., 2021)