Transformed Additive GP Surrogates
- Transformed additive GP surrogates are defined by integrating GP regression with structural transformations to capture additive and projected effects in high-dimensional data.
- They decompose the response into components whose kernels act on transformed subspaces, thereby reducing computational cost while enhancing interpretability.
- These models are widely applied in high-fidelity prediction, robust optimization, and transfer learning, utilizing advanced numerical algorithms for parameter estimation.
A Transformed Additive Gaussian Process Surrogate Model integrates the mathematical flexibility of Gaussian Process (GP) regression with structural transformations—such as additivity, projections, or affine mappings—in the surrogate construction. This framework enhances efficiency, interpretability, and transferability when approximating expensive computational models or complex multiscale phenomena. Transformed additive GP models leverage structured kernels or composite architectures to represent the response as a sum of components—each possibly operating on transformed or embedded versions of the original input—thus capturing both the marginal effects and subset-specific or domain-adapted features. Such surrogates are particularly valuable for high-dimensional prediction, variable fidelity modeling, interpretable machine learning, robust optimization under uncertainty, and transfer learning contexts.
1. Additive Structure and Kernel Construction
The core idea in additive GP surrogate modeling is to specify the covariance (kernel) as a sum of component kernels, each acting on a subset (often individual dimensions) of the input space. In a $d$-dimensional setting, the canonical additive kernel is

$$k(\mathbf{x}, \mathbf{x}') = \sum_{i=1}^{d} k_i(x_i, x'_i),$$

which leads to the GP mean function and posterior prediction exhibiting an additive structure,

$$m(\mathbf{x}) = \sum_{i=1}^{d} m_i(x_i).$$

This form restricts the modeled function to the class of generalized additive models (GAMs), conferring two significant benefits: (1) parameter count and computational cost are reduced compared to a full tensor-product kernel, and (2) interpretability is greatly improved, since the effect of each variable is isolated (Durrande et al., 2011).
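As a concrete illustration, here is a minimal NumPy sketch of the additive kernel above together with the standard GP posterior mean; the lengthscales, noise level, and toy target function are illustrative choices, not values from the cited works.

```python
import numpy as np

def sq_exp_1d(a, b, lengthscale):
    """Squared-exponential kernel on a single input dimension."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)

def additive_kernel(X, Z, lengthscales):
    """k(x, x') = sum_i k_i(x_i, x'_i): one 1-D kernel per input dimension."""
    return sum(
        sq_exp_1d(X[:, i], Z[:, i], ell)
        for i, ell in enumerate(lengthscales)
    )

rng = np.random.default_rng(0)
d = 4
X = rng.uniform(size=(40, d))                         # training inputs
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + 0.01 * rng.standard_normal(40)

ell = np.full(d, 0.3)                                 # assumed per-dimension lengthscales
noise = 1e-4
K = additive_kernel(X, X, ell) + noise * np.eye(len(X))
alpha = np.linalg.solve(K, y)

Xs = rng.uniform(size=(5, d))                         # test inputs
mean = additive_kernel(Xs, X, ell) @ alpha            # additive GP posterior mean
print(mean)
```

In practice the per-dimension hyperparameters would be learned by maximizing the marginal likelihood, as discussed in Section 3.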
In settings where the modeled function possesses low-dimensional structure or where interactions between variables play a critical role, the additive kernel can be extended to include higher-order interactions or transformed input combinations, as in multi-index or projected additive models (Li et al., 2023). A transformed additive GP then takes the form

$$f(\mathbf{x}) = \sum_{q=1}^{Q} f_q(A_q \mathbf{x}),$$

where each $A_q$ is a projection matrix defining a potentially overlapping subset or direction in the input, and each $f_q$ is a one- or low-dimensional function equipped with a GP prior.
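Continuing the previous sketch (it reuses `np`, `rng`, `d`, and `X` from the block above), the projected form only differs in that each hypothetical projection matrix $A_q$ maps the input into a low-dimensional subspace before a kernel is evaluated there; the random matrices below stand in for learned projections.

```python
def projected_additive_kernel(X, Z, projections, lengthscales):
    """k(x, x') = sum_q k_q(A_q x, A_q x') with an SE kernel on each projected subspace."""
    total = 0.0
    for A, ell in zip(projections, lengthscales):
        U, V = X @ A.T, Z @ A.T                       # map inputs into each subspace
        sqdist = ((U[:, None, :] - V[None, :, :]) ** 2).sum(-1)
        total = total + np.exp(-0.5 * sqdist / ell ** 2)
    return total

# Two random 2-D projections of the 4-D input, standing in for learned A_q matrices.
projections = [rng.standard_normal((2, d)) for _ in range(2)]
K_proj = projected_additive_kernel(X, X, projections, [1.0, 1.0])
```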
2. Transformation Mechanisms: Projections, Affine Maps, and Multi-fidelity Decomposition
Beyond coordinatewise additivity, practical surrogate modeling often requires transformations to adapt the representation to domain shifts, inherent low-dimensional manifolds, or complex dependency structures:
- Linear/affine input transformations: Transfer learning frameworks adapt the surrogate by learning affine transformations $\mathbf{x} \mapsto R\mathbf{x} + \mathbf{b}$ (with $R$ a rotation and $\mathbf{b}$ a translation) that align the domain of a pre-trained source model to a new target domain, followed by additive correction with a residual GP (Pan et al., 23 Jan 2025). Optimization is performed over the transformation parameters either via Riemannian gradient descent (for differentiable GP surrogates) or derivative-free methods (for non-differentiable surrogates), minimizing an empirical error on transfer validation data.
- Low-dimensional embeddings: Additive multi-index GP models parameterize the active subspace structure by learning projection matrices $A_q$ so that each component $f_q(A_q \mathbf{x})$ captures dimension-reduced but physically meaningful contributions to the output, reflecting multi-physics modularity (Li et al., 2023). Variational inference and inducing points are used to efficiently learn both the projections and the GP parameters.
- Multi-fidelity co-kriging decomposition: In surrogate modeling for variable fidelity data, the high-fidelity signal is typically modeled as an additive transformation of low-fidelity surrogate predictions with an added GP-modeled discrepancy, $f_{\mathrm{hi}}(\mathbf{x}) = \rho\, f_{\mathrm{lo}}(\mathbf{x}) + \delta(\mathbf{x})$, where $\rho$ is a scale parameter mapping low- to high-fidelity domains and $\delta$ is a GP discrepancy term (Burnaev et al., 2017, Kerleguer, 2021). This structure directly encodes the transformed additive nature within a unified GP framework and allows for scalable approximations via the Nyström method or blackbox methods; a minimal sketch follows this list.
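As an illustration of the co-kriging decomposition in the last bullet, the following sketch fits a two-stage toy model: a GP on abundant low-fidelity data, a simple least-squares estimate of the scale $\rho$, and a GP on the remaining discrepancy $\delta$. It reuses `additive_kernel` and `rng` from the Section 1 sketch, and the data sizes and the estimator for $\rho$ are simplifying assumptions rather than the procedures of the cited papers.

```python
def gp_fit(X, y, lengthscales, noise=1e-4):
    """Return a posterior-mean predictor for a zero-mean GP with the additive kernel."""
    K = additive_kernel(X, X, lengthscales) + noise * np.eye(len(X))
    alpha = np.linalg.solve(K, y)
    return lambda Xs: additive_kernel(Xs, X, lengthscales) @ alpha

# Plentiful low-fidelity data, scarce high-fidelity data (toy 1-D example).
X_lo = rng.uniform(size=(60, 1))
y_lo = np.sin(6 * X_lo[:, 0])
X_hi = rng.uniform(size=(8, 1))
y_hi = 1.3 * np.sin(6 * X_hi[:, 0]) + 0.2 * X_hi[:, 0]

f_lo = gp_fit(X_lo, y_lo, lengthscales=[0.2])                    # low-fidelity surrogate
pred_lo_at_hi = f_lo(X_hi)
rho = (pred_lo_at_hi @ y_hi) / (pred_lo_at_hi @ pred_lo_at_hi)   # scale estimate
delta = gp_fit(X_hi, y_hi - rho * pred_lo_at_hi, lengthscales=[0.2])  # discrepancy GP

def f_hi(Xs):
    """Transformed additive prediction: rho * f_lo(x) + delta(x)."""
    return rho * f_lo(Xs) + delta(Xs)
```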
3. Parameter Estimation and Numerical Algorithms
Efficient parameter estimation is critical due to nonconvexity and the high-dimensional parameter spaces in GP kernel learning. For additive kernels, the Relaxed Likelihood Maximization (RLM) algorithm introduces a cyclic blockwise optimization routine in which parameters for each additive component are optimized separately, while a relaxation (an auxiliary observation-noise parameter $\sigma^2$) is used to make intermediate likelihood surfaces more tractable:
- Initialize kernel parameters $\theta_i$ for each dimension $i = 1, \dots, d$.
- Sequentially optimize each $\theta_i$ holding the others fixed, including the auxiliary noise $\sigma^2$ to account for nonadditivity.
- Cycle over all dimensions until convergence ($\sigma^2 \to 0$ for a strictly additive ground truth) (Durrande et al., 2011); a schematic sketch of this routine follows the list.
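The following is a schematic sketch of such a cyclic routine: each dimension's lengthscale is optimized in turn against the negative log marginal likelihood (via SciPy), and the auxiliary noise $\sigma^2$ is then re-optimized. It reuses `additive_kernel` from the Section 1 sketch; the log-scale parameterization, bounds, and fixed cycle count are illustrative simplifications of the RLM of Durrande et al. (2011).

```python
import numpy as np
from scipy.optimize import minimize_scalar

def neg_log_marginal_likelihood(X, y, lengthscales, noise):
    """Negative log marginal likelihood of a zero-mean GP with the additive kernel."""
    K = additive_kernel(X, X, lengthscales) + (noise + 1e-6) * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * y @ alpha + np.log(np.diag(L)).sum()

def relaxed_cyclic_fit(X, y, n_cycles=5, noise=0.1):
    """Blockwise, RLM-style loop: optimize one lengthscale at a time, then the noise."""
    d = X.shape[1]
    ell = np.ones(d)
    for _ in range(n_cycles):
        for i in range(d):
            def obj(log_ell_i):
                trial = ell.copy()
                trial[i] = np.exp(log_ell_i)
                return neg_log_marginal_likelihood(X, y, trial, noise)
            ell[i] = np.exp(minimize_scalar(obj, bounds=(-4.0, 2.0), method="bounded").x)
        # Relaxation step: the auxiliary noise shrinks toward zero if f is truly additive.
        noise = np.exp(minimize_scalar(
            lambda s: neg_log_marginal_likelihood(X, y, ell, np.exp(s)),
            bounds=(-8.0, 1.0), method="bounded").x)
    return ell, noise
```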
For transformation-based models, the optimization involves simultaneous updates of transformation parameters (e.g., $R$, $\mathbf{b}$, or the projections $A_q$), kernel hyperparameters, and variational parameters, often using stochastic gradient methods or Riemannian geometry-based solvers to maintain projection matrix constraints (e.g., orthogonality, subspace structure) (Li et al., 2023, Archbold et al., 23 Apr 2024, Lu et al., 2022).
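A minimal way to honor such constraints during learning is projected gradient descent with a QR-based retraction, sketched below for a single projection matrix with orthonormal rows; the step size and the incoming Euclidean gradient are placeholders, and full implementations couple this with the variational objectives of the cited works.

```python
import numpy as np

def retract_orthonormal_rows(A):
    """Map A (q x d, q <= d) back onto the set of matrices with orthonormal rows via QR."""
    Q, R = np.linalg.qr(A.T)                 # columns of Q are orthonormal
    Q = Q * np.sign(np.diag(R))              # fix signs so the retraction is continuous
    return Q.T

def constrained_step(A, euclidean_grad, lr=1e-2):
    """One projected-gradient update of a projection matrix A_q."""
    return retract_orthonormal_rows(A - lr * euclidean_grad)
```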
4. Theoretical Properties, Uncertainty Quantification, and Scaling
Transformed additive GP surrogates retain the closed-form Bayesian predictive mean and variance formulas inherent to GP regression, enabling principled uncertainty quantification. In variable fidelity and multi-output contexts, the posterior variance reflects both aleatoric uncertainty from the original data and epistemic uncertainty from additive decomposition, domain transformations, or missing data imputation (Durrande et al., 2011, Chan et al., 2023).
The additive decomposition also admits a connection to functional ANOVA and Sobol sensitivity analysis, particularly for orthogonally constrained additive kernels (OAKs): each component is uniquely identifiable under the input distribution,

$$f(\mathbf{x}) = \sum_{u \subseteq \{1,\dots,d\}} f_u(\mathbf{x}_u), \qquad \mathbb{E}_{p(x_i)}\!\left[f_u(\mathbf{x}_u)\right] = 0 \quad \text{for all } i \in u,$$

where $u$ denotes a subset of input indices (Lu et al., 2022). This facilitates interpretable variance attribution and improved hyperparameter learning.
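For intuition, a single constrained component of this kind can be obtained by centering an ordinary one-dimensional kernel under the input density, so that the induced GP component has zero mean against $p(x)$. The sketch below does this numerically with Gauss-Hermite quadrature for a standard-normal input; it is an illustrative simplification of the closed-form constructions in Lu et al. (2022).

```python
import numpy as np

def se_kernel(a, b, ell=1.0):
    """Squared-exponential kernel between two 1-D arrays of points."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)

def constrained_se_kernel(a, b, ell=1.0, n_quad=60):
    """SE kernel centered so that E_{x ~ N(0,1)}[f(x)] = 0 for the induced GP component."""
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_quad)
    w = weights / weights.sum()                      # quadrature weights for N(0, 1)
    mu_a = se_kernel(a, nodes, ell) @ w              # m(a) = E_x[k(a, x)]
    mu_b = se_kernel(b, nodes, ell) @ w
    var = w @ se_kernel(nodes, nodes, ell) @ w       # E_{x, x'}[k(x, x')]
    return se_kernel(a, b, ell) - np.outer(mu_a, mu_b) / var
```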
Scalability is achieved through structural kernel design (low intrinsic rank), variational/inducing point approximations, random Fourier features (Zhang et al., 19 Feb 2024), and composite architectures decomposing time-series, high-dimensional fields, or multi-fidelity outputs (Kerleguer, 2021, Deshpande et al., 15 Jul 2024).
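Of these, random Fourier features are the simplest to show in isolation: a squared-exponential kernel (on raw or transformed inputs) is replaced by an explicit finite feature map, so that Bayesian linear regression on the features approximates the GP at linear cost in the number of observations. The feature count and lengthscale below are arbitrary assumptions.

```python
import numpy as np

def rff_features(X, n_features=200, lengthscale=0.3, seed=0):
    """Random Fourier features: phi(x) phi(x')^T approximates an SE kernel k(x, x')."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.standard_normal((d, n_features)) / lengthscale   # spectral samples of the SE kernel
    b = rng.uniform(0.0, 2 * np.pi, n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)
```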
5. Applications in Prediction, Optimization, and Transfer Learning
Transformed additive GP surrogates are crucial in:
- Prediction under high input dimension: Additive and transformed kernels mitigate the curse of dimensionality by modeling covariances via sums over one- or low-dimensional subspaces, enabling effective surrogate construction with limited data (Durrande et al., 2011, Li et al., 2023).
- Design optimization and robust inverse modeling: Efficient modeling of robust objective functions (mean/variance cost functions) and high-dimensional uncertainties is possible by integrating projection-based transforms with variational Bayesian inference for robust surrogates (Archbold et al., 23 Apr 2024).
- Uncertainty quantification and adaptive experimental design: Closed-form estimation of variance, confidence intervals, and coverage guarantees (e.g., via conformal methods (Jaber et al., 15 Jan 2024)) are naturally enabled. Adaptive designs maximize Bayesian acquisition criteria (e.g., expected improvement) and update sampling locations or simulation accuracy to maximize surrogate fidelity per unit of computational effort (Takhtaganov et al., 2018, Semler et al., 2023, Villani et al., 30 Apr 2024); see the expected-improvement sketch after this list.
- Transfer learning across domains or tasks: Affine and more general transformations are exploited so that surrogates trained on a source domain (potentially with abundant legacy data) can be transferred efficiently to a target domain with scarce data, with the transformation either fixed or learned to minimize empirical target error (Pan et al., 23 Jan 2025).
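As a pointer to how such acquisition-driven designs are computed, the snippet below evaluates the standard closed-form expected improvement (maximization convention) from a surrogate's posterior mean and standard deviation; the exploration offset `xi` is an arbitrary choice, and the posterior quantities would come from whichever transformed additive surrogate is in use.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best, xi=0.01):
    """Closed-form EI for maximization, given posterior mean/std at candidate points."""
    sigma = np.maximum(sigma, 1e-12)                 # guard against zero predictive variance
    z = (mu - f_best - xi) / sigma
    return (mu - f_best - xi) * norm.cdf(z) + sigma * norm.pdf(z)
```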
6. Interpretability, Limitations, and Future Directions
The additive and transformed paradigm substantially enhances interpretability: each kernel component or latent subspace contribution can be visualized and attributed to specific physical, design, or domain characteristics (Lu et al., 2022, Zhang et al., 19 Feb 2024). Additive decompositions admit clear ranking and visualization of main effects and low-order interactions, facilitating communication with domain experts and integration into explainable machine learning workflows.
Limitations include:
- When the true system function has strong nonadditive couplings or complex nonstationary structure not captured by the additive form, the purely additive or low-dimensional transform may be insufficient (indicated by a nonzero auxiliary noise parameter $\sigma^2$ in the RLM (Durrande et al., 2011)).
- Estimation routines tailored to additive kernels may not directly extend to full tensor-product or highly nonlinear kernels without substantial modification.
- For transfer learning, affine or low-rank projections may not capture intricate nonlinear domain differences.
Future research directions involve hierarchical additive decompositions, deep kernel learning embeddings for automated structure discovery, integration with neural surrogates and autoencoder-based latent spaces for high-dimensional output fields (Deshpande et al., 15 Jul 2024), and rigorously combining uncertainties from both model and transformation/transfer estimation.
7. Representative Models and Comparative Performance
Recent methods implementing transformed additive GP surrogate models include:
| Model Type | Transformation Structure | Applications / Results |
|---|---|---|
| Additive GP (RLM) (Durrande et al., 2011) | Sum of 1D kernels | High Q₂ on Sobol’s g-function; robust in high dimensions; interpretable |
| Multi-index Additive GP (AdMIn-GP) (Li et al., 2023) | Projections + additive components | Surrogate modeling for multi-physics (QGP); improves coverage and RMSE |
| Variable Fidelity GP (VFGP) (Burnaev et al., 2017, Kerleguer, 2021) | Scaling/shift + discrepancy GP | Combines high/low-fidelity data; accuracy improves when high-fidelity data are rare |
| Affine Transformation Transfer GP (Pan et al., 23 Jan 2025) | Rotation + translation of inputs | Outperforms retraining from sparse data; efficient in real-world benchmarks |
| Orthogonal Additive Kernel (OAK) (Lu et al., 2022) | Orthogonally-constrained sum | Identifiable FANOVA decomposition; efficient in sparse GP approximations |
This spectrum of approaches demonstrates the versatility and scalability of transformed additive GP surrogates, with models tailored to the demands of high-dimensional, multi-domain, or variable fidelity modeling.
The transformed additive Gaussian Process Surrogate Model framework unifies structural transformations, additive decomposition, and Gaussian process inference, resulting in interpretable, sample-efficient, and transferable surrogate models with principled uncertainty quantification. It is well-suited for complex design, simulation, and multi-fidelity settings arising across science and engineering.