Gaussian-Process Surrogate Modeling
- Gaussian-process surrogates are probabilistic, nonparametric metamodels that approximate complex, expensive-to-evaluate functions using a deterministic trend and a Gaussian process for residual modeling.
- They leverage likelihood optimization for hyperparameter estimation, which enables extraction of feature importance and facilitates interpretable predictions.
- They support practical tasks such as simulation, optimization, and calibration by providing robust uncertainty quantification and near-parity performance with black-box models.
A Gaussian-process surrogate is a probabilistic, nonparametric metamodel that approximates the response surface of a complex, expensive-to-evaluate or black-box function. It is constructed by leveraging the properties of Gaussian processes (GPs) to provide predictions, uncertainty quantification, and interpretability, thereby facilitating tasks such as simulation, optimization, calibration, and interpretation across scientific and engineering domains (Toutiaee et al., 2021).
1. Mathematical Structure of GP Surrogates
A GP surrogate models an unknown function $f(\mathbf{x})$, or the output surface of a complex system, as the sum of a deterministic trend $\mu(\mathbf{x})$ and a zero-mean stationary GP $Z(\mathbf{x})$:
$$f(\mathbf{x}) = \mu(\mathbf{x}) + Z(\mathbf{x}), \qquad Z(\cdot) \sim \mathcal{GP}\big(0, k(\cdot,\cdot)\big).$$
The trend $\mu(\mathbf{x}) = \mathbf{h}(\mathbf{x})^{\top}\boldsymbol{\beta}$ is a low-order polynomial or linear basis encoding known global behavior. The GP is specified via a covariance kernel $k(\mathbf{x}, \mathbf{x}')$. A common choice is the anisotropic Gaussian (squared-exponential) kernel,
$$k(\mathbf{x}, \mathbf{x}') = \sigma^{2} \exp\!\Big(-\sum_{j=1}^{d} \theta_{j}\,(x_{j} - x'_{j})^{2}\Big),$$
where $\theta_{j} \ge 0$ controls sensitivity along input dimension $j$. Other kernels (e.g., Matérn) are used when different degrees of function smoothness are warranted (Toutiaee et al., 2021, Flovik et al., 3 Mar 2025, Jaber et al., 2024, Hornsby et al., 2024).
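To make the kernel concrete, the following NumPy sketch evaluates the anisotropic squared-exponential kernel above; the function name `sqexp_kernel` and its interface are illustrative, not taken from the cited work.

```python
import numpy as np

def sqexp_kernel(X1, X2, theta, sigma2=1.0):
    """Anisotropic squared-exponential kernel:
    k(x, x') = sigma2 * exp(-sum_j theta_j * (x_j - x'_j)^2).

    X1: (n1, d), X2: (n2, d), theta: (d,) per-dimension anisotropy parameters.
    Returns the (n1, n2) covariance matrix.
    """
    diff = X1[:, None, :] - X2[None, :, :]            # (n1, n2, d) pairwise differences
    sqdist = np.einsum("ijk,k->ij", diff**2, theta)   # sum_j theta_j (x_j - x'_j)^2
    return sigma2 * np.exp(-sqdist)
```

With `sigma2` left at 1, the same routine returns the correlation matrix $\mathbf{R}_{\boldsymbol{\theta}}$ that appears in the likelihood and prediction equations below.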
2. Hyperparameter Estimation and Likelihood Optimization
The kernel and trend are parameterized by hyperparameters $\boldsymbol{\theta} = (\theta_{1}, \ldots, \theta_{d})$ (inverse squared length-scales), $\sigma^{2}$, and $\boldsymbol{\beta}$, which encode variable influence and smoothness. The full GP log-likelihood is
$$\ell(\boldsymbol{\beta}, \sigma^{2}, \boldsymbol{\theta}) = -\tfrac{1}{2}\Big[n \log(2\pi\sigma^{2}) + \log\lvert \mathbf{R}_{\boldsymbol{\theta}}\rvert + \sigma^{-2}(\mathbf{y} - \mathbf{F}\boldsymbol{\beta})^{\top}\mathbf{R}_{\boldsymbol{\theta}}^{-1}(\mathbf{y} - \mathbf{F}\boldsymbol{\beta})\Big],$$
with $\mathbf{F}$ the design matrix of trend basis functions and $\mathbf{R}_{\boldsymbol{\theta}}$ the kernel (correlation) matrix. For fixed $\boldsymbol{\theta}$, the MLEs
$$\hat{\boldsymbol{\beta}} = (\mathbf{F}^{\top}\mathbf{R}_{\boldsymbol{\theta}}^{-1}\mathbf{F})^{-1}\mathbf{F}^{\top}\mathbf{R}_{\boldsymbol{\theta}}^{-1}\mathbf{y}, \qquad \hat{\sigma}^{2} = \tfrac{1}{n}(\mathbf{y} - \mathbf{F}\hat{\boldsymbol{\beta}})^{\top}\mathbf{R}_{\boldsymbol{\theta}}^{-1}(\mathbf{y} - \mathbf{F}\hat{\boldsymbol{\beta}})$$
are computed by generalized least squares. Plugging these into $\ell$ yields a concentrated (profile) likelihood in $\boldsymbol{\theta}$, maximized by gradient-based (L-BFGS-B) or global (differential-evolution) algorithms (Toutiaee et al., 2021). The variable-importance vector $\hat{\boldsymbol{\theta}}$ is directly interpretable: a high $\hat{\theta}_{j}$ signals strong dependence of the response on variable $x_{j}$.
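A minimal sketch of this concentrated-likelihood fit, assuming a constant trend (so $\mathbf{F}$ is a column of ones) and reusing `sqexp_kernel` from above; `scipy.optimize.minimize` with L-BFGS-B stands in for the optimizers mentioned in the text, and the helper names are illustrative.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.optimize import minimize

def neg_concentrated_loglik(log_theta, X, y, nugget=1e-8):
    """Negative concentrated log-likelihood in theta (beta and sigma^2 profiled out)."""
    n = len(y)
    theta = np.exp(log_theta)                      # optimize on the log scale for positivity
    R = sqexp_kernel(X, X, theta) + nugget * np.eye(n)
    c, low = cho_factor(R)                         # Cholesky factor of the correlation matrix
    F = np.ones((n, 1))                            # constant-trend design matrix
    Ri_F = cho_solve((c, low), F)
    Ri_y = cho_solve((c, low), y)
    beta_hat = np.linalg.solve(F.T @ Ri_F, F.T @ Ri_y)         # generalized least squares
    resid = y - (F @ beta_hat).ravel()
    sigma2_hat = resid @ cho_solve((c, low), resid) / n         # profiled process variance
    log_det_R = 2.0 * np.sum(np.log(np.diag(c)))
    return 0.5 * (n * np.log(sigma2_hat) + log_det_R)           # up to additive constants

def fit_theta(X, y, theta0):
    """Maximize the concentrated likelihood over theta with L-BFGS-B."""
    res = minimize(neg_concentrated_loglik, np.log(theta0), args=(X, y), method="L-BFGS-B")
    return np.exp(res.x)
```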
3. Posterior Prediction and Uncertainty Quantification
With hyperparameters fixed, the GP predictive distribution at a new input $\mathbf{x}_{*}$ is Gaussian, $f(\mathbf{x}_{*}) \mid \mathbf{y} \sim \mathcal{N}\big(\hat{m}(\mathbf{x}_{*}), \hat{s}^{2}(\mathbf{x}_{*})\big)$, where
$$\hat{m}(\mathbf{x}_{*}) = \mathbf{h}(\mathbf{x}_{*})^{\top}\hat{\boldsymbol{\beta}} + \mathbf{r}(\mathbf{x}_{*})^{\top}\mathbf{R}_{\boldsymbol{\theta}}^{-1}(\mathbf{y} - \mathbf{F}\hat{\boldsymbol{\beta}}), \qquad \hat{s}^{2}(\mathbf{x}_{*}) = \hat{\sigma}^{2}\big(1 - \mathbf{r}(\mathbf{x}_{*})^{\top}\mathbf{R}_{\boldsymbol{\theta}}^{-1}\mathbf{r}(\mathbf{x}_{*})\big),$$
with $\mathbf{r}(\mathbf{x}_{*})$ the vector of correlations between $\mathbf{x}_{*}$ and the $n$ training inputs (Toutiaee et al., 2021). All matrix inverses are limited to the $n \times n$ correlation matrix $\mathbf{R}_{\boldsymbol{\theta}}$ or the small $p \times p$ matrix $\mathbf{F}^{\top}\mathbf{R}_{\boldsymbol{\theta}}^{-1}\mathbf{F}$.
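Continuing the same sketch, the predictive mean and variance can be computed from a single Cholesky factorization of $\mathbf{R}_{\boldsymbol{\theta}}$; this is a simplified constant-trend implementation of the equations above, not the reference code of the cited work.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def gp_predict(X_new, X, y, theta, nugget=1e-8):
    """Posterior mean and variance at X_new for a constant-trend GP surrogate."""
    n = len(y)
    R = sqexp_kernel(X, X, theta) + nugget * np.eye(n)
    c, low = cho_factor(R)
    F = np.ones((n, 1))
    Ri_F = cho_solve((c, low), F)
    Ri_y = cho_solve((c, low), y)
    beta_hat = np.linalg.solve(F.T @ Ri_F, F.T @ Ri_y)
    resid = y - (F @ beta_hat).ravel()
    alpha = cho_solve((c, low), resid)                   # R^{-1}(y - F beta_hat)
    sigma2_hat = resid @ alpha / n

    r = sqexp_kernel(X_new, X, theta)                    # (m, n) cross-correlations r(x*)
    mean = beta_hat[0] + r @ alpha                       # h(x*)^T beta_hat + r^T R^{-1} resid
    Ri_r = cho_solve((c, low), r.T)                      # R^{-1} r(x*), one column per new point
    var = sigma2_hat * (1.0 - np.einsum("ij,ji->i", r, Ri_r))
    return mean, np.maximum(var, 0.0)                    # clip tiny negative values
```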
4. Surrogate-Based Interpretability and Feature Importance
The estimated kernel parameters $\hat{\theta}_{j}$ (inverse squared length-scales) provide a direct characterization of feature importance: for feature $j$,
- Small $\hat{\theta}_{j}$: the output varies slowly with $x_{j}$ (low importance)
- Large $\hat{\theta}_{j}$: rapid response variation along $x_{j}$ (high importance)
This anisotropic kernel formalism enables both global (across the input domain) and local (per point or sample group) interpretability. The fitted correlation matrix $\mathbf{R}_{\hat{\boldsymbol{\theta}}}$ reveals clustering of sample responses: block-diagonal structure corresponds to groups of samples treated similarly by the black-box predictor (Toutiaee et al., 2021).
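In practice, a similar anisotropy-based importance can be read off any ARD-style GP fit. The sketch below uses scikit-learn's `GaussianProcessRegressor` as an assumed substitute for the cited implementation; because scikit-learn's RBF kernel is parameterized by length-scales $\ell_{j}$, with $k \propto \exp\!\big(-\tfrac{1}{2}\sum_{j}(x_{j}-x'_{j})^{2}/\ell_{j}^{2}\big)$, the anisotropy parameters are recovered as $\theta_{j} = 1/(2\ell_{j}^{2})$.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, RBF

def ard_feature_importance(X, y):
    """Fit an ARD GP and return normalized theta_j = 1 / (2 * l_j^2) per feature."""
    d = X.shape[1]
    kernel = ConstantKernel(1.0) * RBF(length_scale=np.ones(d))   # one length-scale per input
    gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True, n_restarts_optimizer=5)
    gpr.fit(X, y)
    length_scales = gpr.kernel_.k2.length_scale          # fitted per-dimension length-scales
    theta = 1.0 / (2.0 * length_scales**2)                # large theta_j -> important feature
    return theta / theta.sum()
```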
5. Practical Applications and Empirical Results
In empirical studies, the GP surrogate, applied to black-box models including neural networks, gradient-boosted trees, and ensemble methods, achieves performance near parity with the true black-box (in terms of $R^{2}$ for regression, with RMSE within 1–2% of state-of-the-art models). In classical regression settings with main and interaction effects, the GP surrogate recovers the true coefficients more robustly than logistic regression and avoids known statistical pathologies (e.g., the Hauck–Donner anomaly) (Toutiaee et al., 2021).
On benchmark datasets, sample-group interpretability via the fitted correlation matrix recapitulates the functional groupings learned by the black-box. The surrogate supports both regression and probabilistic classification by modeling predicted class probabilities as continuous responses.
6. Computational Workflow and Practical Considerations
The GP surrogate construction proceeds as follows:
- Generate outputs $\mathbf{y} = (y_{1}, \ldots, y_{n})^{\top}$ by querying the complex model at a space-filling input design $\mathbf{X} = \{\mathbf{x}_{1}, \ldots, \mathbf{x}_{n}\}$ (e.g., a Latin hypercube).
- Specify a trend $\mu(\mathbf{x}) = \mathbf{h}(\mathbf{x})^{\top}\boldsymbol{\beta}$ and choose a kernel $k(\cdot, \cdot)$.
- Estimate hyperparameters $\hat{\boldsymbol{\theta}}$, $\hat{\boldsymbol{\beta}}$, and $\hat{\sigma}^{2}$ by maximizing the concentrated (profile) log-likelihood.
- Use the posterior prediction equations to interpolate at new points, extract feature importance from $\hat{\boldsymbol{\theta}}$, and interpret pairwise sample similarity via the fitted correlation matrix $\mathbf{R}_{\hat{\boldsymbol{\theta}}}$ (see the end-to-end sketch after this list).
Cholesky factorization is employed for matrix inversion, ensuring numerical stability and efficiency up to moderate sample sizes $n$ (Toutiaee et al., 2021). Multi-dimensional input spaces are handled naturally by the kernel's tensor-product/ARD structure.
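A toy end-to-end sketch of this workflow, assuming the `sqexp_kernel`, `fit_theta`, and `gp_predict` helpers defined earlier and a hypothetical cheap stand-in for the expensive black-box model:

```python
import numpy as np
from scipy.stats import qmc

# Hypothetical cheap stand-in for the expensive black-box model being emulated
def black_box(X):
    return np.sin(3.0 * X[:, 0]) + 0.1 * X[:, 1] ** 2

# 1. Space-filling design (Latin hypercube) and black-box queries
sampler = qmc.LatinHypercube(d=2, seed=0)
X_train = sampler.random(n=40)
y_train = black_box(X_train)

# 2-3. Constant trend + squared-exponential kernel; fit theta by
#      maximizing the concentrated log-likelihood (L-BFGS-B)
theta_hat = fit_theta(X_train, y_train, theta0=np.ones(2))

# 4. Predict with uncertainty at new points and inspect interpretable quantities
X_new = sampler.random(n=10)
mean, var = gp_predict(X_new, X_train, y_train, theta_hat)
R_fit = sqexp_kernel(X_train, X_train, theta_hat)   # fitted correlation matrix R_theta
print("theta_hat (per-feature importance):", theta_hat)
```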
7. Integrated Framework for Approximation and Interpretation
The GP surrogate approach, as unified in the G-FORSE methodology, simultaneously delivers:
- Response surface interpolation with uncertainty quantification
- Global feature-importance quantification via the anisotropy parameters $\hat{\boldsymbol{\theta}}$
- Fine-grained sample-group interpretability via the fitted correlation matrix $\mathbf{R}_{\hat{\boldsymbol{\theta}}}$
- Robustness and parsimony, as assessed quantitatively by predictive quality and empirical coverage
This statistical framework integrates and extends both metamodeling for emulation and machine-learning interpretation within a single, well-understood approach (Toutiaee et al., 2021).