Multi-Fidelity GP Surrogate Modeling
- Multi-fidelity Gaussian process surrogate modeling is a framework that combines low-cost simulations with sparse high-fidelity data for efficient and accurate predictions.
- It employs autoregressive co-kriging and advanced GP variants, including deep and nonlinear models, to quantify uncertainty and reduce computational costs.
- Active learning and cost-aware sampling strategies further optimize high-fidelity data utilization while addressing challenges in high-dimensional, noisy, and non-nested settings.
Multi-fidelity Gaussian Process (GP) surrogate modeling is a framework for constructing probabilistic surrogate models that combine data from simulators or experiments of varying accuracy and cost. These surrogates leverage dense, inexpensive low-fidelity data together with sparse, expensive high-fidelity data, exploiting their statistical dependence to deliver accurate predictions and principled uncertainty quantification at reduced computational cost. The field is grounded in autoregressive co-kriging models and has advanced to include hierarchical, nonlinear, deep, and non-hierarchical architectures, with algorithmic innovations for high-dimensional, non-nested, and noisy datasets as well as cost-aware adaptive sampling.
1. Autoregressive Multi-Fidelity GP Surrogates: Formulation and Theory
The canonical formulation, established by Kennedy and O'Hagan and widely adopted in climate, engineering, and physics applications, posits an autoregressive relationship between a high-fidelity function $f_H$ and a low-fidelity function $f_L$:
$$f_H(x) = \rho\, f_L(x) + \delta(x),$$
where $f_L \sim \mathcal{GP}(0, k_L)$, $\delta \sim \mathcal{GP}(0, k_\delta)$, and $f_L$ and $\delta$ are independent. The scalar $\rho$ is a scale parameter. Both kernels $k_L$ and $k_\delta$ are typically chosen as squared-exponential or Matérn covariances with separate amplitude and length-scale hyperparameters.
For a set of $n_L$ low-fidelity observations $(X_L, y_L)$ and $n_H$ high-fidelity observations $(X_H, y_H)$, the joint prior over observed outputs is multivariate Gaussian with 2×2 block-structured covariance:
$$K = \begin{pmatrix} k_L(X_L, X_L) & \rho\, k_L(X_L, X_H) \\ \rho\, k_L(X_H, X_L) & \rho^2\, k_L(X_H, X_H) + k_\delta(X_H, X_H) \end{pmatrix}.$$
Posterior prediction of $f_H$ at a novel input $x_*$ is given by
$$\mu_H(x_*) = k_*^{\top} K^{-1} y, \qquad \sigma_H^2(x_*) = \rho^2 k_L(x_*, x_*) + k_\delta(x_*, x_*) - k_*^{\top} K^{-1} k_*,$$
where $y = (y_L, y_H)^{\top}$ and $k_*$ concatenates the cross-covariances of $f_H(x_*)$ with all training points. All parameters ($\rho$, kernel amplitudes, length-scales, noise) are learned by maximizing the marginal likelihood of the joint data vector (Hudson et al., 2021).
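The following is a minimal numerical sketch of this two-fidelity formulation using NumPy and a squared-exponential kernel. The function names, hyperparameter values, and toy data are illustrative assumptions; in practice the hyperparameters would be fit by marginal-likelihood maximization as discussed below.

```python
# Minimal sketch of two-fidelity AR(1) co-kriging (Kennedy-O'Hagan form).
# Hyperparameter values (rho, amp_*, ls_*, noise) are illustrative, not from
# any cited study; in practice they are fit by marginal likelihood.
import numpy as np

def sq_exp(X1, X2, amp, ls):
    """Squared-exponential kernel: amp^2 * exp(-|x - x'|^2 / (2 ls^2))."""
    d2 = (X1[:, None, :] - X2[None, :, :]) ** 2
    return amp**2 * np.exp(-0.5 * d2.sum(-1) / ls**2)

def cokriging_posterior(XL, yL, XH, yH, Xs, rho=1.0,
                        amp_L=1.0, ls_L=0.3, amp_d=0.5, ls_d=0.3, noise=1e-6):
    """Posterior mean/variance of f_H at test points Xs under
    f_H(x) = rho * f_L(x) + delta(x), with f_L and delta independent GPs."""
    kLL = sq_exp(XL, XL, amp_L, ls_L)
    kLH = sq_exp(XL, XH, amp_L, ls_L)
    kHH = sq_exp(XH, XH, amp_L, ls_L)
    kdd = sq_exp(XH, XH, amp_d, ls_d)
    # 2x2 block joint covariance over (y_L, y_H)
    K = np.block([[kLL,         rho * kLH],
                  [rho * kLH.T, rho**2 * kHH + kdd]])
    K += noise * np.eye(K.shape[0])
    y = np.concatenate([yL, yH])
    # Cross-covariances between f_H(Xs) and the training outputs
    k_star = np.vstack([rho * sq_exp(XL, Xs, amp_L, ls_L),
                        rho**2 * sq_exp(XH, Xs, amp_L, ls_L)
                        + sq_exp(XH, Xs, amp_d, ls_d)])
    k_ss = rho**2 * sq_exp(Xs, Xs, amp_L, ls_L) + sq_exp(Xs, Xs, amp_d, ls_d)
    alpha = np.linalg.solve(K, y)
    v = np.linalg.solve(K, k_star)
    mean = k_star.T @ alpha
    var = np.diag(k_ss - k_star.T @ v)
    return mean, var

# Toy usage: dense cheap LF data, a handful of HF points.
rng = np.random.default_rng(0)
XL = rng.uniform(0, 1, (40, 1)); yL = np.sin(6 * XL[:, 0])
XH = rng.uniform(0, 1, (6, 1));  yH = 1.2 * np.sin(6 * XH[:, 0]) + 0.3
Xs = np.linspace(0, 1, 50)[:, None]
mu, var = cokriging_posterior(XL, yL, XH, yH, Xs, rho=1.2)
```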
This autoregressive model extends to $s$ fidelity levels via a recursive Markov hierarchy:
$$f_t(x) = \rho_{t-1}\, f_{t-1}(x) + \delta_t(x), \qquad t = 2, \ldots, s,$$
with $\delta_t \sim \mathcal{GP}(0, k_t)$ independent of all lower levels, and closed-form block covariance recursions for joint inference (Gratiet, 2011).
Nonlinear dependencies are handled by replacing the linear map with a nonlinear GP mapping, e.g., $f_H(x) = g\big(x, f_L(x)\big)$ with $g$ a GP on the joint space $(x, f_L(x))$, using composite kernels to model the interaction structure (Ravi et al., 18 Apr 2024). Deep Gaussian processes (DGPs), which stack multiple GP layers, further generalize to non-nested and highly nonlinear settings.
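As a rough illustration of this nonlinear (NARGP-style) construction, the sketch below builds a composite kernel on the augmented input $(x, f_L(x))$. The kernel form and hyperparameter names are assumptions for exposition, not the exact model of the cited work.

```python
# Sketch of a NARGP-style composite kernel on the augmented input (x, f_L(x)):
# k((x, f), (x', f')) = k_x(x, x') * k_f(f, f') + k_bias(x, x').
# Kernel and hyperparameter names are illustrative.
import numpy as np

def rbf(a, b, amp, ls):
    d2 = (a[:, None, :] - b[None, :, :]) ** 2
    return amp**2 * np.exp(-0.5 * d2.sum(-1) / ls**2)

def nargp_kernel(X1, fL1, X2, fL2,
                 amp_x=1.0, ls_x=0.3, amp_f=1.0, ls_f=0.5, amp_b=0.3, ls_b=0.3):
    """Composite kernel: multiplicative interaction between the spatial part and
    the low-fidelity-output part, plus an additive x-only discrepancy term."""
    k_x = rbf(X1, X2, amp_x, ls_x)
    k_f = rbf(fL1[:, None], fL2[:, None], amp_f, ls_f)
    k_b = rbf(X1, X2, amp_b, ls_b)
    return k_x * k_f + k_b
```

At prediction time the low-fidelity value $f_L(x_*)$ is itself uncertain; it is typically handled by propagating samples from the low-fidelity posterior through the high-fidelity GP and averaging.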
2. Model Training, Data Allocation, and Complexity
Training requires allocating limited high-fidelity samples judiciously due to the $\mathcal{O}(n^3)$ scaling of exact GP inference. Standard practice initializes with a modest number of random low- and high-fidelity points, then iteratively selects additional high-fidelity samples via variance-based active learning or acquisition heuristics.
Two batch acquisition methods have proven effective:
- Batch Model Variance (MV_Σ): Selects a batch of points at which the posterior variance is highest.
- Integrated Variance Reduction (IVR): Selects batches that maximize the expected reduction in posterior variance integrated over the input domain.
IVR-based acquisition consistently reduces prediction error more rapidly for a fixed high-fidelity budget than naive or random sampling (Hudson et al., 2021). In an application to regional climate prediction, the multi-fidelity surrogate matched or exceeded single-fidelity GP accuracy (in MSE) while using only 6% of the possible high-fidelity simulations. Both batch rules are sketched below.
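The sketch operates on a pool of candidate high-fidelity locations, given the current posterior covariance over that pool (e.g., from the co-kriging model above). The greedy rank-one conditioning used for the IVR batch is an illustrative choice.

```python
# Hedged sketch of the two batch-acquisition rules on a candidate pool.
import numpy as np

def batch_max_variance(post_var, batch_size):
    """MV-style rule: the batch_size candidates with the largest posterior variance."""
    return np.argsort(post_var)[::-1][:batch_size]

def ivr_scores(post_cov, noise=1e-6):
    """IVR-style score for candidate j: total reduction in posterior variance over
    the pool if j were observed, sum_i Cov(i, j)^2 / (Var(j) + noise)."""
    var = np.diag(post_cov)
    return (post_cov**2).sum(axis=0) / (var + noise)

def batch_ivr_greedy(post_cov, batch_size, noise=1e-6):
    """Greedy IVR batch: pick the best candidate, condition the covariance on the
    hypothetical observation (rank-1 update), and repeat."""
    C = post_cov.copy()
    picked, available = [], np.ones(len(post_cov), dtype=bool)
    for _ in range(batch_size):
        scores = ivr_scores(C, noise)
        scores[~available] = -np.inf
        j = int(np.argmax(scores))
        picked.append(j)
        available[j] = False
        cj = C[:, j:j + 1]
        C = C - cj @ cj.T / (C[j, j] + noise)
    return picked
```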
GP hyperparameter estimation employs gradient-based maximum-likelihood optimization of the joint marginal likelihood; in multi-fidelity contexts, block algebra and sparse approximations (e.g., Nyström methods, inducing points) are valuable for large-scale data (Burnaev et al., 2017).
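A minimal sketch of joint maximum-likelihood hyperparameter estimation for the two-fidelity model is given below. The log-parameterization and the use of SciPy's L-BFGS-B optimizer with numerical gradients are illustrative choices, not a prescribed procedure from the cited works.

```python
# Sketch of joint maximum-likelihood fitting of (rho, kernel, noise) parameters.
import numpy as np
from scipy.optimize import minimize

def sq_exp(X1, X2, amp, ls):
    d2 = (X1[:, None, :] - X2[None, :, :]) ** 2
    return amp**2 * np.exp(-0.5 * d2.sum(-1) / ls**2)

def neg_log_marginal_likelihood(theta, XL, yL, XH, yH):
    """-log p(y_L, y_H | theta), theta = (rho, log amp_L, log ls_L,
    log amp_d, log ls_d, log noise)."""
    rho = theta[0]
    amp_L, ls_L, amp_d, ls_d, noise = np.exp(theta[1:])
    kLL = sq_exp(XL, XL, amp_L, ls_L)
    kLH = sq_exp(XL, XH, amp_L, ls_L)
    kHH = rho**2 * sq_exp(XH, XH, amp_L, ls_L) + sq_exp(XH, XH, amp_d, ls_d)
    K = np.block([[kLL, rho * kLH], [rho * kLH.T, kHH]])
    K += noise * np.eye(K.shape[0])
    y = np.concatenate([yL, yH])
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * y @ alpha + np.log(np.diag(L)).sum() + 0.5 * len(y) * np.log(2 * np.pi)

# Toy data with the same structure as the earlier sketch.
rng = np.random.default_rng(0)
XL = rng.uniform(0, 1, (40, 1)); yL = np.sin(6 * XL[:, 0])
XH = rng.uniform(0, 1, (6, 1));  yH = 1.2 * np.sin(6 * XH[:, 0]) + 0.3

theta0 = np.array([1.0, 0.0, np.log(0.3), np.log(0.5), np.log(0.3), np.log(1e-4)])
res = minimize(neg_log_marginal_likelihood, theta0, args=(XL, yL, XH, yH),
               method="L-BFGS-B")
rho_hat, noise_hat = res.x[0], np.exp(res.x[-1])
```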
3. Extensions: Noisy, Non-Nested, and High-Dimensional Data
Recent work extends autoregressive multi-fidelity GPs to settings common in applied science:
- Noisy, Non-Nested Designs: The recursive AR(1) GP can accommodate arbitrary (non-nested) training locations and explicit Gaussian observation noise. The high-fidelity process is modeled conditionally on the low-fidelity GP posterior, and EM algorithms are used for efficient parameter estimation, avoiding full joint marginal likelihood maximization (Baillie et al., 25 Nov 2025).
- Dimension Reduction: In high-dimensional settings, supervised dimension reduction via rotation (e.g., SAVE) is used to find a low-dimensional subspace informative for the high-fidelity quantity. The Rotated Multi-Fidelity GP (RMFGP) iterates between subspace estimation and multi-fidelity surrogate construction, propagating uncertainty and enabling high-fidelity learning with minimal HF samples (Zhang et al., 2022).
- Heterogeneous Input Spaces: When simulators operate over non-matching input parameters, affine or nonlinear input mappings align LF and HF spaces prior to co-kriging (Menon et al., 19 Mar 2024). Hyperparameters, including mapping parameters, are jointly optimized via marginal likelihood.
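The affine-mapping idea in the last bullet can be sketched as follows: high-fidelity inputs are mapped into the low-fidelity input space as $z = A x_H + b$ before the LF kernel is evaluated, and $A$, $b$ would be optimized jointly with the kernel hyperparameters via the marginal likelihood. The dimensions, kernel, and variable names below are illustrative.

```python
# Sketch of aligning heterogeneous input spaces by an affine map before co-kriging.
import numpy as np

def affine_map(X_H, A, b):
    """Map HF inputs of shape (n, d_H) into the LF space, shape (n, d_L)."""
    return X_H @ A.T + b

def cross_cov_heterogeneous(X_L, X_H, A, b, kernel):
    """LF kernel evaluated between native LF inputs and mapped HF inputs."""
    return kernel(X_L, affine_map(X_H, A, b))

# Example: a 3-D HF parameterisation mapped into a 2-D LF parameterisation.
rng = np.random.default_rng(1)
A = rng.normal(size=(2, 3)); b = np.zeros(2)
X_L = rng.uniform(size=(30, 2)); X_H = rng.uniform(size=(5, 3))
k = lambda a, c: np.exp(-0.5 * ((a[:, None, :] - c[None, :, :])**2).sum(-1) / 0.3**2)
K_cross = cross_cov_heterogeneous(X_L, X_H, A, b, k)   # shape (30, 5)
```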
4. Active Learning, Acquisition Strategies, and Bayesian Optimization
Multi-fidelity frameworks naturally support cost-aware adaptive sampling. Strategies include:
- Variance-based acquisition: Acquire samples at input-fidelity pairs with maximal posterior variance (Hudson et al., 2021).
- Integrated variance reduction: Maximize the expected reduction in the HF prediction variance over a test set (Hudson et al., 2021).
- Leave-one-out Cross-validation-driven Acquisition: The Multifidelity Cross-Validation (MFCV) approach builds a two-GP model, using an “outer” GP on the physical response and an “inner” GP on the log-LOO-CV residuals. New samples are selected by expected reduction in the maximum CV error at the highest fidelity, driving surrogates to minimize generalization error efficiently (Renganathan et al., 1 Jul 2024).
- Cost-weighted Acquisition for Bayesian Optimization: Acquisition functions (e.g., expected improvement or UCB) are adapted to multi-fidelity surrogates, with the sampling decision at each iteration balancing improvement per unit cost (Chen et al., 2023, Manoj et al., 1 Aug 2025); a minimal sketch follows this list.
- Latent Variable Approaches: Symmetric (non-hierarchical) latent embeddings enable joint modeling and adaptive acquisition across arbitrary collections of simulators (Chen et al., 2023).
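Below is a minimal sketch of a cost-weighted expected-improvement rule of the kind referenced above: the next evaluation is the (candidate, fidelity) pair with the best expected improvement per unit cost. The improvement-per-cost form and the data structures are illustrative assumptions, not the acquisition function of any single cited paper.

```python
# Sketch of cost-aware acquisition across fidelities via EI per unit cost.
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, y_best):
    """Standard EI for minimization, given posterior mean/std over candidates."""
    sigma = np.maximum(sigma, 1e-12)
    z = (y_best - mu) / sigma
    return (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def cost_aware_choice(posteriors, costs, y_best):
    """posteriors: dict fidelity -> (mu, sigma) over a shared candidate pool;
    costs: dict fidelity -> evaluation cost. Returns the (fidelity, index) pair
    with the largest expected improvement per unit cost."""
    best = None
    for level, (mu, sigma) in posteriors.items():
        score = expected_improvement(np.asarray(mu), np.asarray(sigma), y_best) / costs[level]
        j = int(np.argmax(score))
        if best is None or score[j] > best[0]:
            best = (score[j], level, j)
    return best[1], best[2]

# Example: two fidelities over 100 shared candidates, LF 20x cheaper than HF.
rng = np.random.default_rng(0)
posteriors = {"low":  (rng.normal(size=100), rng.uniform(0.1, 1.0, 100)),
              "high": (rng.normal(size=100), rng.uniform(0.1, 1.0, 100))}
fidelity, idx = cost_aware_choice(posteriors, {"low": 1.0, "high": 20.0}, y_best=0.0)
```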
5. Model Generalizations and Special Cases
- Multi-output and Time Series: For vector-valued or time-series outputs, basis expansion (e.g., SVD or a learned basis) is combined with autoregressive GP modeling on the leading coefficients and a tensor-product covariance on the residuals, yielding accurate prediction and UQ for dynamically structured outputs (Kerleguer, 2021); see the sketch after this list.
- Multiple Low-Fidelity Models: When several separate low-fidelity models exist, each is corrected individually via an additive GP discrepancy term. Local model probabilities, combining predictive mean and uncertainty, are used to fuse surrogates by deterministic selection, averaging, or stochastic selection (Chakroborty et al., 2022). Cost and accuracy are explicitly balanced, critical for rare event simulation.
- Local Transfer and Latent Gating: Local transfer learning with ReLU-gated latent GPs (LOL-GP) addresses negative transfer by learning, per location, whether to borrow information from low-fidelity data or rely solely on the high-fidelity response. Gibbs sampling in latent space enables efficient posterior inference (Wang et al., 16 Oct 2024).
- Hybrid Gaussian Process–Neural Network Models: To model highly nonlinear low-to-high-fidelity dependencies, Bayesian neural networks (BNNs) are used for the high-fidelity surrogate, with low-fidelity predictions (and uncertainty) included as inputs. GPBNN achieves improved predictive accuracy and uncertainty quantification when the low-fidelity response is highly imperfect (Kerleguer et al., 2023).
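The basis-expansion idea for time-series outputs (first bullet above) can be sketched as follows: compress the training outputs with an SVD, fit one surrogate per leading coefficient, and reconstruct predictions in the original output space. The truncation rank and the per-coefficient model are placeholders.

```python
# Sketch of SVD basis expansion for multi-output / time-series surrogates.
import numpy as np

def svd_basis(Y, r):
    """Y: (n_samples, n_steps) training outputs. Returns the output mean, the r
    leading basis vectors (r, n_steps), and the (n_samples, r) coefficients."""
    Y_mean = Y.mean(axis=0)
    U, S, Vt = np.linalg.svd(Y - Y_mean, full_matrices=False)
    basis = Vt[:r]                   # leading right singular vectors
    coeffs = (Y - Y_mean) @ basis.T  # projection onto the basis
    return Y_mean, basis, coeffs

def reconstruct(Y_mean, basis, coeffs_pred):
    """Map predicted coefficients back to full output trajectories."""
    return Y_mean + coeffs_pred @ basis

# Usage idea: fit one (multi-fidelity) GP per column of `coeffs`, predict the
# coefficients at new inputs, then call reconstruct(...) for the full time series.
```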
6. Performance Benchmarks and Practical Impact
Consistent findings across domains—climate science, engineering optimization, terramechanics, plasma physics, rare-event analysis—demonstrate that multi-fidelity GP surrogates drastically reduce the required number of high-fidelity simulations (typically by an order of magnitude or more), while achieving or exceeding the accuracy of single-fidelity models at fixed cost (Hudson et al., 2021, Chakroborty et al., 2022, Menon et al., 19 Mar 2024, Baillie et al., 25 Nov 2025).
Theoretical and empirical studies show that:
- Linear autoregressive (AR(1)) models suffice when the LF→HF map is an affine scaling + bias, while more complex relationships require nonlinear kernels or deep or hybrid surrogates (Ravi et al., 18 Apr 2024).
- Active learning via variance or IVR acquisition outperforms random or sequential design (Hudson et al., 2021).
- For high-dimensional or structured outputs, dimension reduction and tensor-product covariances enable accurate surrogates with minimal HF data (Zhang et al., 2022, Kerleguer, 2021).
7. Current Challenges, Limitations, and Future Directions
Active research areas include:
- Scaling to Large Data: Nyström, inducing-point, and block-sparse methods reduce the cubic scaling of exact inference but must be carefully integrated with the multi-fidelity structure (Burnaev et al., 2017); a minimal Nyström sketch follows this list.
- Non-nested, Noisy, and Heterogeneous Data: New EM-based AR(1) methods address realistic settings with misaligned experimental designs, measurement noise, and parametric scaling (Baillie et al., 25 Nov 2025).
- Flexible Non-hierarchical Modeling: Latent-variable GPs and local transfer learning allow for multi-model fusion and negative-transfer avoidance, broadening applicability beyond strictly hierarchical simulators (Chen et al., 2023, Wang et al., 16 Oct 2024).
- Adaptive Fidelity Selection: Cost-weighted acquisition, proximity-based selection, and controlled UCB strategies enable user control over the high-fidelity fraction in Bayesian optimization (Manoj et al., 1 Aug 2025).
- Hybrid Modeling: Integration of neural networks into the GP framework supports applications where the LF→HF map is highly nonlinear, with rigorous propagation of LF uncertainty (Kerleguer et al., 2023).
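As a concrete reference point for the scaling item above, a Nyström low-rank approximation of a kernel matrix can be sketched as below; the uniform random choice of inducing points and the jitter value are illustrative choices.

```python
# Sketch of a Nyström low-rank kernel approximation, K ~= Phi @ Phi.T.
import numpy as np

def nystrom_factor(X, kernel, m, jitter=1e-8, seed=0):
    """Return Phi of shape (n, m) with K ~= Phi @ Phi.T, using m inducing points
    drawn uniformly at random from the training inputs."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)
    K_mm = kernel(X[idx], X[idx]) + jitter * np.eye(m)
    K_nm = kernel(X, X[idx])
    # Symmetric inverse square root of K_mm via its eigendecomposition
    w, V = np.linalg.eigh(K_mm)
    K_mm_inv_sqrt = V @ np.diag(1.0 / np.sqrt(np.maximum(w, jitter))) @ V.T
    return K_nm @ K_mm_inv_sqrt
```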
Limitations include sensitivity to kernel selection and initialization, as well as the need for representative overlap in input space between fidelity levels, although recent mapping strategies begin to relax these constraints (Menon et al., 19 Mar 2024). Parameter estimation for multi-level, high-dimensional, or non-nested surrogates remains numerically challenging, with EM and block-recursive procedures alleviating but not eliminating these barriers.
Multi-fidelity Gaussian process surrogate modeling thus provides a rigorously grounded, extensible methodology for efficient emulation, optimization, and reliability analysis across simulation-driven disciplines, continually evolving to meet the computational and data-integrity challenges of contemporary scientific modeling (Hudson et al., 2021, Renganathan et al., 1 Jul 2024, Ravi et al., 18 Apr 2024, Baillie et al., 25 Nov 2025, Wang et al., 16 Oct 2024).