Autoregressive Multi-Fidelity GP Surrogates

Updated 4 March 2026
  • Autoregressive multi-fidelity GP surrogates couple inexpensive low-fidelity computations with costly high-fidelity corrections within a Gaussian process framework.
  • They model high-fidelity outputs as a scaled low-fidelity prediction plus a discrepancy function, enabling accurate uncertainty quantification and cost reduction.
  • This approach is effective in engineering, climate modeling, and high-dimensional simulations, offering significant improvements in RMSE and runtime efficiency.

Autoregressive Multi-Fidelity GP Surrogates are a class of statistical surrogate models designed for settings where computational experiments or simulations are available at multiple levels of fidelity, with high-fidelity data being costly and low-fidelity data relatively inexpensive. These models employ autoregressive (AR) structures within a Gaussian process (GP) framework, leveraging relationships between fidelities to yield accurate predictive distributions while reducing the demand for high-fidelity samples. Autoregressive multi-fidelity GP surrogates are widely applicable in engineering, scientific computing, and the modeling of complex physical systems, as well as in Bayesian optimization and design of experiments.

1. Fundamental Model Architecture

The canonical form of the autoregressive multi-fidelity GP surrogate is an AR(1) hierarchical model, introduced by Kennedy and O’Hagan (2000). For two fidelities, the model is formalized as

$$f_H(x) = \rho\,f_L(x) + \delta(x)$$

where:

  • $f_L(x)$: low-fidelity surrogate (typically computationally cheap, less accurate)
  • $f_H(x)$: high-fidelity surrogate (computationally expensive, highly accurate)
  • $\rho \in \mathbb{R}$: scaling parameter
  • $\delta(x)$: discrepancy function, modeled as a zero-mean GP independent of (and hence uncorrelated with) $f_L(x)$.

Both $f_L$ and $\delta$ are independently assigned GP priors, customarily with regression mean functions and stationary kernels such as the squared-exponential (SE) or Matérn classes. Observed outputs include additive Gaussian noise:

$$y_L(x) = f_L(x) + \epsilon_L, \quad \epsilon_L \sim \mathcal{N}(0, \tau_L^2), \qquad y_H(x) = f_H(x) + \epsilon_H, \quad \epsilon_H \sim \mathcal{N}(0, \tau_H^2).$$

This architecture extends recursively to $L > 2$ fidelities via

$$f_\ell(x) = \rho_\ell\,f_{\ell-1}(x) + \delta_\ell(x), \qquad \ell = 2, \dots, L,$$

with each $\delta_\ell$ an independent GP and either scalar or functional (input-dependent) scale parameters $\rho_\ell$ (Ravi et al., 2024).
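
As a minimal sketch of this construction, the following draws one realization of the AR(1) prior on a 1-D grid. The SE kernel, its hyperparameters, and the value $\rho = 0.8$ are illustrative assumptions, not values from the cited papers.

```python
import numpy as np

def se_kernel(x1, x2, variance=1.0, lengthscale=0.2):
    """Squared-exponential kernel between 1-D input arrays."""
    d2 = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)

# Independent GP priors for the low-fidelity process and the discrepancy.
K_L = se_kernel(x, x, variance=1.0, lengthscale=0.2)
K_d = se_kernel(x, x, variance=0.05, lengthscale=0.5)
jitter = 1e-6 * np.eye(len(x))  # numerical stabilizer for the Cholesky

f_L = np.linalg.cholesky(K_L + jitter) @ rng.standard_normal(len(x))
delta = np.linalg.cholesky(K_d + jitter) @ rng.standard_normal(len(x))

rho = 0.8                # illustrative scalar AR(1) scaling parameter
f_H = rho * f_L + delta  # f_H(x) = rho * f_L(x) + delta(x)
```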

2. Covariance Structure and Posterior Inference

The joint GP prior over $(f_L, f_H)$ induces a block-structured covariance matrix,

$$\mathrm{Cov}\begin{pmatrix} f_L \\ f_H \end{pmatrix} = \begin{pmatrix} K_{LL} & K_{LH} \\ K_{HL} & K_{HH} \end{pmatrix}$$

with

$$\begin{aligned} K_{LL} &= K_L(X_L, X_L) + \tau_L^2 I, \\ K_{LH} &= \rho\,K_L(X_L, X_H), \\ K_{HH} &= \rho^2 K_L(X_H, X_H) + K_\delta(X_H, X_H) + \tau_H^2 I, \end{aligned}$$

where $K_L$ and $K_\delta$ are the covariance matrices for the respective kernels (Do et al., 2023, Hudson et al., 2021).

For a new input $x^*$, the posterior predictive mean for $f_H$ follows in closed form from Gaussian conditioning:

$$\mu_H(x^*) = m_H(x^*) + k_*^\top K^{-1} (y - m),$$

where $m_H(x^*)$ is the prior mean at $x^*$, $m$ is the prior mean at the training inputs, $k_*$ is the cross-covariance vector between the training data and $f_H(x^*)$, and $K$ is the joint covariance matrix over all observations.

This full-GP construction generalizes to more than two fidelities, with the covariance between any pair $f_\ell(x)$, $f_m(x')$ given by recursively derived relations, accommodating block structures up to $L$ levels (Ravi et al., 2024).
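
The conditioning step admits a compact implementation for two fidelities. The sketch below assembles the block covariance from the formulas above (zero prior mean assumed for brevity) and returns the posterior mean and covariance of $f_H$ at test inputs; the kernel arguments k_L and k_d stand in for any valid choices, e.g. the se_kernel defined earlier.

```python
import numpy as np

def mf_predict(X_L, y_L, X_H, y_H, x_star, k_L, k_d,
               rho=0.8, tau_L2=1e-2, tau_H2=1e-2):
    """AR(1) multi-fidelity GP posterior for f_H at x_star (zero prior mean).

    k_L, k_d: kernel functions k(x1, x2) -> covariance matrix for the
    low-fidelity and discrepancy GPs; all defaults are illustrative.
    """
    n_L, n_H = len(X_L), len(X_H)

    # Block covariance of the stacked observation vector y = [y_L, y_H].
    K_LL = k_L(X_L, X_L) + tau_L2 * np.eye(n_L)
    K_LH = rho * k_L(X_L, X_H)
    K_HH = rho**2 * k_L(X_H, X_H) + k_d(X_H, X_H) + tau_H2 * np.eye(n_H)
    K = np.block([[K_LL, K_LH], [K_LH.T, K_HH]])

    # Cross-covariance between the observations and f_H(x_star).
    k_star = np.vstack([rho * k_L(X_L, x_star),
                        rho**2 * k_L(X_H, x_star) + k_d(X_H, x_star)])

    y = np.concatenate([y_L, y_H])
    L = np.linalg.cholesky(K + 1e-6 * np.eye(n_L + n_H))  # jittered
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))

    mean = k_star.T @ alpha
    v = np.linalg.solve(L, k_star)
    prior_cov = rho**2 * k_L(x_star, x_star) + k_d(x_star, x_star)
    return mean, prior_cov - v.T @ v
```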

3. Hyperparameter Estimation, Training, and Scalability

All model parameters, including covariance hyperparameters for each kernel (length-scales and variances), noise variances, and the AR scaling parameters $\rho$, are estimated by minimizing the negative log marginal likelihood:

$$\ell(\theta) = \frac{1}{2}(y-\mu)^\top K^{-1}(y-\mu) + \frac{1}{2}\log|K| + \frac{N}{2}\log(2\pi),$$

where $y$ stacks all observed outputs, $\mu$ is the prior mean at the observed inputs, and $K$ is the full covariance (Do et al., 2023, Hudson et al., 2021).
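
A sketch of this objective for the two-fidelity model, reusing the se_kernel from Section 1, is shown below; the parameterization (log-transformed positive parameters) and initial values are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_marglik(theta, X_L, y_L, X_H, y_H):
    """Negative log marginal likelihood of the two-fidelity AR(1) model.

    theta packs (illustratively): log length-scales and log variances of the
    two SE kernels, log noise variances, and rho. Zero prior mean assumed.
    """
    ls_L, v_L, ls_d, v_d, t_L2, t_H2 = np.exp(theta[:6])
    rho = theta[6]
    k_L = lambda a, b: se_kernel(a, b, variance=v_L, lengthscale=ls_L)
    k_d = lambda a, b: se_kernel(a, b, variance=v_d, lengthscale=ls_d)

    n_L, n_H = len(X_L), len(X_H)
    K = np.block([
        [k_L(X_L, X_L) + t_L2 * np.eye(n_L), rho * k_L(X_L, X_H)],
        [rho * k_L(X_H, X_L),
         rho**2 * k_L(X_H, X_H) + k_d(X_H, X_H) + t_H2 * np.eye(n_H)],
    ])
    y = np.concatenate([y_L, y_H])
    N = n_L + n_H
    L = np.linalg.cholesky(K + 1e-6 * np.eye(N))  # jittered Cholesky
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    # 0.5*log|K| equals the sum of the log-diagonal entries of L.
    return 0.5 * y @ alpha + np.log(np.diag(L)).sum() + 0.5 * N * np.log(2 * np.pi)

# L-BFGS-B with finite-difference gradients; initial values are illustrative.
theta0 = np.concatenate([np.log([0.2, 1.0, 0.5, 0.05, 1e-2, 1e-2]), [1.0]])
# result = minimize(neg_log_marglik, theta0, args=(X_L, y_L, X_H, y_H),
#                   method="L-BFGS-B")
```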

For models with non-nested designs or non-constant scaling, expectation-maximization (EM) algorithms can be used, which decouple the estimation of low- and high-fidelity parameters, decreasing computational costs from $O((N_L+N_H)^3)$ to $O(N_L^3 + N_H^3)$. This is especially beneficial when $N_H \ll N_L$, as in many real applications (Baillie et al., 25 Nov 2025). Gradient-based optimizers (e.g., L-BFGS) leverage analytic or automatic differentiation of the likelihood with respect to all parameters.
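
The decoupling idea can be illustrated with a simple plug-in two-stage fit: train the low-fidelity GP alone, then fit the discrepancy GP to the high-fidelity residuals. This is not the EM algorithm of Baillie et al. (25 Nov 2025), only a sketch of why the cost splits into $O(N_L^3) + O(N_H^3)$; it assumes the arrays and se_kernel from the earlier sketches.

```python
import numpy as np

def gp_posterior_mean(X, y, X_new, k, tau2=1e-2):
    """Zero-mean GP regression posterior mean at X_new (illustrative helper)."""
    K = k(X, X) + (tau2 + 1e-6) * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return k(X, X_new).T @ alpha

# Stage 1: fit/predict with the low-fidelity GP alone -- O(N_L^3).
m_L_at_H = gp_posterior_mean(X_L, y_L, X_H, k=se_kernel)

# Plug-in estimate of rho by least squares against the high-fidelity data.
rho_hat = (m_L_at_H @ y_H) / (m_L_at_H @ m_L_at_H)

# Stage 2: fit the discrepancy GP to the residuals -- O(N_H^3).
residuals = y_H - rho_hat * m_L_at_H
m_delta = gp_posterior_mean(X_H, residuals, x_star, k=se_kernel)

# Combined high-fidelity prediction at test inputs x_star.
f_H_pred = rho_hat * gp_posterior_mean(X_L, y_L, x_star, k=se_kernel) + m_delta
```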

Numerical stability is maintained by adding small "jitter" values (e.g., $\varepsilon \approx 10^{-6}$) to matrix diagonals before Cholesky factorization.

4. Model Extensions and Generalizations

Recent developments have broadened the AR-GP paradigm to:

  • Multi-level ($>2$) fidelities: Recursive autoregressive models accommodate $L$ fidelities (Ravi et al., 2024, Calle-Saldarriaga et al., 26 Sep 2025).
  • Structured and delay kernels: For physical systems where high-fidelity responses depend on local derivatives of lower fidelities, kernels incorporating delayed or shifted inputs enable "physics-informed" priors (Ravi et al., 2024).
  • Tensor-valued and high-dimensional outputs: The Generalized Autoregression (GAR) and CIGAR models use tensor-variate GPs and Tucker transforms, supporting arbitrary-dimensional, unaligned outputs. Notably, the autokrigeability property ensures the closed-form predictive mean remains exact and scalable to high output dimensions (Wang et al., 2023).
  • Non-Gaussian and transport map constructions: For spatial fields and nonstationary data, conditional triangular transport maps and local GP regularization enable modeling of non-Gaussian joint distributions and nonlinear cross-fidelity relationships. Stochastic mini-batch optimization leverages closed-form marginal likelihoods induced by conjugacy with inverse-gamma priors (Calle-Saldarriaga et al., 26 Sep 2025).
  • Non-nested, noisy, or unaligned designs: Decoupled and recursive EM-based approaches efficiently estimate models given arbitrary sampling locations and heteroscedastic noise (Baillie et al., 25 Nov 2025).

5. Kernel Choices and Mismatch Modeling

Autoregressive multi-fidelity GP surrogates typically employ distinct kernel functions for each component:

  • Low-fidelity ($f_L$): Often assigned a squared-exponential (SE) kernel when the underlying process is believed to be smooth.
  • Discrepancy ($\delta$): Can use SE, Matérn, or rational quadratic kernels. Choice depends on the smoothness or locality expected in the mismatch.
  • Composite and structured kernels: Address cases where cross-fidelity mapping is nonlinear, nonstationary, or involves local phase or derivative shifts (Do et al., 2023, Ravi et al., 2024).

Model expressiveness increases with kernels incorporating multiple length-scales or additive structures, enabling each fidelity level to capture global and local effects.
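
For reference, minimal single-input implementations of the kernels named above; all hyperparameter defaults are illustrative.

```python
import numpy as np

def _sqdist(x1, x2):
    """Pairwise squared distances between 1-D input arrays."""
    return (x1[:, None] - x2[None, :]) ** 2

def se(x1, x2, v=1.0, ls=0.2):
    """Squared-exponential: infinitely differentiable, very smooth paths."""
    return v * np.exp(-0.5 * _sqdist(x1, x2) / ls**2)

def matern52(x1, x2, v=1.0, ls=0.2):
    """Matern-5/2: twice mean-square differentiable, rougher than SE."""
    r = np.sqrt(_sqdist(x1, x2)) / ls
    return v * (1.0 + np.sqrt(5) * r + 5.0 * r**2 / 3.0) * np.exp(-np.sqrt(5) * r)

def rational_quadratic(x1, x2, v=1.0, ls=0.2, alpha=2.0):
    """Rational quadratic: a scale mixture of SE kernels over length-scales."""
    return v * (1.0 + _sqdist(x1, x2) / (2.0 * alpha * ls**2)) ** (-alpha)
```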

6. Practical Implementation and Computational Considerations

Key implementation guidelines include:

  • Pre-standardization of inputs to mean zero and unit variance.
  • Initialization of kernel hyperparameters to approximately one-quarter the input domain's range.
  • Constraints to enforce strictly positive noise variances.
  • Warm-starting the AR-scale parameter $\rho$ using $\mathrm{cov}(y_H, y_L)/\mathrm{var}(y_L)$ (demonstrated in the sketch after this list).
  • Gradient-based or EM-based hyperparameter optimization, exploiting block-wise linear algebra for efficiency.
  • For high-dimensional outputs or tensor-formulated problems, the CIGAR simplification reduces computational cost from $O(d^3)$ to $O(n^3)$, where $d$ is output dimension and $n$ is sample size, by imposing conditional independence and orthogonality constraints (Wang et al., 2023).
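
The first four guidelines translate directly into a few lines of setup code. The sketch below assumes inputs X and paired low/high-fidelity outputs y_L, y_H observed at common design points (required for the empirical cov/var warm-start); all constants are illustrative.

```python
import numpy as np

# 1) Standardize inputs to zero mean and unit variance, column-wise.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# 2) Initialize kernel length-scales to ~1/4 of the input range.
ls0 = 0.25 * (X_std.max(axis=0) - X_std.min(axis=0))

# 3) Enforce strictly positive noise variances by optimizing log-variances.
log_tau2_0 = np.log(1e-2)

# 4) Warm-start the AR scale from the empirical cross-moment; this needs
#    low- and high-fidelity outputs at shared (nested) design points.
rho0 = np.cov(y_H, y_L)[0, 1] / np.var(y_L, ddof=1)
```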

Computational bottlenecks arise from the $O(N^3)$ cost of inverting the full covariance matrix; sparse, local, or inducing-point GP approximations are effective when the training set is large or $N_L \gg N_H$ (Do et al., 2023, Hudson et al., 2021).

7. Application Domains and Empirical Performance

Autoregressive multi-fidelity GP surrogates have been validated across diverse scientific and engineering settings:

  • Engineering design optimization: Dramatic reductions in the number of expensive high-fidelity calls with minimal sacrifice in accuracy, particularly in physics-constrained domains (Do et al., 2023).
  • Climate modeling: High-resolution regional predictions achieved by fusing sparse regional climate model (RCM) data with dense but coarse global climate model (GCM) outputs, attaining mean-squared errors of $15.62\,\mathrm{C}^2$ using only $6\%$ of the RCM calls required by single-fidelity approaches (Hudson et al., 2021).
  • Physical and real-world systems: Terramechanics, plasma microturbulence simulations, and PDE-based modeling, where AR(1), nonlinear AR, and delay GP variants outperform single-fidelity models by reducing root mean square error (RMSE) and computational runtime by factors of 2-10 and 10-100, respectively (Ravi et al., 2024).
  • High-dimensional problems: GAR and CIGAR exhibit strong empirical performance, yielding up to a $6\times$ RMSE reduction on canonical PDE, topology optimization, and scientific applications with only a few high-fidelity samples (Wang et al., 2023, Calle-Saldarriaga et al., 26 Sep 2025).

Limitations stem from AR(1)'s inherent assumption of a linear mapping between fidelities, inapplicability to data with strong nonlinear or nonstationary cross-fidelity structure unless suitably extended, and cubic scaling constraints on very large datasets.

