Autoregressive Multi-Fidelity GP Surrogates

Updated 4 March 2026
  • Autoregressive multi-fidelity GP surrogates couple inexpensive low-fidelity computations with costly high-fidelity corrections within a Gaussian process framework.
  • They model high-fidelity outputs as a scaled low-fidelity prediction plus a discrepancy function, enabling accurate uncertainty quantification and cost reduction.
  • This approach is effective in engineering, climate modeling, and high-dimensional simulations, offering significant improvements in RMSE and runtime efficiency.

Autoregressive Multi-Fidelity GP Surrogates are a class of statistical surrogate models designed for settings where computational experiments or simulations are available at multiple levels of fidelity, with high-fidelity data being costly and low-fidelity data relatively inexpensive. These models employ autoregressive (AR) structures within a Gaussian process (GP) framework, leveraging relationships between fidelities to yield accurate predictive distributions while reducing the demand for high-fidelity samples. Autoregressive multi-fidelity GP surrogates are widely applicable in engineering, scientific computing, and the modeling of complex physical systems, as well as in Bayesian optimization and design of experiments.

1. Fundamental Model Architecture

The canonical form of the autoregressive multi-fidelity GP surrogate is an AR(1) hierarchical model, introduced by Kennedy and O’Hagan (2000). For two fidelities, the model is formalized as

$$f_H(x) = \rho\,f_L(x) + \delta(x)$$

where:

  • $f_L(x)$: low-fidelity surrogate (typically computationally cheap, less accurate)
  • $f_H(x)$: high-fidelity surrogate (computationally expensive, highly accurate)
  • $\rho \in \mathbb{R}$: scaling parameter
  • $\delta(x)$: discrepancy function, modeled as a zero-mean GP independent of (and hence uncorrelated with) $f_L(x)$.

Both $f_L$ and $\delta$ are independently assigned GP priors, customarily with regression mean functions and stationary kernels such as the squared-exponential (SE) or Matérn classes. Observed outputs include additive Gaussian noise:

$$y_L(x) = f_L(x) + \epsilon_L, \quad \epsilon_L \sim \mathcal{N}(0, \tau_L^2), \qquad y_H(x) = f_H(x) + \epsilon_H, \quad \epsilon_H \sim \mathcal{N}(0, \tau_H^2).$$

This architecture extends recursively to $L > 2$ fidelities via

$$f_\ell(x) = \rho_\ell\,f_{\ell-1}(x) + \delta_\ell(x), \qquad \ell = 2, \dots, L,$$

with each $\delta_\ell$ an independent GP and either scalar or functional (input-dependent) scale parameters $\rho_\ell$ (Ravi et al., 2024).
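
As a minimal sketch of this construction, the following draws one realization of the AR(1) prior on a 1-D grid. The SE kernel, its hyperparameters, and the value $\rho = 0.8$ are illustrative assumptions, not values from the cited papers.

```python
import numpy as np

def se_kernel(x1, x2, variance=1.0, lengthscale=0.2):
    """Squared-exponential kernel between 1-D input arrays."""
    d2 = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)

# Independent GP priors for the low-fidelity process and the discrepancy.
K_L = se_kernel(x, x, variance=1.0, lengthscale=0.2)
K_d = se_kernel(x, x, variance=0.05, lengthscale=0.5)
jitter = 1e-6 * np.eye(len(x))  # numerical stabilizer for the Cholesky

f_L = np.linalg.cholesky(K_L + jitter) @ rng.standard_normal(len(x))
delta = np.linalg.cholesky(K_d + jitter) @ rng.standard_normal(len(x))

rho = 0.8                # illustrative scalar AR(1) scaling parameter
f_H = rho * f_L + delta  # f_H(x) = rho * f_L(x) + delta(x)
```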

2. Covariance Structure and Posterior Inference

The joint GP prior over $(f_L, f_H)$ induces a block-structured covariance matrix,

$$\mathrm{Cov}\begin{pmatrix} f_L \\ f_H \end{pmatrix} = \begin{pmatrix} K_{LL} & K_{LH} \\ K_{HL} & K_{HH} \end{pmatrix}$$

with

$$\begin{aligned} K_{LL} &= K_L(X_L, X_L) + \tau_L^2 I, \\ K_{LH} &= \rho\,K_L(X_L, X_H), \\ K_{HH} &= \rho^2 K_L(X_H, X_H) + K_\delta(X_H, X_H) + \tau_H^2 I, \end{aligned}$$

where $K_L$ and $K_\delta$ are the covariance matrices for the respective kernels (Do et al., 2023, Hudson et al., 2021).

For a new input $x^*$, the posterior predictive mean for $f_H$ follows in closed form from Gaussian conditioning:

$$\mu_H(x^*) = m_H(x^*) + k_*^\top K^{-1} (y - m),$$

where $m_H(x^*)$ is the prior mean at $x^*$, $m$ is the prior mean at the training inputs, $k_*$ is the cross-covariance vector between the training data and $f_H(x^*)$, and $K$ is the joint covariance matrix over all observations.

This full-GP construction generalizes to more than two fidelities, with the covariance between any pair $f_\ell(x)$, $f_m(x')$ given by recursively derived relations, accommodating block structures up to $L$ levels (Ravi et al., 2024).
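
The conditioning step admits a compact implementation for two fidelities. The sketch below assembles the block covariance from the formulas above (zero prior mean assumed for brevity) and returns the posterior mean and covariance of $f_H$ at test inputs; the kernel arguments k_L and k_d stand in for any valid choices, e.g. the se_kernel defined earlier.

```python
import numpy as np

def mf_predict(X_L, y_L, X_H, y_H, x_star, k_L, k_d,
               rho=0.8, tau_L2=1e-2, tau_H2=1e-2):
    """AR(1) multi-fidelity GP posterior for f_H at x_star (zero prior mean).

    k_L, k_d: kernel functions k(x1, x2) -> covariance matrix for the
    low-fidelity and discrepancy GPs; all defaults are illustrative.
    """
    n_L, n_H = len(X_L), len(X_H)

    # Block covariance of the stacked observation vector y = [y_L, y_H].
    K_LL = k_L(X_L, X_L) + tau_L2 * np.eye(n_L)
    K_LH = rho * k_L(X_L, X_H)
    K_HH = rho**2 * k_L(X_H, X_H) + k_d(X_H, X_H) + tau_H2 * np.eye(n_H)
    K = np.block([[K_LL, K_LH], [K_LH.T, K_HH]])

    # Cross-covariance between the observations and f_H(x_star).
    k_star = np.vstack([rho * k_L(X_L, x_star),
                        rho**2 * k_L(X_H, x_star) + k_d(X_H, x_star)])

    y = np.concatenate([y_L, y_H])
    L = np.linalg.cholesky(K + 1e-6 * np.eye(n_L + n_H))  # jittered
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))

    mean = k_star.T @ alpha
    v = np.linalg.solve(L, k_star)
    prior_cov = rho**2 * k_L(x_star, x_star) + k_d(x_star, x_star)
    return mean, prior_cov - v.T @ v
```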

3. Hyperparameter Estimation, Training, and Scalability

All model parameters, including covariance hyperparameters for each kernel (length-scales and variances), noise variances, and the AR scaling parameters $\rho$, are estimated by minimizing the negative log marginal likelihood:

$$\ell(\theta) = \frac{1}{2}(y-\mu)^\top K^{-1}(y-\mu) + \frac{1}{2}\log|K| + \frac{N}{2}\log(2\pi),$$

where $y$ stacks all observed outputs, $\mu$ is the prior mean at the observed inputs, and $K$ is the full covariance (Do et al., 2023, Hudson et al., 2021).
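
A sketch of this objective for the two-fidelity model, reusing the se_kernel from Section 1, is shown below; the parameterization (log-transformed positive parameters) and initial values are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_marglik(theta, X_L, y_L, X_H, y_H):
    """Negative log marginal likelihood of the two-fidelity AR(1) model.

    theta packs (illustratively): log length-scales and log variances of the
    two SE kernels, log noise variances, and rho. Zero prior mean assumed.
    """
    ls_L, v_L, ls_d, v_d, t_L2, t_H2 = np.exp(theta[:6])
    rho = theta[6]
    k_L = lambda a, b: se_kernel(a, b, variance=v_L, lengthscale=ls_L)
    k_d = lambda a, b: se_kernel(a, b, variance=v_d, lengthscale=ls_d)

    n_L, n_H = len(X_L), len(X_H)
    K = np.block([
        [k_L(X_L, X_L) + t_L2 * np.eye(n_L), rho * k_L(X_L, X_H)],
        [rho * k_L(X_H, X_L),
         rho**2 * k_L(X_H, X_H) + k_d(X_H, X_H) + t_H2 * np.eye(n_H)],
    ])
    y = np.concatenate([y_L, y_H])
    N = n_L + n_H
    L = np.linalg.cholesky(K + 1e-6 * np.eye(N))  # jittered Cholesky
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    # 0.5*log|K| equals the sum of the log-diagonal entries of L.
    return 0.5 * y @ alpha + np.log(np.diag(L)).sum() + 0.5 * N * np.log(2 * np.pi)

# L-BFGS-B with finite-difference gradients; initial values are illustrative.
theta0 = np.concatenate([np.log([0.2, 1.0, 0.5, 0.05, 1e-2, 1e-2]), [1.0]])
# result = minimize(neg_log_marglik, theta0, args=(X_L, y_L, X_H, y_H),
#                   method="L-BFGS-B")
```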

For models with non-nested designs or non-constant scaling, expectation-maximization (EM) algorithms can be used, which decouple the estimation of low- and high-fidelity parameters, decreasing computational costs from $O((N_L+N_H)^3)$ to $O(N_L^3 + N_H^3)$. This is especially beneficial when $N_H \ll N_L$, as in many real applications (Baillie et al., 25 Nov 2025). Gradient-based optimizers (e.g., L-BFGS) leverage analytic or automatic differentiation of the likelihood with respect to all parameters.
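
The decoupling idea can be illustrated with a simple plug-in two-stage fit: train the low-fidelity GP alone, then fit the discrepancy GP to the high-fidelity residuals. This is not the EM algorithm of Baillie et al. (25 Nov 2025), only a sketch of why the cost splits into $O(N_L^3) + O(N_H^3)$; it assumes the arrays and se_kernel from the earlier sketches.

```python
import numpy as np

def gp_posterior_mean(X, y, X_new, k, tau2=1e-2):
    """Zero-mean GP regression posterior mean at X_new (illustrative helper)."""
    K = k(X, X) + (tau2 + 1e-6) * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return k(X, X_new).T @ alpha

# Stage 1: fit/predict with the low-fidelity GP alone -- O(N_L^3).
m_L_at_H = gp_posterior_mean(X_L, y_L, X_H, k=se_kernel)

# Plug-in estimate of rho by least squares against the high-fidelity data.
rho_hat = (m_L_at_H @ y_H) / (m_L_at_H @ m_L_at_H)

# Stage 2: fit the discrepancy GP to the residuals -- O(N_H^3).
residuals = y_H - rho_hat * m_L_at_H
m_delta = gp_posterior_mean(X_H, residuals, x_star, k=se_kernel)

# Combined high-fidelity prediction at test inputs x_star.
f_H_pred = rho_hat * gp_posterior_mean(X_L, y_L, x_star, k=se_kernel) + m_delta
```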

Numerical stability is maintained by adding small "jitter" values (e.g., $\varepsilon \approx 10^{-6}$) to matrix diagonals before Cholesky factorization.

4. Model Extensions and Generalizations

Recent developments have broadened the AR-GP paradigm to:

  • Multi-level ($>2$) fidelities: Recursive autoregressive models accommodate $L$ fidelities (Ravi et al., 2024, Calle-Saldarriaga et al., 26 Sep 2025).
  • Structured and delay kernels: For physical systems where high-fidelity responses depend on local derivatives of lower fidelities, kernels incorporating delayed or shifted inputs enable "physics-informed" priors (Ravi et al., 2024).
  • Tensor-valued and high-dimensional outputs: The Generalized Autoregression (GAR) and CIGAR models use tensor-variate GPs and Tucker transforms, supporting arbitrary-dimensional, unaligned outputs. Notably, the autokrigeability property ensures the closed-form predictive mean remains exact and scalable to high output dimensions (Wang et al., 2023).
  • Non-Gaussian and transport map constructions: For spatial fields and nonstationary data, conditional triangular transport maps and local GP regularization enable modeling of non-Gaussian joint distributions and nonlinear cross-fidelity relationships. Stochastic mini-batch optimization leverages closed-form marginal likelihoods induced by conjugacy with inverse-gamma priors (Calle-Saldarriaga et al., 26 Sep 2025).
  • Non-nested, noisy, or unaligned designs: Decoupled and recursive EM-based approaches efficiently estimate models given arbitrary sampling locations and heteroscedastic noise (Baillie et al., 25 Nov 2025).

5. Kernel Choices and Mismatch Modeling

Autoregressive multi-fidelity GP surrogates typically employ distinct kernel functions for each component:

  • Low-fidelity ($f_L$): Often assigned a squared-exponential (SE) kernel when the underlying process is believed to be smooth.
  • Discrepancy ($\delta$): Can use SE, Matérn, or rational quadratic kernels. Choice depends on the smoothness or locality expected in the mismatch.
  • Composite and structured kernels: Address cases where cross-fidelity mapping is nonlinear, nonstationary, or involves local phase or derivative shifts (Do et al., 2023, Ravi et al., 2024).

Model expressiveness increases with kernels incorporating multiple length-scales or additive structures, enabling each fidelity level to capture global and local effects.
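
For reference, minimal single-input implementations of the kernels named above; all hyperparameter defaults are illustrative.

```python
import numpy as np

def _sqdist(x1, x2):
    """Pairwise squared distances between 1-D input arrays."""
    return (x1[:, None] - x2[None, :]) ** 2

def se(x1, x2, v=1.0, ls=0.2):
    """Squared-exponential: infinitely differentiable, very smooth paths."""
    return v * np.exp(-0.5 * _sqdist(x1, x2) / ls**2)

def matern52(x1, x2, v=1.0, ls=0.2):
    """Matern-5/2: twice mean-square differentiable, rougher than SE."""
    r = np.sqrt(_sqdist(x1, x2)) / ls
    return v * (1.0 + np.sqrt(5) * r + 5.0 * r**2 / 3.0) * np.exp(-np.sqrt(5) * r)

def rational_quadratic(x1, x2, v=1.0, ls=0.2, alpha=2.0):
    """Rational quadratic: a scale mixture of SE kernels over length-scales."""
    return v * (1.0 + _sqdist(x1, x2) / (2.0 * alpha * ls**2)) ** (-alpha)
```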

6. Practical Implementation and Computational Considerations

Key implementation guidelines include:

  • Pre-standardization of inputs to mean zero and unit variance.
  • Initialization of kernel hyperparameters to approximately one-quarter the input domain's range.
  • Constraints to enforce strictly positive noise variances.
  • Warm-starting the AR-scale parameter $\rho$ using $\mathrm{cov}(y_H, y_L)/\mathrm{var}(y_L)$ (demonstrated in the sketch after this list).
  • Gradient-based or EM-based hyperparameter optimization, exploiting block-wise linear algebra for efficiency.
  • For high-dimensional outputs or tensor-formulated problems, the CIGAR simplification reduces computational cost from $O(d^3)$ to $O(n^3)$, where $d$ is output dimension and $n$ is sample size, by imposing conditional independence and orthogonality constraints (Wang et al., 2023).
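
The first four guidelines translate directly into a few lines of setup code. The sketch below assumes inputs X and paired low/high-fidelity outputs y_L, y_H observed at common design points (required for the empirical cov/var warm-start); all constants are illustrative.

```python
import numpy as np

# 1) Standardize inputs to zero mean and unit variance, column-wise.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# 2) Initialize kernel length-scales to ~1/4 of the input range.
ls0 = 0.25 * (X_std.max(axis=0) - X_std.min(axis=0))

# 3) Enforce strictly positive noise variances by optimizing log-variances.
log_tau2_0 = np.log(1e-2)

# 4) Warm-start the AR scale from the empirical cross-moment; this needs
#    low- and high-fidelity outputs at shared (nested) design points.
rho0 = np.cov(y_H, y_L)[0, 1] / np.var(y_L, ddof=1)
```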

Computational bottlenecks arise from the $O(N^3)$ cost of inverting the full covariance matrix; sparse, local, or inducing-point GP approximations are effective when the training set is large or $N_L \gg N_H$ (Do et al., 2023, Hudson et al., 2021).

7. Application Domains and Empirical Performance

Autoregressive multi-fidelity GP surrogates have been validated across diverse scientific and engineering settings:

  • Engineering design optimization: Dramatic reductions in the number of expensive high-fidelity calls with minimal sacrifice in accuracy, particularly in physics-constrained domains (Do et al., 2023).
  • Climate modeling: High-resolution regional predictions achieved by fusing sparse regional climate model (RCM) data with dense but coarse global climate model (GCM) outputs, attaining mean-squared errors of $15.62\,\mathrm{C}^2$ using only $6\%$ of the RCM calls required by single-fidelity approaches (Hudson et al., 2021).
  • Physical and real-world systems: Terramechanics, plasma microturbulence simulations, and PDE-based modeling, where AR(1), nonlinear AR, and delay GP variants outperform single-fidelity models by reducing root mean square error (RMSE) and computational runtime by factors of 2-10 and 10-100, respectively (Ravi et al., 2024).
  • High-dimensional problems: GAR and CIGAR exhibit strong empirical performance, yielding up to a $6\times$ RMSE reduction on canonical PDE, topology optimization, and scientific applications with only a few high-fidelity samples (Wang et al., 2023, Calle-Saldarriaga et al., 26 Sep 2025).

Limitations stem from AR(1)'s inherent assumption of a linear mapping between fidelities, inapplicability to data with strong nonlinear or nonstationary cross-fidelity structure unless suitably extended, and cubic scaling constraints on very large datasets.

