Multi-Fidelity Surrogates: Techniques & Applications

Updated 11 March 2026

Multi-fidelity surrogates are models that fuse abundant low-fidelity and sparse high-fidelity data to achieve accurate predictions under computational constraints.
They employ diverse methods including autoregressive models, Gaussian processes, and neural networks to integrate hierarchical data from various informational sources.
Applications in simulation-based optimization and uncertainty quantification show substantial error reduction and efficiency gains, despite challenges in data alignment and hyperparameter tuning.

A multi-fidelity surrogate is a statistical or machine learning model that fuses datasets or simulation evaluations from several sources (“information sources” or “fidelity levels”) of varying cost and accuracy, with the goal of optimizing predictive accuracy within a constrained computational budget. The central paradigm is to leverage abundant but coarse low-fidelity (LF) data, correcting it with sparser but accurate high-fidelity (HF) data. This approach has become foundational in computational science and engineering for tasks such as simulation-based optimization, uncertainty quantification, rare-event estimation, and design of experiments. Theoretical, algorithmic, and empirical developments over the past decade have yielded a taxonomy of surrogate models capable of flexibly exploiting hierarchical, non-hierarchical, or heterogeneous fidelity arrangements.

1. Mathematical Frameworks for Multi-Fidelity Surrogates

Canonical multi-fidelity surrogates are built upon functional relationships between outputs at different fidelities. The simplest is the linear autoregressive model (“AR1” or recursive model), which expresses the HF quantity as

$f_{HF}(x) = \rho\,f_{LF}(x) + \delta(x)$

where $\rho$ is a scaling parameter and $\delta(x)$ is a (possibly nonlinear) discrepancy function (Ravi et al., 2024). This paradigm generalizes naturally to $K>2$ fidelities, with $f^{(1)}$ the lowest-fidelity model: $f^{(k)}(x) = \rho_{k-1} f^{(k-1)}(x) + \delta_k(x), \quad k=2,\dots,K$ or, with nonlinear corrections, $f^{(k)}(x) = g_k(x, f^{(k-1)}(x)) + \delta_k(x)$. Discrepancy functions may be modeled as polynomials, Gaussian processes (GPs), neural networks, or other surrogates, and scale factors may be constant or input-dependent functions. Progressive, chain, or directed acyclic graph (DAG) topologies can be used to encode arbitrary conditional dependencies among fidelities (Conti et al., 15 Oct 2025, Gorodetsky et al., 2020).
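As a minimal sketch of the two-fidelity AR1 relation, the snippet below estimates a constant $\rho$ and a linear polynomial discrepancy jointly by ordinary least squares. The functions `f_lf` and `f_hf`, the sample sizes, and the polynomial degree are illustrative assumptions rather than anything prescribed by the cited works.

```python
import numpy as np

def f_lf(x):
    # Hypothetical cheap model
    return 0.5 * np.sin(2 * np.pi * x) + 0.2 * x

def f_hf(x):
    # Hypothetical expensive model, built so that rho = 2 and delta(x) = 1 - 0.5 x
    return 2.0 * f_lf(x) + 1.0 - 0.5 * x

def fit_ar1(x_hf, y_hf, degree=1):
    """Jointly estimate rho and polynomial discrepancy coefficients."""
    # Design matrix columns: f_LF(x), 1, x, ..., x^degree
    A = np.column_stack([f_lf(x_hf)] + [x_hf**p for p in range(degree + 1)])
    coef, *_ = np.linalg.lstsq(A, y_hf, rcond=None)
    return coef[0], coef[1:]

def predict_ar1(x, rho, delta_coef):
    """Evaluate rho * f_LF(x) + delta(x) with a polynomial delta."""
    delta = sum(c * x**p for p, c in enumerate(delta_coef))
    return rho * f_lf(x) + delta

x_hf = np.linspace(0.0, 1.0, 6)          # only six expensive runs
rho, delta_coef = fit_ar1(x_hf, f_hf(x_hf))
x_test = np.linspace(0.0, 1.0, 50)
max_err = np.max(np.abs(predict_ar1(x_test, rho, delta_coef) - f_hf(x_test)))
```

Because the synthetic HF model here lies exactly in the AR1 family, six HF samples recover it to rounding error; with real simulators the discrepancy model would normally need to be richer.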

For domains with functional or field outputs (e.g., time series or spatial PDE solutions), outputs are often projected onto low-dimensional linear (PCA/POD) or nonlinear manifolds, and multi-fidelity surrogates are formulated on the manifold coordinates (Brunel et al., 2024, Kerleguer, 2021). With heterogeneous input spaces (e.g., where LF and HF solvers differ in parameterization), explicit maps are learned between input domains prior to output fusion (Menon et al., 2024). For uncertainty-aware surrogates, aleatoric and epistemic uncertainty are jointly modeled via Bayesian or bootstrap ensemble constructs, providing rigorous predictive intervals (Giannoukou et al., 2024, Kerleguer et al., 2023).
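The projection step for field outputs can be sketched with a plain POD (PCA via the SVD). The synthetic snapshots, the grid `t`, and the choice of `r = 2` modes are assumptions made for illustration; a multi-fidelity surrogate would then be trained on the reduced coordinates `coeffs` instead of the full fields.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)            # output grid (e.g. time)
params = rng.uniform(0.5, 1.5, size=30)   # hypothetical scalar input per run

# Synthetic field outputs: each snapshot mixes two fixed shapes
snapshots = np.array([
    a * np.sin(2 * np.pi * t) + a**2 * np.cos(4 * np.pi * t) for a in params
])

# POD basis from the SVD of the centered snapshot matrix
mean = snapshots.mean(axis=0)
U, s, Vt = np.linalg.svd(snapshots - mean, full_matrices=False)
r = 2
basis = Vt[:r]                             # (r, 200) POD modes
coeffs = (snapshots - mean) @ basis.T      # (30, r) reduced coordinates

# Reconstruction from the reduced coordinates
recon = mean + coeffs @ basis
rel_err = np.linalg.norm(recon - snapshots) / np.linalg.norm(snapshots)
```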

2. Algorithmic and Architectural Variants

2.1 Gaussian Process and Kernel Methods

The AR1 co-kriging model, originating with Kennedy & O’Hagan (2000), remains the standard in GP-based surrogates, where the cross-correlation between fidelities yields joint GP priors with block-structured kernels. For $L$ levels: $\begin{cases} Y_1(x) \sim \mathcal{GP}(0, k_1(x,x')) \\ Y_k(x) = \rho_{k-1} Y_{k-1}(x) + \delta_k(x), \quad \delta_k \sim \mathcal{GP}(0, k_{\delta_k}(x,x')), \quad k=2,\dots,L \end{cases}$ and predictions/marginals can be computed block-wise (Ravi et al., 2024). Nonlinear AR (NARGP) and delay-augmented kernels enable flexible modeling of nonlinear, phase-shifted, or derivative relationships (Ravi et al., 2024). Recent models incorporate spatially-varying trust weights for heteroscedastic fusion, e.g., the “MAST” approach, which spatially blends corrected LF and HF predictions via distance-based weights and explicitly propagates variance (Nasr et al., 24 Feb 2026).
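For two levels, the block-structured kernel implied by the AR1 prior can be written out directly. The sketch below assumes squared-exponential kernels and fixed, hand-picked hyperparameters (`rho`, `l1`, `ld`, `var_d`); in practice these would be estimated, for example by maximizing the marginal likelihood. The test functions are hypothetical.

```python
import numpy as np

def rbf(a, b, length, var=1.0):
    """Squared-exponential kernel matrix between 1-D point sets a and b."""
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

def cokriging_mean(x_star, X1, y1, X2, y2, rho, l1, ld, var_d, jitter=1e-8):
    """Posterior mean of Y_2 at x_star under a two-level AR1 prior."""
    # Block covariance of the stacked observations [Y_1(X1), Y_2(X2)]
    K11 = rbf(X1, X1, l1)
    K12 = rho * rbf(X1, X2, l1)
    K22 = rho**2 * rbf(X2, X2, l1) + rbf(X2, X2, ld, var_d)
    K = np.block([[K11, K12], [K12.T, K22]])
    K = K + jitter * np.eye(len(X1) + len(X2))
    # Cross-covariance between Y_2(x_star) and the stacked observations
    k_star = np.hstack([
        rho * rbf(x_star, X1, l1),
        rho**2 * rbf(x_star, X2, l1) + rbf(x_star, X2, ld, var_d),
    ])
    return k_star @ np.linalg.solve(K, np.concatenate([y1, y2]))

def f_lf(x):
    return 0.5 * np.sin(2 * np.pi * x) + 0.2 * x   # hypothetical cheap model

def f_hf(x):
    return 2.0 * f_lf(x) + 1.0 - 0.5 * x           # hypothetical expensive model

X1 = np.linspace(0.0, 1.0, 8)   # 8 cheap runs
X2 = X1[[1, 4, 6]]              # 3 expensive runs (nested design)
y1, y2 = f_lf(X1), f_hf(X2)
pred_at_hf = cokriging_mean(X2, X1, y1, X2, y2,
                            rho=2.0, l1=0.15, ld=0.3, var_d=1.0)
```

With a negligible jitter term the posterior mean interpolates the HF observations, while the LF block informs the prediction between them.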

Support Vector Regression (Co_SVR) adopts an SVR model for the discrepancy, using tuned block kernels to draw on both LF and HF sample structures (Shi et al., 2019). Fully linear least-squares surrogates (LS-MFS) can also be constructed by embedding LF predictions as regression bases alongside polynomial discrepancies (Zhang et al., 2017).

2.2 Deep Learning Approaches

Multi-fidelity surrogates based on neural architectures provide scalability for high-dimensional and functional data. Progressive residual correction NNs use a stack of encoder-decoder pairs for each fidelity level, where encoders handle arbitrary input modalities and decoders perform regression to cumulative residuals; each level is trained sequentially with weights frozen below to prevent knowledge degradation (Conti et al., 15 Oct 2025). Neural process models with decoder-level aggregation offer explicit residual learning, coupling inference between decoded low- and high-fidelity levels for accurate transfer to out-of-distribution regions (Niu et al., 2024). LSTM-based surrogates, both in hierarchical and parallel layout, efficiently model time-dependent and parametric systems, with correction or fusion occurring at the level of sequence latent states or outputs (Conti et al., 2022, Conti et al., 2023). CNNs with transfer learning (multi-fidelity up-projection) exploit parameter-efficient adaptation for field outputs, guided by optimal sample allocation under multilevel Monte Carlo theory (Song et al., 2021).

For uncertainty-aware surrogates, the GPBNN model combines a GP emulator for the LF level with a Bayesian neural network (BNN) for the HF correction, propagating model uncertainty through the hierarchy via Gauss-Hermite quadrature and Hamiltonian Monte Carlo (HMC) inference (Kerleguer et al., 2023).

2.3 Surrogates with Domain/Mesh/Output Heterogeneity

Heterogeneous input spaces are reconciled by learning linear (or nonlinear) maps between higher- and lower-dimensional input spaces, allowing joint AR1 co-kriging on a common domain (Menon et al., 2024). For functional outputs differing in mesh resolution or structure, methods employ dimensionality reduction to encode outputs to latent spaces, with fusion, corrective, or mapping surrogates operating therein (Brunel et al., 2024). Tensor kernel GP surrogates and co-kriging for basis coefficients extend this to time-series and field outputs (Kerleguer, 2021).

3. Training, Adaptive Sampling, and Active Learning

Multi-fidelity surrogate training typically uses sequential or all-at-once (for DAG/fusion schemes) optimization. GP-based models maximize the log-marginal likelihood, possibly via block or sparse methods to handle large sample sets (Ravi et al., 2024). Neural architectures rely on stochastic gradient optimization (Adam, Adamax), regularization, and cross-validation for hyperparameter tuning (Conti et al., 15 Oct 2025, Conti et al., 2022). For deep Bayesian models, posterior sampling (e.g., HMC, NUTS) or variational inference is used to quantify and calibrate predictive uncertainty (Kerleguer et al., 2023).

Practical active learning algorithms are often embedded in optimization or UQ loops to sequentially select the most informative new data points at an optimal fidelity, using uncertainty-based acquisition (lower confidence bound) or information gain metrics, and benefit-cost trade-offs that account for simulation cost differentials (Chakroborty et al., 2022, Pellegrini et al., 2022). Adaptive sample allocation guided by multilevel Monte Carlo variance/cost minimization ensures optimal split between LF and HF evaluations (Song et al., 2021).
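One simple way to combine an uncertainty-based acquisition with a benefit-cost trade-off is sketched below. It is a generic heuristic, not the specific rule of any cited paper: the candidate point minimizes a lower confidence bound, and the fidelity is the one with the largest estimated variance reduction per unit cost. The function name `select_next`, the `var_reduction` estimates, and all numbers are hypothetical.

```python
def select_next(mu, sigma, var_reduction, costs, kappa=2.0):
    """Pick the next candidate index and the fidelity to run it at.

    mu, sigma: surrogate mean and std of the objective at each candidate
        (the objective is minimized).
    var_reduction: fidelity -> estimated variance reduction per candidate.
    costs: fidelity -> cost of one simulation.
    """
    # Lower confidence bound: favors low mean and high uncertainty
    lcb = [m - kappa * s for m, s in zip(mu, sigma)]
    i = min(range(len(lcb)), key=lcb.__getitem__)
    # Benefit-cost ratio of each fidelity at the chosen point
    gain_per_cost = {f: var_reduction[f][i] / costs[f] for f in costs}
    fidelity = max(gain_per_cost, key=gain_per_cost.get)
    return i, fidelity

mu = [0.5, 0.2, 0.4]
sigma = [0.1, 0.05, 0.3]
var_reduction = {"LF": [0.01, 0.001, 0.03], "HF": [0.01, 0.002, 0.08]}
costs = {"LF": 1.0, "HF": 10.0}
choice = select_next(mu, sigma, var_reduction, costs)
```

In this example the third candidate has the lowest bound, and the LF run is preferred there because the HF run reduces variance by less than ten times as much while costing ten times more.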

4. Comparative Performance and Benchmarking

Across synthetic functions, PDE benchmarks, and industrial applications (e.g., air-pollution, melt pool geometry, aerodynamic shape, climate emulation), multi-fidelity surrogates consistently outperform single-fidelity approaches for a fixed computational budget (Conti et al., 15 Oct 2025, Brunel et al., 2024, Ravi et al., 2024, Menon et al., 2024). Progressive neural surrogates reduce relative error by up to an order of magnitude when integrating additional (multi-modal) data streams (Conti et al., 15 Oct 2025). In rare-event estimation, multi-fidelity surrogates within subset simulation can reduce the number of HF simulations by two orders of magnitude compared to HF-only approaches, with negligible loss in final accuracy (Chakroborty et al., 2022). Uncertainty-aware and ensemble approaches provide well-calibrated prediction intervals with empirically controlled coverage (Giannoukou et al., 2024, Kerleguer et al., 2023). In the data-scarce regime, all-at-once DAG surrogates and neural residual methods demonstrate order-of-magnitude gains over hierarchical discrepancy chains by efficiently aggregating information from non-nested and non-peered sources (Gorodetsky et al., 2020, Niu et al., 2024).

5. Advantages, Limitations, and Open Challenges

Advantages of multi-fidelity surrogates include: (i) dramatic reduction in the number of expensive HF evaluations for a fixed error; (ii) applicability to arbitrary or heterogeneous data modalities; (iii) online flexibility—robust prediction is feasible given only a subset of available data streams; (iv) explicit or ensemble-based uncertainty quantification; (v) mitigation of catastrophic forgetting via residual or progressive training (Conti et al., 15 Oct 2025, Nasr et al., 24 Feb 2026).

Key limitations and challenges are as follows: (i) hyperparameter tuning for complex surrogates remains nontrivial and often problem-dependent; (ii) synchronization and alignment of heterogeneous datasets are required; (iii) interpretability is limited for deep/black-box surrogates; (iv) offline training (especially for deep ensembles) can be expensive; (v) current methods may degrade if LF/HF correlation is weak or nonlinearities are extreme. For high-dimensional outputs, tensorized covariance and manifold alignment remain active areas of research (Brunel et al., 2024).

Extensions under active investigation include: integration of fully Bayesian decoding for calibration-free uncertainty quantification, handling arbitrary patterns of missing modalities through attention-like mechanisms, embedding physics-informed constraints (PINNs) for physical consistency, and coupling surrogates to inverse or design optimization loops (Conti et al., 15 Oct 2025, Niu et al., 2024).

6. Illustrative Algorithms and Architectures

6.1 Progressive Multi-Fidelity Surrogate—Block Diagram

Imagine a horizontal stack of $K+1$ blocks, each with an encoder for its input modality. All encoders output latent codes to a vertical bus so that the $k^\text{th}$ block receives all codes up to level $k$; the decoder at each level outputs a residual, which is added to the outputs of the previous levels to form the final prediction. Earlier encoder/decoder weights are frozen as levels progress, mitigating knowledge degradation (Conti et al., 15 Oct 2025).

6.2 Training Pseudocode for Progressive Surrogates

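# Pseudocode: Phi[l] is the level-l encoder, Psi[k] the level-k decoder;
# sq_norm and gradient_step are placeholder helpers, not library calls.
for k in range(K + 1):
    freeze_previous_layers(k)      # weights of levels < k stay fixed
    initialize_current_layer(k)
    for epoch in range(max_epochs):
        for x, y_true in data[k]:
            # Encode the inputs of all levels up to k
            Z = [Phi[l](x[l]) for l in range(k + 1)]
            # Decode the level-k residual from the concatenated codes
            delta = Psi[k](concat(Z))
            # Add the residual to the frozen surrogate of the previous levels
            y_pred = (f_prev(x[:k]) if k > 0 else 0) + delta
            # MSE loss plus L2 penalties on the current level's weights
            loss = (MSE(y_pred, y_true)
                    + lambda_Phi * sq_norm(W_Phi[k])
                    + lambda_Psi * sq_norm(W_Psi[k]))
            # Backpropagate and update only W_Phi[k] and W_Psi[k]
            gradient_step(loss, [W_Phi[k], W_Psi[k]])
    update_surrogate(k)            # f_prev now includes level k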

(Conti et al., 15 Oct 2025)
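The progressive residual idea can be made concrete with a deliberately simplified, runnable stand-in. Here the neural encoders and decoders are replaced by fixed polynomial features and linear least-squares fits, which is an assumption made for brevity and not the architecture of the cited work. The inputs `x0`, `x1` and the target are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x0 = rng.uniform(-1, 1, n)            # level-0 input (always available)
x1 = rng.uniform(-1, 1, n)            # level-1 input (additional modality)
y = np.sin(2 * x0) + 0.5 * x0 * x1    # hypothetical target

def phi0(x0):
    """Fixed 'encoder' for level 0: cubic polynomial features of x0."""
    return np.column_stack([np.ones_like(x0), x0, x0**2, x0**3])

def phi1(x0, x1):
    """Level-1 features: level-0 codes plus terms involving x1."""
    return np.column_stack([phi0(x0), x1, x0 * x1, x1**2])

# Level 0: fit the target from the level-0 input alone
w0, *_ = np.linalg.lstsq(phi0(x0), y, rcond=None)
pred0 = phi0(x0) @ w0

# Level 1: w0 stays frozen; only the residual model w1 is fitted
w1, *_ = np.linalg.lstsq(phi1(x0, x1), y - pred0, rcond=None)
pred1 = pred0 + phi1(x0, x1) @ w1

rmse0 = np.sqrt(np.mean((y - pred0) ** 2))
rmse1 = np.sqrt(np.mean((y - pred1) ** 2))
```

Because level 1 only adds a correction on top of the frozen level-0 prediction, the level-0 model remains usable on its own when `x1` is unavailable, and the training error cannot increase when the extra level is added.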

6.3 All-at-Once Multifidelity DAG Optimization

The loss is a nonlinear least squares term over all sources, each node’s output a function of its parents and a local bias. Forward and backward sweeps efficiently propagate feature evaluations and aggregated residuals through the graph for Gauss–Newton or gradient-based optimization (Gorodetsky et al., 2020).
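A minimal sketch of the forward sweep and the all-at-once residual vector follows, under simplifying assumptions: a hypothetical three-source graph, constant scalar edge weights, and a linear local bias at each node. The backward sweep and the optimizer itself are omitted; the stacked residual function is the kind of object a generic nonlinear least-squares routine would consume.

```python
import numpy as np

# Hypothetical 3-source graph: sources 0 and 1 both feed source 2.
# Keys are listed in topological order (parents before children).
parents = {0: [], 1: [], 2: [0, 1]}

def basis(x):
    """Linear features [1, x] for each node's local bias term."""
    return np.column_stack([np.ones_like(x), x])

def forward(theta, x):
    """Forward sweep: evaluate every node at inputs x.

    theta[i] = (edge_weights, bias_coef); edge_weights maps parent -> scalar.
    """
    out = {}
    for i in parents:
        edge_w, bias_c = theta[i]
        out[i] = basis(x) @ bias_c + sum(edge_w[j] * out[j] for j in parents[i])
    return out

def residuals(theta, data):
    """Stack the residuals of all sources; data[i] = (x_i, y_i)."""
    return np.concatenate(
        [forward(theta, x)[i] - y for i, (x, y) in data.items()]
    )

theta_true = {
    0: ({}, np.array([0.0, 1.0])),
    1: ({}, np.array([1.0, -1.0])),
    2: ({0: 2.0, 1: 0.5}, np.array([0.1, 0.0])),
}
# Non-nested designs: each source is sampled at its own inputs
designs = {
    0: np.linspace(0.0, 1.0, 7),
    1: np.linspace(0.0, 1.0, 5),
    2: np.array([0.2, 0.9]),
}
data = {i: (x, forward(theta_true, x)[i]) for i, x in designs.items()}

loss_true = 0.5 * np.sum(residuals(theta_true, data) ** 2)
theta_bad = dict(theta_true)
theta_bad[0] = ({}, np.array([0.0, 1.5]))   # perturb source 0's bias
loss_bad = 0.5 * np.sum(residuals(theta_bad, data) ** 2)
```

Perturbing the parameters of source 0 changes the residuals of source 2 as well, since source 2 is evaluated through its parents; this coupling is what lets sparse data at one node inform the others.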

7. Representative Benchmark Results

Problem	Progressive MF (Rel. Err)	All-at-once DAG MF (Rel. Err)	HF-only baseline (Rel. Err)
Reaction–Diffusion	18.6%	$\sim 10^{-2}$ (DAG, 3 pts)	81.3% (params)
Navier–Stokes	6.50%	—	12.3% (params)
Air Pollution	13%	—	72% (temp)

Empirical studies repeatedly report order-of-magnitude error reductions and large computational savings from multi-fidelity surrogates, particularly when the correlation between LF and HF outputs is at least moderately strong and the LF data cover the input region of interest (Conti et al., 15 Oct 2025, Giannoukou et al., 2024, Chakroborty et al., 2022, Gorodetsky et al., 2020).

References:

"Progressive multi-fidelity learning for physical system predictions" (Conti et al., 15 Oct 2025)
"MFNets: Data efficient all-at-once learning of multifidelity surrogates as directed networks of information sources" (Gorodetsky et al., 2020)
"A survey on multi-fidelity surrogates for simulators with functional outputs: unified framework and benchmark" (Brunel et al., 2024)
"Multi-Fidelity Residual Neural Processes for Scalable Surrogate Modeling" (Niu et al., 2024)
"MAST: A Multi-fidelity Augmented Surrogate model via Spatial Trust-weighting" (Nasr et al., 24 Feb 2026)
"Worst-Case Learning under a Multi-fidelity Model" (Foucart et al., 2024)
"Multi-fidelity surrogate with heterogeneous input spaces for modeling melt pools in laser-directed energy deposition" (Menon et al., 2024)
"Uncertainty-aware multi-fidelity surrogate modeling with noisy data" (Giannoukou et al., 2024)
"General multi-fidelity surrogate models: Framework and active learning strategies for efficient rare event simulation" (Chakroborty et al., 2022)
"Multi-fidelity surrogate modeling using long short-term memory networks" (Conti et al., 2022)
"Multi-fidelity reduced-order surrogate modeling" (Conti et al., 2023)
"A support vector regression-based multi-fidelity surrogate model" (Shi et al., 2019)
"Multi-fidelity surrogate modeling for time-series outputs" (Kerleguer, 2021)
"Transfer Learning on Multi-Fidelity Data" (Song et al., 2021)
"Multi-Fidelity Surrogate Based on Single Linear Regression" (Zhang et al., 2017)