Multi-fidelity Bayesian Optimization
- Multi-fidelity Bayesian Optimization is a sample-efficient strategy that integrates data from diverse fidelity sources to optimize expensive black-box functions while balancing cost and accuracy.
- It employs unified surrogate models such as additive Gaussian Processes and deep autoregressive networks to jointly model high- and low-fidelity outputs.
- Applications include engineering design, hyperparameter tuning, and scientific discovery, often yielding significant cost reductions and improved performance compared to single-fidelity methods.
Multi-fidelity Bayesian Optimization (MFBO) is a sample-efficient optimization strategy for expensive black-box functions, combining information sources of varying cost and accuracy to rapidly converge to an optimum while minimizing resource expenditure. This paradigm leverages correlated but distinct information sources—such as high- and low-fidelity physical models, simulators with different mesh resolutions, or partial experimental results—through unified surrogate models and cost-aware acquisition functions. The resulting frameworks underpin many advances in engineering design, machine learning hyperparameter tuning, scientific discovery, and simulation-based optimization where the trade-off between information quality and evaluation cost is critical.
1. Modeling Structure and Multi-Fidelity Surrogates
Central to MFBO is the integration of data from multiple information sources, each characterized by a different fidelity level, cost, and statistical discrepancy relative to the target function. Surrogate models are constructed to jointly model these outputs and to efficiently propagate information from inexpensive sources to enhance knowledge of the true (highest-fidelity) objective.
A foundational modeling approach is the additive multi-fidelity Gaussian Process (GP) model. Given a set of fidelities $m = 1, \dots, M$ with target index $M$, the model typically assumes
$$f_m(x) = f_M(x) + e_m(x),$$
where $f_M$ is the target (highest-fidelity) function and $e_m$ captures the systematic bias or discrepancy of source $m$. The multi-output GP formulation with shared latent structure (i.e., $f_M$ appearing in all fidelities) enables the Bayesian optimizer to update all fidelities' posterior distributions upon observing any $(x, m)$ pair, thus coupling the learning process across fidelities (Song et al., 2018).
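This coupling can be made concrete with a small sketch of the additive joint covariance and the resulting GP posterior in plain NumPy; the squared-exponential kernel, the hyperparameter values, and the convention that fidelity index 0 denotes the target are illustrative assumptions rather than choices from any cited paper.

```python
import numpy as np

def rbf(X1, X2, lengthscale=0.5, variance=1.0):
    """Squared-exponential kernel k(x, x')."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def additive_mf_kernel(X1, m1, X2, m2, disc_var=0.3):
    """k((x,m),(x',m')) = k_M(x,x') + 1[m = m', m != target] * k_e(x,x').

    The shared term k_M couples all fidelities; the discrepancy term only
    correlates points observed at the same lower fidelity (index 0 = target).
    """
    K = rbf(X1, X2)  # shared target-fidelity part
    same_lf = (m1[:, None] == m2[None, :]) & (m1[:, None] != 0)
    K = K + same_lf * rbf(X1, X2, lengthscale=0.3, variance=disc_var)
    return K

def gp_posterior(X, m, y, Xs, ms, noise=1e-4):
    """Joint GP posterior mean/variance at test points (Xs, ms)."""
    K = additive_mf_kernel(X, m, X, m) + noise * np.eye(len(X))
    Ks = additive_mf_kernel(Xs, ms, X, m)
    Kss = additive_mf_kernel(Xs, ms, Xs, ms)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = np.diag(Kss) - (v**2).sum(0)
    return mu, var

# Observing a cheap low-fidelity point updates the posterior of the target
# fidelity as well, because the k_M term is shared across all fidelities.
X = np.array([[0.2], [0.8]]); m = np.array([1, 0]); y = np.array([0.5, 1.2])
mu, var = gp_posterior(X, m, y, Xs=np.array([[0.5]]), ms=np.array([0]))
```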
Other surrogate architectures include:
- Linear Model of Coregionalization (LMC): Expresses all outputs as linear combinations of shared latent GPs.
- Autoregressive (Kennedy–O’Hagan, KOH) models: Hierarchically model higher-fidelity outputs by regressing on outputs at the next lower fidelity with an independent GP discrepancy (a minimal sketch follows this list).
- Deep and auto-regressive neural networks: Stack neural networks for each fidelity, each consuming the outputs of lower-fidelity levels as input, enabling the flexible capture of nonlinear, nonstationary, or strongly coupled inter-fidelity relationships (Li et al., 2020, Li et al., 2021).
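A minimal sketch of the KOH-style autoregressive idea, in a simplified recursive form (fit a GP to low-fidelity data, then a second GP to the high-fidelity residual after a scalar scaling $\rho$); the toy functions, scikit-learn kernels, and least-squares estimate of $\rho$ are illustrative assumptions, not the full KOH joint model.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# Toy fidelities (assumed for illustration): the low-fidelity function is a
# cheap, biased approximation of the high-fidelity target.
f_hi = lambda x: np.sin(8 * x) * x
f_lo = lambda x: 0.8 * f_hi(x) + 0.3 * (x - 0.5)

X_lo = np.linspace(0, 1, 20)[:, None]   # many cheap evaluations
X_hi = np.linspace(0, 1, 5)[:, None]    # few expensive evaluations
y_lo, y_hi = f_lo(X_lo).ravel(), f_hi(X_hi).ravel()

# Level 1: GP on the low-fidelity data.
gp_lo = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(0.2), alpha=1e-6)
gp_lo.fit(X_lo, y_lo)

# Level 2 (KOH-style): y_hi(x) ~ rho * f_lo(x) + delta(x); estimate rho by
# least squares and fit a second GP on the residual discrepancy delta.
mu_lo_at_hi = gp_lo.predict(X_hi)
rho = float(np.dot(mu_lo_at_hi, y_hi) / np.dot(mu_lo_at_hi, mu_lo_at_hi))
gp_delta = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(0.2), alpha=1e-6)
gp_delta.fit(X_hi, y_hi - rho * mu_lo_at_hi)

def predict_high(Xs):
    """High-fidelity prediction composed from the two levels."""
    return rho * gp_lo.predict(Xs) + gp_delta.predict(Xs)
```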
For binomial or categorical outputs, generalized or non-Gaussian surrogates are employed, often necessitating approximate inference (e.g., Laplace approximation) to accommodate non-standard likelihoods (Matyushin et al., 2019).
2. Acquisition Functions, Cost Awareness, and Query Strategies
The acquisition function in MFBO must jointly decide where to query (the input $x$) and at what fidelity $m$, balancing immediate utility, uncertainty reduction, and resource cost.
A cost-sensitive mutual information gain is a common acquisition approach:
$$\alpha(x, m) = \frac{I\big(f_M;\, y_m(x)\big)}{\lambda_m}.$$
Here $I\big(f_M;\, y_m(x)\big)$ is the expected reduction in entropy (uncertainty) about the target objective from observing $y_m(x)$, and $\lambda_m$ is the cost of sampling fidelity $m$. The query with the highest benefit–cost ratio is selected, and exploration continues at lower fidelities until it is no longer cost-effective relative to upgrading to a higher-fidelity query (Song et al., 2018).
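Assuming a jointly Gaussian surrogate, the entropy reduction has a closed form in the posterior correlation between $f_M(x)$ and $y_m(x)$, which can be normalized by cost as sketched below; all numeric inputs are illustrative placeholders.

```python
import numpy as np

def info_gain_per_cost(cov_fM_ym, var_fM, var_ym, cost):
    """Cost-normalized mutual information I(f_M(x); y_m(x)) / lambda_m.

    For a jointly Gaussian surrogate, observing y_m(x) reduces the entropy of
    the target value f_M(x) by -0.5 * log(1 - rho^2), where rho is the
    posterior correlation between f_M(x) and the noisy observation y_m(x).
    """
    rho2 = cov_fM_ym**2 / (var_fM * var_ym)
    mi = -0.5 * np.log(1.0 - np.clip(rho2, 0.0, 1.0 - 1e-12))
    return mi / cost

# Hypothetical posterior quantities at one candidate x for two fidelities:
# the cheap source is less correlated with the target but 10x cheaper.
print(info_gain_per_cost(cov_fM_ym=0.4, var_fM=1.0, var_ym=0.8, cost=1.0))    # LF
print(info_gain_per_cost(cov_fM_ym=0.9, var_fM=1.0, var_ym=1.05, cost=10.0))  # HF
```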
Information-theoretic acquisition functions such as Max-value Entropy Search (MES) extend efficiently to MFBO. The mutual information between the unknown optimum value $f^\star_M$ and a candidate observation $y_m(x)$ at fidelity $m$, normalized by cost, becomes
$$\alpha(x, m) = \frac{I\big(f^\star_M;\, y_m(x)\big)}{\lambda_m}.$$
For MF-MES, nearly all required computations are analytic, except for a manageable one-dimensional integral when $m$ is not the highest fidelity, preserving computational efficiency (Takeno et al., 2019).
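The target-fidelity part of this acquisition is analytic given Monte Carlo samples of the optimum value; a hedged sketch follows, with made-up posterior numbers and omitting the extra one-dimensional integral needed at lower fidelities.

```python
import numpy as np
from scipy.stats import norm

def mes_target_fidelity(mu, sigma, f_star_samples, cost):
    """Max-value entropy search term at the target fidelity, per unit cost.

    mu, sigma: GP posterior mean/std of f_M at the candidate x.
    f_star_samples: Monte Carlo samples of the unknown optimum value f_M^*.
    At non-target fidelities, MF-MES additionally requires a one-dimensional
    numerical integral over the candidate observation (not shown here).
    """
    gamma = (f_star_samples - mu) / sigma
    pdf, cdf = norm.pdf(gamma), np.clip(norm.cdf(gamma), 1e-12, 1.0)
    mi = np.mean(gamma * pdf / (2.0 * cdf) - np.log(cdf))
    return mi / cost

# Illustrative numbers only: posterior at one candidate plus sampled optima.
print(mes_target_fidelity(mu=0.3, sigma=0.5,
                          f_star_samples=np.array([1.1, 1.3, 1.0]), cost=10.0))
```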
Other strategies include:
- Purely cost-normalized acquisition functions: $\alpha(x, m) = \alpha_{\mathrm{SF}}(x) / \lambda_m$, where $\alpha_{\mathrm{SF}}$ is EI, MES, or a similar single-fidelity criterion (Sabanza-Gil et al., 1 Oct 2024); a minimal sketch follows this list.
- Source-specific utility: Exploration incentives for LF sources (using expected improvement or uncertainty), and exploitation (using probability of improvement) for HF sources, to mitigate bias propagation from LF optima (Foumani et al., 2022).
- Proximity-based criteria: Use the high-fidelity acquisition to select $x$, but query a lower-fidelity source if the region is under-sampled at that level, thus controlling HF sampling via neighborhood density (Manoj et al., 1 Aug 2025).
- Batch and parallel acquisition extensions, where set diversity and joint mutual information are optimized (Li et al., 2021).
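As a concrete instance of the cost-normalized family above, a minimal sketch divides standard expected improvement by the per-fidelity cost; the posterior values and costs are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def cost_normalized_ei(mu, sigma, best, cost, xi=0.0):
    """Expected improvement divided by the sampling cost of fidelity m.

    mu, sigma: posterior mean/std of the target objective at candidate x;
    best: incumbent best observed target value; cost: lambda_m.
    """
    sigma = np.maximum(sigma, 1e-12)
    z = (mu - best - xi) / sigma
    ei = (mu - best - xi) * norm.cdf(z) + sigma * norm.pdf(z)
    return ei / cost

# The same candidate routed through a cheap (lambda=1) vs. an expensive
# (lambda=10) source; illustrative posterior values only.
for lam in (1.0, 10.0):
    print(lam, cost_normalized_ei(mu=0.9, sigma=0.4, best=0.8, cost=lam))
```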
3. Regret Notions, Performance Bounds, and Theoretical Guarantees
MFBO algorithms introduce regret measures that account for both the optimization gap and cumulative cost of all queries, distinguishing them from classical BO.
In an “episode” (a round of lower-fidelity exploration ending with a target-fidelity query $x_n$), regret is defined by combining the optimality gap $f_M(x^\star) - f_M(x_n)$ of the final target-fidelity query with the total cost expended on the episode's queries, and is accumulated across episodes. Under appropriate statistical assumptions, theoretical results bound the cumulative regret in terms of the mutual information accrued at both LF and HF queries, with maximum information gains $\gamma^{(M)}$ and $\gamma^{(m)}$, $m \neq M$, measured at the target and auxiliary fidelities, respectively, and tied to the exploration-stopping thresholds (Song et al., 2018). The framework is “no-regret” as the total cost budget $\Lambda \to \infty$, assuming informative lower fidelities.
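As a rough illustration of the cost-aware bookkeeping (not the formal episode-regret definition of any specific paper), one can trace the simple regret of the incumbent target-fidelity query against cumulative query cost.

```python
import numpy as np

def cost_aware_regret_trace(queries, f_star):
    """Simple regret of the best target-fidelity query vs. cumulative cost.

    queries: list of (fidelity, cost, observed_value); only target-fidelity
    observations (fidelity == "hi") update the incumbent. Returns pairs
    (cumulative_cost, simple_regret) suitable for plotting regret vs. budget.
    """
    spent, best, trace = 0.0, -np.inf, []
    for fid, cost, y in queries:
        spent += cost
        if fid == "hi":
            best = max(best, y)
        trace.append((spent, f_star - best))
    return trace

# Illustrative run: two cheap probes followed by one expensive target query.
print(cost_aware_regret_trace(
    [("lo", 1.0, 0.7), ("lo", 1.0, 0.9), ("hi", 10.0, 1.1)], f_star=1.2))
```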
Other algorithms supply similar theoretical safeguards:
- Upper bounds on the regret of robust MFBO (“no harm” theorems) relative to single-fidelity baselines, regardless of inclusion of unreliable sources (Mikkola et al., 2022).
- Tight control of the trade-off between current and future tasks by introducing acquisition terms for transferable knowledge in longitudinal multi-fidelity learning sequences (Zhang et al., 14 Mar 2024).
4. Robustness, Local Correlation, and Practical Limitations
A substantial body of recent research highlights pitfalls when incorporating low-fidelity data. In practical settings, key assumptions made by classical MFBO may be violated:
- LF and HF sources can be only locally correlated—not globally—causing naive fusion to degrade optimization (Foumani et al., 2023).
- Error characteristics (e.g., noise variance or bias) vary sharply across sources, invalidating a unified noise model.
Robust methodologies introduce mechanisms to:
- Model source-dependent noise using latent map GPs (LMGPs), supporting heterogeneous error structures and adaptive source selection (Foumani et al., 2023, Foumani et al., 2022).
- Dynamically exclude or downweight highly biased LF sources, either by pre-screening in the latent fidelity manifold or by in-loop penalization in the acquisition function (Foumani et al., 2022).
- Implement reliability safeguards, such as fallback to single-fidelity optimization when auxiliary source informativeness or relevance thresholds are not met (Mikkola et al., 2022).
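A schematic version of such a fallback rule, with a hypothetical informativeness threshold standing in for the formal conditions of robust MFBO methods (Mikkola et al., 2022):

```python
def choose_query(mf_candidate, sf_candidate, info_per_cost_ratio,
                 min_info_ratio=0.1):
    """Reliability safeguard: fall back to the single-fidelity query unless the
    auxiliary source passes a simple informativeness check.

    info_per_cost_ratio: information gained about the target per unit cost by
    the proposed low-fidelity query, relative to the single-fidelity query.
    The threshold is a hypothetical stand-in, not a value from any cited paper.
    """
    if info_per_cost_ratio >= min_info_ratio:
        return mf_candidate   # LF source deemed informative enough to use
    return sf_candidate       # otherwise behave like single-fidelity BO

# With an uninformative LF source, the safeguard returns the SF candidate.
print(choose_query(("x_lf", "lo"), ("x_hf", "hi"), info_per_cost_ratio=0.02))
```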
These advances allow MFBO to maintain performance or even accelerate convergence despite unreliable, biased, or nonstationary lower-fidelity sources, and have enabled the robust inclusion of non-traditional information providers, including human experts.
5. Applications and Empirical Findings
MFBO frameworks have been evaluated on diverse synthetic and real-world benchmarks:
- Engineering design and scientific computing: Applications include optimization of nanophotonic devices, aerodynamic shapes, mechanical vibration plates, and reactor geometries, typically using simulators at multiple mesh resolutions or fidelities (Song et al., 2018, Savage et al., 2022, Li et al., 2020, Shahrooei et al., 2022).
- Materials and molecular discovery: MFBO expedites the search for promising molecules/materials by integrating experimental measurements and simulation data; effectiveness critically depends on LF source informativeness (measured by correlation) and cost ratio (Sabanza-Gil et al., 1 Oct 2024, Judge et al., 11 Sep 2024).
- Hyperparameter/architecture optimization: MFBO accelerates neural network selection by combining validation curves from early training epochs (LF) with fully trained models (HF), employing frameworks such as BOHB that merge Bayesian optimization with resource-allocation strategies like HyperBand (a schematic successive-halving sketch follows this list) (Lindauer et al., 2019).
- Falsification and safety evaluation: For learning-based control systems, MFBO enables cost-effective simulation-based falsification, reliably surfacing counterexamples while reducing the number of high-fidelity simulations needed (Shahrooei et al., 2022).
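The resource-allocation core that HyperBand and BOHB build on is successive halving, sketched below; the toy objective, parameters, and budget schedule are illustrative assumptions.

```python
import numpy as np

def successive_halving(configs, evaluate, min_epochs=1, eta=3, rounds=3):
    """Evaluate all configs cheaply (few epochs = low fidelity), keep the best
    1/eta fraction, and re-evaluate survivors with eta-times more epochs.

    evaluate(config, epochs) -> validation score; a stand-in for an expensive
    training loop. HyperBand runs several such brackets, and BOHB replaces the
    fixed candidate set with model-based (Bayesian) proposals.
    """
    budget = min_epochs
    for _ in range(rounds):
        scores = [evaluate(c, budget) for c in configs]
        keep = max(1, len(configs) // eta)
        top = np.argsort(scores)[::-1][:keep]   # indices of the best configs
        configs = [configs[i] for i in top]
        budget *= eta
    return configs[0]

# Toy objective: score improves with training budget and peaks near lr = 0.1.
evaluate = lambda lr, epochs: -(lr - 0.1) ** 2 + 0.01 * np.log(epochs)
best_lr = successive_halving(list(np.linspace(0.01, 0.3, 9)), evaluate)
print(best_lr)
```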
A consistent outcome is that MFBO substantially reduces sampling costs—sometimes by factors exceeding 2x—while maintaining, or in well-aligned scenarios, improving solution quality compared to single-fidelity BO, when LF sources are sufficiently accurate and inexpensive. Under adverse LF cost/informativeness conditions, MFBO can lose its practical advantage, sometimes performing worse than single-fidelity search (Sabanza-Gil et al., 1 Oct 2024).
6. Extensions, Open Challenges, and Future Directions
Recent developments in MFBO extend the paradigm across several axes:
- Multi-task and transfer-aware MFBO: Acquisition functions are proposed that explicitly trade off between optimizing current task performance and collecting transferable knowledge for future, related optimization tasks, leveraging variational Bayesian parameter transfer (Zhang et al., 14 Mar 2024).
- Constrained and cost-aware settings: New CMFBO frameworks enable constraint modeling (including source-dependent, possibly black-box constraints), automatic source selection, and systematic, data-driven stopping criteria for convergence assessment (Foumani et al., 3 Mar 2025).
- Multi-scale and multi-dimensional fidelity: Modern ML workloads (e.g., LLM pre-training) are handled via joint Bayesian optimization across data mixtures, model scales, and training steps, requiring kernel designs and acquisition functions that operate over simplex, discrete, and continuous spaces simultaneously (Yen et al., 26 Mar 2025).
- Physics-aware acquisition and domain knowledge integration: Embedding physical priors or expert-driven utility factors into the acquisition function guides high-fidelity queries towards levers of physical complexity or risk, further improving sample efficiency (Fiore et al., 2023).
Open research challenges include: scalable surrogate and acquisition modeling in high-dimensional, multi-fidelity settings; dynamically adapting to heterogeneous noise and local correlation structure; multi-objective or robust (distributional) optimization; and establishing practical cost-benefit decision rules as part of experimental planning workflows. Guidelines suggest that MFBO is most advantageous when LF experiments are at least five times cheaper than HF experiments and are strongly correlated with the target (Sabanza-Gil et al., 1 Oct 2024).
7. Representative Mathematical Expressions and Tables
| Concept | Mathematical Formulation | Description |
|---|---|---|
| Additive GP model | $f_m(x) = f_M(x) + e_m(x)$ | LF as sum of HF and discrepancy |
| Information gain | $I\big(f_M;\, y_m(x)\big)$ | Informativeness of the query $(x, m)$ |
| Acquisition w/ cost | $\alpha(x, m) = I\big(f^\star_M;\, y_m(x)\big)/\lambda_m$ | MES-based cost-sensitive acquisition |
| Regret (episode) | Optimality gap of the final target query, combined with the episode's query cost | Cost-aware performance metric |
| Robust criterion | Accept the MFBO query only if (i) the predicted target variance is sufficiently controlled and (ii) the source informativeness exceeds a threshold | Prevents harm from poor LF sources |
These expressions and criteria, employed throughout recent literature on MFBO, capture the core balancing of information, cost, and robustness that define the state of the art.
This entry synthesizes advances in surrogate modeling, cost-sensitive information metrics, empirical and theoretical analyses, robust selection mechanisms, and cross-domain applications, providing a complete technical reference on multi-fidelity Bayesian optimization as reflected in the current arXiv literature (Song et al., 2018, Takeno et al., 2019, Li et al., 2020, Foumani et al., 2023, Foumani et al., 2022, Sabanza-Gil et al., 1 Oct 2024, Manoj et al., 1 Aug 2025).