
Surrogate-Based Bayesian Optimization

Updated 10 April 2026
  • Surrogate-based Bayesian optimization is a data-efficient method that models expensive black-box functions using Gaussian processes.
  • It iteratively selects new evaluation points via acquisition functions such as Expected Improvement and Upper Confidence Bound to balance exploration and exploitation.
  • Recent advancements include high-dimensional and multi-fidelity extensions that improve optimization in noisy and complex engineering applications.

Surrogate-based Bayesian optimization is a probabilistically principled, data-efficient methodology for solving expensive black-box optimization problems, where direct evaluation of the objective or constraint functions is computationally demanding, such as in chemical engineering simulations or high-fidelity experimental systems. The core paradigm leverages an inexpensive surrogate—most commonly a Gaussian process (GP)—to model the underlying function, and iteratively selects new points to evaluate via an acquisition function that quantifies the value of information gained from additional samples. Bayesian optimization has become a canonical approach within process systems engineering and other scientific domains for optimizing reactor designs, catalyst screening, and flowsheet optimization, where a single function evaluation can require minutes to hours of simulation or laboratory time (Neufang et al., 2024).

1. Mathematical Foundations of Surrogate-Based Bayesian Optimization

The principal component of Bayesian optimization is the use of a surrogate probabilistic model, typically a GP, to approximate the true objective $f:\mathcal{X}\subset\mathbb{R}^d\rightarrow\mathbb{R}$. The GP prior is given as

$$f(x) \sim \mathcal{GP}\bigl(m(x),\, k(x,x')\bigr),$$

where $m(x)$ is the mean function and $k(x,x')$ is a symmetric positive-definite kernel (e.g., squared-exponential, Matérn-5/2). Given $n$ noisy observations $\{(x_i, y_i)\}_{i=1}^n$ with $y_i = f(x_i) + \varepsilon_i$, $\varepsilon_i \sim \mathcal{N}(0,\sigma_n^2)$, the posterior mean and variance at any $x$ are derived in closed form as

$$\mu_n(x) = m(x) + k(x,X)\,[K+\sigma_n^2 I]^{-1}(y - m(X)), \qquad \sigma_n^2(x) = k(x,x) - k(x,X)\,[K+\sigma_n^2 I]^{-1}k(X,x),$$

where $K_{ij} = k(x_i, x_j)$ is the kernel matrix and $y = (y_1, \dots, y_n)^\top$ is the vector of observations. This formulation provides not only an interpolative mean predictor, but also a posterior variance that quantifies epistemic uncertainty arising from finite, targeted sampling (Neufang et al., 2024, Chiappetta et al., 4 Feb 2026).
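As an illustration, the posterior equations above can be computed directly in a few lines of NumPy. This is a minimal sketch assuming a zero prior mean and a squared-exponential kernel with fixed hyperparameters; `rbf_kernel` and `gp_posterior` are hypothetical helper names, and a production implementation would also estimate hyperparameters from data.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between row-stacked inputs A and B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return variance * np.exp(-0.5 * sq / lengthscale**2)

def gp_posterior(X, y, Xs, noise=1e-2, lengthscale=1.0, variance=1.0):
    """Closed-form GP posterior mean and variance at test points Xs (zero prior mean)."""
    K = rbf_kernel(X, X, lengthscale, variance) + noise * np.eye(len(X))
    Ks = rbf_kernel(Xs, X, lengthscale, variance)
    L = np.linalg.cholesky(K)                      # stable inversion via Cholesky
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = Ks @ alpha                                # k(x,X) [K + sigma_n^2 I]^{-1} y
    v = np.linalg.solve(L, Ks.T)
    var = variance - np.sum(v**2, axis=0)          # k(x,x) - k(x,X) [K + ...]^{-1} k(X,x)
    return mu, var
```

At a training point the posterior mean reverts to the observation and the variance collapses toward the noise level; far from the data, the variance returns to the prior variance, which is exactly the epistemic-uncertainty behavior that drives exploration.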

2. Acquisition Functions and Sequential Design

The selection of new evaluation points is governed by an acquisition function $\alpha_n(x)$ that explicitly quantifies the expected utility of evaluating $f$ at $x$. Two leading acquisition functions are:

  • Expected Improvement (EI): Favors points with high probability of improving over the current best observation, balancing mean and uncertainty. EI is given by

$$\mathrm{EI}(x) = \bigl(\mu_n(x) - f^+\bigr)\,\Phi(z) + \sigma_n(x)\,\varphi(z), \qquad z = \frac{\mu_n(x) - f^+}{\sigma_n(x)},$$

where $\Phi$ and $\varphi$ are the standard normal CDF and PDF, and $f^+$ is the current best observed value (Neufang et al., 2024, Chiappetta et al., 4 Feb 2026).

  • Upper Confidence Bound (UCB): Selects the point with the highest optimistic estimate of the objective,

$$\mathrm{UCB}(x) = \mu_n(x) + \beta\,\sigma_n(x),$$

with $\beta$ controlling the degree of exploration (Neufang et al., 2024).
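These two acquisition functions admit direct scalar implementations (hypothetical function names, shown for concreteness; the EI form below is the maximization variant, with the incumbent `f_best` as the best value observed so far):

```python
from math import erf, exp, pi, sqrt

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def norm_pdf(z):
    """Standard normal PDF."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def expected_improvement(mu, sigma, f_best):
    """EI for maximization: expected amount by which f(x) exceeds the incumbent."""
    if sigma < 1e-12:          # no posterior uncertainty -> no expected improvement
        return 0.0
    z = (mu - f_best) / sigma
    return (mu - f_best) * norm_cdf(z) + sigma * norm_pdf(z)

def upper_confidence_bound(mu, sigma, beta=2.0):
    """UCB: optimistic estimate of f(x); larger beta favors exploration."""
    return mu + beta * sigma
```

Note that EI is strictly positive wherever the posterior standard deviation is nonzero, so even points with a pessimistic mean retain some chance of selection, which is the mechanism by which EI trades off exploration against exploitation.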

At each iteration, the algorithm optimizes the acquisition over the domain—often with multi-start local search or global optimization—selects the maximizer, evaluates the true expensive objective $f$ (or constraints), and updates the surrogate with the new data (Neufang et al., 2024, Chiappetta et al., 4 Feb 2026).

3. Advanced Variants: High-Dimensional and Multi-Fidelity Extensions

Scaling Bayesian optimization to higher dimensions challenges the expressivity and computational tractability of global GP surrogates, due to the GP's tendency toward oversmoothing and the $\mathcal{O}(n^3)$ growth in matrix inversion cost with the number of observations. The TuRBO (Trust-Region Bayesian Optimization) algorithm addresses these problems by partitioning the input space into multiple local trust-regions, each fitted with an independent local GP, and adaptively resizing regions based on observed success or failure of recent steps. Empirical studies have established TuRBO's superior convergence in high-dimensional and noisy settings (Neufang et al., 2024).
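The adaptive resizing rule can be illustrated with a simplified sketch. This is not the full algorithm: actual TuRBO maintains a hyperrectangle with per-dimension side lengths rescaled by the local GP's lengthscales, and the tolerance values below are illustrative defaults rather than a reference to any specific implementation.

```python
def update_trust_region(length, success, n_succ, n_fail,
                        succ_tol=3, fail_tol=5, l_min=0.5**7, l_max=1.6):
    """TuRBO-style resizing: double the region side length after succ_tol
    consecutive improvements, halve it after fail_tol consecutive failures,
    and signal a restart when the region collapses below l_min."""
    if success:
        n_succ, n_fail = n_succ + 1, 0
    else:
        n_succ, n_fail = 0, n_fail + 1
    if n_succ >= succ_tol:
        length, n_succ = min(2.0 * length, l_max), 0
    elif n_fail >= fail_tol:
        length, n_fail = length / 2.0, 0
    restart = length < l_min
    return length, n_succ, n_fail, restart
```

The effect is that each trust region zooms in when local progress stalls (exploitation) and expands when the local model keeps finding improvements, keeping the GP fit on a scale where it remains expressive.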

Other surrogate extensions include ensemble methods (e.g., boosted tree surrogates such as ENTMOOT, radial basis function surrogates like RBFOpt), which offer advantages for categorical/mixed variables or where kernel-based uncertainty quantification is less central, but typically underperform GP-based BO on smooth continuous tasks where uncertainty quantification drives exploration (Neufang et al., 2024).

4. Implementation Workflow and Practical Considerations

The canonical surrogate-based Bayesian optimization workflow is as follows:

  1. Generate an initial space-filling design (e.g., Latin hypercube) and evaluate the expensive function at these points.
  2. Fit the GP surrogate to all data, estimating kernel hyperparameters by maximizing the marginal likelihood.
  3. Maximize the acquisition function over the domain to select the next evaluation point.
  4. Evaluate the expensive objective (and any constraints) at the selected point and append the result to the dataset.
  5. Repeat steps 2–4 until the evaluation budget is exhausted, then return the best observed (or best predicted) design.
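This workflow can be condensed into a self-contained toy loop. It is a sketch only: a fixed-hyperparameter GP is fitted on a 1-D problem, the acquisition is maximized by grid search rather than multi-start optimization, and the hypothetical quadratic `objective` stands in for an expensive simulation.

```python
import numpy as np
from math import erf, pi, sqrt

def objective(x):
    """Stand-in for an expensive black-box function (maximum at x = 0.3)."""
    return -(x - 0.3) ** 2

def bayes_opt(n_init=3, n_iter=10, noise=1e-6, ls=0.2, seed=0):
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 201)              # acquisition maximized by grid search
    X = list(rng.uniform(0.0, 1.0, n_init))        # step 1: initial design
    y = [objective(x) for x in X]
    k = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)
    for _ in range(n_iter):
        Xa, ya = np.array(X), np.array(y)
        K = k(Xa, Xa) + noise * np.eye(len(Xa))    # step 2: fit GP (fixed hyperparameters)
        Kinv_y = np.linalg.solve(K, ya)
        Ks = k(grid, Xa)
        mu = Ks @ Kinv_y
        var = np.maximum(1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T)), 1e-12)
        sd = np.sqrt(var)
        z = (mu - ya.max()) / sd                   # step 3: maximize EI over the grid
        Phi = 0.5 * (1.0 + np.vectorize(erf)(z / sqrt(2.0)))
        phi = np.exp(-0.5 * z**2) / sqrt(2.0 * pi)
        ei = (mu - ya.max()) * Phi + sd * phi
        x_next = grid[int(np.argmax(ei))]
        X.append(x_next)                           # step 4: evaluate and augment data
        y.append(objective(x_next))
    return X[int(np.argmax(y))], max(y)            # step 5: return best observed design
```

Even this crude version localizes the optimum of the quadratic with only a handful of evaluations, illustrating the sample efficiency that motivates the method.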

Key best practices:

  • Always scale inputs to the unit cube $[0,1]^d$ and normalize outputs.
  • Re-estimate kernel hyperparameters every 10–20 samples via marginal likelihood maximization.
  • For constrained or noisy settings, use small exploration parameters in EI or tune $\beta$ in UCB to avoid over-exploration (Neufang et al., 2024).
  • For expensive multi-objective tasks, combine acquisition functions (e.g., weighted EI) as in multi-criteria process parameter optimization (Kronenwett et al., 30 Jul 2025).
  • For very limited evaluation budgets, incorporate search space refinement to eliminate unlikely regions before BO begins, which markedly improves efficiency (Nomura et al., 2019).
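The scaling and normalization practice from the first bullet admits a few-line implementation (hypothetical helper names, shown for concreteness):

```python
import numpy as np

def scale_inputs(X, lower, upper):
    """Map each input dimension from its physical range [lower, upper] to [0, 1]."""
    X, lower, upper = map(np.asarray, (X, lower, upper))
    return (X - lower) / (upper - lower)

def standardize_outputs(y):
    """Zero-mean, unit-variance outputs; return the stats needed to undo the transform."""
    y = np.asarray(y, dtype=float)
    mu = y.mean()
    sd = y.std() if y.std() > 0 else 1.0   # guard against constant observations
    return (y - mu) / sd, mu, sd
```

Working on the unit cube keeps a single kernel lengthscale prior meaningful across dimensions, and standardized outputs keep the GP's signal variance and noise hyperparameters on a predictable scale.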

5. Representative Algorithmic Benchmarks and Case Studies

Applications in chemical and process systems engineering demonstrate the efficacy of surrogate-based BO:

  • Heat Exchanger Network Synthesis (8D, 3 constraints): GP surrogate with Matérn-5/2 kernel and EI acquisition systematically found a Pareto-optimal design in less than 80 evaluations (out of 100), outperforming baselines by 15% in utility cost, with a mean EI gap below 2% after 50 samples (Neufang et al., 2024).
  • Catalytic Reactor Optimization (6D, stochastic, noisy): TuRBO with batch UCB led to a threefold speedup in convergence, attaining a mean conversion increase from 75% to 88% in only 100 evaluations, with adaptive trust-region sizes improving balance between global and local optimization (Neufang et al., 2024).
  • Sensor-based sorting system parameterization (3D, multi-objective, noisy): Multi-objective GP surrogates and combined EI acquisition tuned three process parameters to near-optimal accept/reject accuracy in only about 15 experiments out of a possible 250, while explicitly modeling measurement noise (Kronenwett et al., 30 Jul 2025).
  • Hyperparameter optimization:
    • Search-space refinement boosted performance of BO under very low budgets across neural and tree-based models, outperforming TPE and SMAC on standard benchmarks (Nomura et al., 2019).
| Application | Surrogate | Acquisition | Dim. | Budget | Key Outcome |
|---|---|---|---|---|---|
| Heat Exchanger Network | GP (Matérn) | EI | 8 | 100 | 15% cost saving, <80 evals to optimum |
| Catalytic Reactor (noisy) | TuRBO (GP) | batch UCB | 6 | 200 | 3x faster, conversion 75% → 88% |
| Sensor-Based Sorting (multi-objective) | 2 GPs | weighted EI/UCB | 3 | 15 | near-optimal accuracy, 6x fewer evaluations |
| Hyperparameter Tuning (low-budget) | GP | EI with refinement | 2–6 | 10–30 | "Refine + BO" outperforms vanilla BO methods |

6. Limitations, Challenges, and Recent Directions

Surrogate-based Bayesian optimization is most effective when evaluations are highly expensive, objective/constraint functions are smooth, and uncertainty quantification is critical for guiding exploration. Its performance diminishes as evaluations become cheap relative to surrogate training cost, as function dimensionality increases (particularly globally), or if the objective is nonsmooth/irregular—here alternative surrogate classes (e.g., tree-based, ensemble, or latent variable GP models) are sometimes preferable (Neufang et al., 2024, Bodin et al., 2019).

Recent research areas include:

  • High-dimensional BO: Trust-region and local surrogate strategies (TuRBO), sparse GPs (inducing point, Vecchia), and effective input scaling (Neufang et al., 2024).
  • Low-budget settings: Integration of domain pruning/refinement before BO proper (Nomura et al., 2019).
  • Mixed variable and constraint handling: Ensemble/tree surrogates for categorical/mixed inputs, specialized kernels, and integration of constraint feasibility into acquisition (Neufang et al., 2024).
  • Multi-objective and noisy optimization: Empirical combination of surrogates, acquisition functions, and noise-aware modeling to efficiently explore multi-objective trade-offs under noise, exemplified by process parameter tuning studies (Kronenwett et al., 30 Jul 2025).

7. Comparative Analysis and Guidelines for Practitioners

Comparative analysis affirms that Gaussian process-based Bayesian optimization is generally preferred for expensive, smooth optimization tasks, outperforming tree-based and RBF-based surrogates in such regimes by leveraging uncertainty quantification for exploration (Neufang et al., 2024). For higher-dimensional, mixed, or discrete-variable settings, ensemble methods, trust-region surrogates, or search space refinement may be required.

Guidelines for effective implementation include:

  • Scale and normalize all inputs/outputs.
  • Tune over the kernel class, acquisition function, and their parameters according to dimensionality, noise level, and budget constraints.
  • For very high-dimensional problems or non-Gaussian noise/nonstationarity, consider local surrogates, sparse GPs, or ensemble models.
  • In constrained or multi-objective settings, combine acquisitions or surrogates as appropriate to problem demands.
  • Revisit hyperparameter and model updates as new data are acquired, especially in nonstationary optimization.

Surrogate-based Bayesian optimization, with its efficient allocation of expensive function evaluations, robust mathematical underpinnings, and extensibility, remains integral in tackling complex process systems engineering problems and beyond (Neufang et al., 2024).
