Bayesian Optimization Review
- Bayesian Optimization is a probabilistic strategy for optimizing costly, black-box functions by leveraging surrogate models such as Gaussian processes.
- It employs acquisition functions like Expected Improvement, Probability of Improvement, and Upper Confidence Bound to balance exploration and exploitation.
- Recent advances extend BO to high-dimensional, cost-aware, multi-objective, and preference-based applications, enabling scalable, efficient optimization.
Bayesian Optimization (BO) is an advanced probabilistic approach for global optimization of black-box functions that are expensive or time-consuming to evaluate. It is grounded in the design of sequential decision procedures that leverage surrogate models—most commonly Gaussian processes (GPs)—to balance exploration and exploitation, thereby reducing the number of costly evaluations required to identify an optimum. Recent developments in BO encompass extensions to multi-objective, structured, cost-sensitive, high-dimensional, and preference-based settings, as well as novel integration of expert knowledge and alternative surrogates. This article provides an in-depth technical review of BO, its foundational principles, state-of-the-art methodological advances, and practical considerations in both academic and industrial applications.
1. Mathematical Formulation and Principal Surrogate Models
The canonical BO problem seeks

$$x^\star = \arg\min_{x \in \mathcal{X}} f(x),$$

where $f: \mathcal{X} \to \mathbb{R}$ is an expensive, non-differentiable black-box objective. Evaluations yield $y = f(x) + \varepsilon$ with noise $\varepsilon \sim \mathcal{N}(0, \sigma^2)$.
BO places a Bayesian prior—typically a zero-mean GP—over the unknown $f$: $f \sim \mathcal{GP}(0, k(x, x'))$. Common kernel choices include squared-exponential and Matérn, often with Automatic Relevance Determination (ARD). After observations $\mathcal{D}_n = \{(x_i, y_i)\}_{i=1}^{n}$, the GP posterior at a test point $x$ yields

$$\mu_n(x) = k_n(x)^\top (K_n + \sigma^2 I)^{-1} \mathbf{y}, \qquad \sigma_n^2(x) = k(x, x) - k_n(x)^\top (K_n + \sigma^2 I)^{-1} k_n(x),$$

where $K_n$ is the observed kernel matrix and $k_n(x) = [k(x, x_1), \ldots, k(x, x_n)]^\top$.
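For concreteness, the posterior equations above can be computed in a few lines of NumPy. This is a minimal sketch assuming an isotropic squared-exponential kernel with fixed hyperparameters; the function names are illustrative rather than taken from any particular library.

```python
import numpy as np

def sq_exp_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between row-wise point sets A and B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * sq_dists / lengthscale ** 2)

def gp_posterior(X, y, X_test, noise_var=1e-4):
    """Zero-mean GP posterior mean and variance at X_test (the equations above)."""
    K = sq_exp_kernel(X, X) + noise_var * np.eye(len(X))
    K_s = sq_exp_kernel(X_test, X)
    L = np.linalg.cholesky(K)                       # stable inverse via Cholesky
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = sq_exp_kernel(X_test, X_test).diagonal() - (v ** 2).sum(axis=0)
    return mean, var
```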
Surrogate selection is critical: extensions include Bayesian neural networks (BNNs), random forests, and process-based models for combinatorial or categorical domains (Naveiro et al., 19 Jan 2024, Neiswanger et al., 2019).
2. Acquisition Functions and Trade-Offs
Acquisition functions guide sampling by quantifying information gain or expected improvement; the three most common, written here for minimization and sketched in code after the list, are:
- Expected Improvement (EI): $\alpha_{\mathrm{EI}}(x) = \mathbb{E}\big[\max(0, f^{\dagger} - f(x))\big] = (f^{\dagger} - \mu_n(x))\,\Phi(z) + \sigma_n(x)\,\phi(z)$, with $z = (f^{\dagger} - \mu_n(x))/\sigma_n(x)$ and $f^{\dagger}$ the best observed value.
- Probability of Improvement (PI): $\alpha_{\mathrm{PI}}(x) = \Phi\big((f^{\dagger} - \mu_n(x))/\sigma_n(x)\big)$.
- Upper Confidence Bound (UCB): $\alpha_{\mathrm{UCB}}(x) = \mu_n(x) + \sqrt{\beta_t}\,\sigma_n(x)$ for maximization; in the minimization convention used here, the lower confidence bound $\mu_n(x) - \sqrt{\beta_t}\,\sigma_n(x)$ is minimized.
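A minimal sketch of these three closed forms in the minimization convention used here; the function name and the fixed beta value are illustrative.

```python
import numpy as np
from scipy.stats import norm

def acquisition_values(mu, sd, f_best, beta=2.0):
    """Closed-form EI, PI, and LCB for minimization, given the GP posterior
    mean/std at candidate points and the incumbent (lowest) observed value."""
    sd = np.maximum(sd, 1e-12)                # guard against zero predictive std
    z = (f_best - mu) / sd
    ei = (f_best - mu) * norm.cdf(z) + sd * norm.pdf(z)
    pi = norm.cdf(z)
    lcb = mu - np.sqrt(beta) * sd             # minimized; UCB analogue for minimization
    return ei, pi, lcb
```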
These acquisition functions are optimized at each iteration to propose new candidates. Marginalizing GP hyperparameters via fully Bayesian approaches (e.g., MCMC; FBBO) can improve performance, particularly when paired with EI and ARD kernels (Ath et al., 2021).
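The marginalization idea can be sketched as averaging the acquisition over posterior draws of the GP hyperparameters rather than plugging in a single point estimate; the signature below is illustrative and not FBBO's actual interface.

```python
import numpy as np

def marginalized_acquisition(acq_fn, x, hyperparameter_draws):
    """Average an acquisition function over posterior (e.g., MCMC) draws of the GP
    hyperparameters instead of conditioning on a single point estimate."""
    return float(np.mean([acq_fn(x, theta) for theta in hyperparameter_draws]))
```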
Recent advances address cost-aware objectives (Lee et al., 2020), batch sampling, multi-objective scalarizations (Tran et al., 2020), and mutual-information/entropy for preference or binary data (Fauvel et al., 2021).
3. Algorithmic Workflow and Implementation Strategies
The core BO loop follows:
```
1. Initialize dataset D₀ with n₀ samples.
2. For t = n₀+1, …, N:
   a. Fit surrogate model (GP or alternative) to D_{t-1}.
   b. Select x_t = argmax_x α(x | D_{t-1}) using the chosen acquisition.
   c. Evaluate y_t = f(x_t) + noise.
   d. Update D_t = D_{t-1} ∪ {(x_t, y_t)}.
3. Return the x yielding the lowest (best) observed y.
```
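A compact, runnable rendering of this loop under illustrative assumptions: a 1-D toy objective, a scikit-learn GP surrogate, and EI maximized by random candidate search rather than a dedicated inner optimizer.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def f(x):
    """Toy 1-D objective standing in for the expensive black box."""
    return np.sin(3.0 * x) + 0.1 * x ** 2

rng = np.random.default_rng(0)
low, high = -3.0, 3.0
X = rng.uniform(low, high, size=(5, 1))       # n0 initial samples
y = f(X).ravel()

for t in range(20):                           # N - n0 BO iterations
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)
    cand = rng.uniform(low, high, size=(2048, 1))
    mu, sd = gp.predict(cand, return_std=True)
    z = (y.min() - mu) / np.maximum(sd, 1e-12)
    ei = (y.min() - mu) * norm.cdf(z) + sd * norm.pdf(z)   # Expected Improvement
    x_next = cand[np.argmax(ei)]              # acquisition maximizer (random search)
    X = np.vstack([X, x_next])
    y = np.append(y, f(x_next))

print("best x:", X[np.argmin(y)].item(), "best y:", y.min())
```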
Hyperparameters (e.g., GP kernel, acquisition type, batch size) are inferred from data or periodically re-optimized. Model fitting and acquisition optimization are computational bottlenecks: exact GP inference scales as $\mathcal{O}(n^3)$ in the number of observations $n$, and the cost of optimizing the acquisition grows with the input dimension $d$.
Key implementation points include parallelization (especially for batch/fidelity extensions), constrained optimization (for feasible regions), surrogate replacement (for discrete, categorical spaces), and multi-fidelity augmentation (Paulson et al., 29 Jan 2024, Neiswanger et al., 2019).
4. Extensions: High-Dimensional, Cost-Aware, Multi-Objective, Preference-Based BO
High-Dimensional BO
Standard GP-BO degrades rapidly as the input dimension grows, due to the curse of dimensionality. Common remedies (one of which is sketched in code after the list) include:
- Low-dimensional embeddings (PCA-BO, KPCA-BO, feature-mapped GP with joint decoder (Moriconi et al., 2019, Antonov et al., 2022)).
- Trust-region local modeling (TuRBO) (Santoni et al., 2023).
- Sparsity-exploring methods (SEBO with L₀ homotopy, multi-objective Pareto-driven frameworks) (Liu et al., 2022).
- Dimension scheduling for parallel, subspace-based updates (Ulmasov et al., 2015).
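As referenced above, a random linear embedding is one of the simplest dimensionality-reduction devices for high-dimensional BO; the sketch below assumes an ambient dimension of 100, an effective dimension of 5, and a unit box, all of which are illustrative.

```python
import numpy as np

# A minimal random-embedding sketch in the spirit of low-dimensional-subspace BO.
D, d = 100, 5                                  # ambient and effective dimensions (assumed)
rng = np.random.default_rng(1)
A = rng.normal(size=(D, d))                    # fixed random projection matrix

def lift(z, low=-1.0, high=1.0):
    """Map a low-dimensional candidate z to the original D-dimensional box by
    projecting through A and clipping to the box bounds."""
    return np.clip(A @ z, low, high)

# A standard BO loop (such as the sketch in Section 3) then searches over z in a
# small box, and the expensive objective is always evaluated at lift(z).
z = rng.uniform(-1.0, 1.0, size=d)
x_high_dim = lift(z)                           # candidate in the 100-D ambient space
```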
Cost-aware BO
Accounting for variable evaluation costs (e.g., time, wall-clock budget, energy), CArBO integrates cost surrogates and decaying cost penalties into the acquisition (Lee et al., 2020). Its key components include a cost-efficient initial design and cost-cooled acquisition scaling for batch selection.
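A hedged sketch of the cost-cooling idea: discount EI by the predicted evaluation cost, with the cost exponent decaying as the budget is consumed. The exact schedule and cost model in Lee et al. (2020) may differ.

```python
import numpy as np

def cost_cooled_acquisition(ei, cost, spent_budget, total_budget):
    """EI discounted by predicted evaluation cost, with the cost exponent decaying
    from 1 to 0 as the budget is consumed (sketch of the cost-cooling idea)."""
    alpha = max(0.0, (total_budget - spent_budget) / total_budget)
    return ei / np.maximum(cost, 1e-12) ** alpha
```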
Preference-based and Discrete Feedback
Preference elicitation and ranking-based surrogates are increasingly deployed in settings where pairwise or ordinal feedback is more reliable or practical than direct measurements (a generic preference-likelihood sketch follows the list below). Notable frameworks:
- Siamese BNNs + Active Learning for expert integration (Huang et al., 2022).
- Poisson Process BO (PoPBO), which models ranks via a nonhomogeneous Poisson process and uses tailored acquisition functions (Rectified LCB, Expected Ranking Improvement), demonstrating superior noise robustness and scalability (Wang et al., 5 Feb 2024).
- Mutual-information–based acquisition for binary/preferential data (Fauvel et al., 2021).
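As noted before the list, a generic probit preference likelihood illustrates how pairwise feedback can drive a latent-utility surrogate; it is a sketch, not the specific model used in any of the frameworks above.

```python
import numpy as np
from scipy.stats import norm

def preference_probability(f_winner, f_loser, comparison_noise=1.0):
    """Probit likelihood that the 'winner' is preferred to the 'loser' under a
    latent utility f (lower is better) observed with Gaussian comparison noise."""
    return norm.cdf((f_loser - f_winner) / (np.sqrt(2.0) * comparison_noise))
```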
Multi-objective and Constrained Optimization
BO for multi-objective problems relies either on scalarizations (e.g., regularized Tchebycheff (Tran et al., 2020)) or on independent surrogate GPs for each objective combined with Pareto-diversity terms. Acquisition functions are extended to optimize hypervolume improvement and Pareto-frontier spread. Hidden and known constraints are modeled via separate feasibility surrogates or probabilistic classifiers.
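A sketch of the scalarization route, using the standard augmented Tchebycheff form; the exact regularization in Tran et al. (2020) may differ.

```python
import numpy as np

def augmented_tchebycheff(F, weights, ideal_point, rho=0.05):
    """Augmented (regularized) Tchebycheff scalarization of an objective matrix
    F with shape (n_points, n_objectives); standard textbook form."""
    diff = weights * np.abs(F - ideal_point)
    return diff.max(axis=1) + rho * diff.sum(axis=1)
```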
5. Practical Surrogate and Acquisition Selection
Table: Surrogate Model & Acquisition Function Preference by Setting
| Scenario | Surrogate Model | Acquisition Function |
|---|---|---|
| Standard continuous | GP, ARD kernel | EI/UCB/PI |
| High-dimensional, low rank | PCA-BO, KPCA-BO | EI in latent z-space |
| Discrete/categorical | Random forest, BNN | MC-based EI/UCB |
| Preference/ranking | Siamese BNN, Poisson | Mutual info, ERI, R-LCB |
| Cost-sensitive evaluation | Dual GP (cost/objective) | Cost-cooled EI, EI per unit cost |
| Multi-objective/constrained | Per-objective GPs, MOGP | Scalarized/composite, hypervolume improvement |
Selection depends on problem structure, available compute resources, regularization and interpretability constraints (e.g., sparsity), and the feedback modality.
6. Empirical Evaluation and Impact
Empirical results across simulated and real-world benchmarks consistently demonstrate:
- Substantial reductions in wall-clock cost, evaluation count, or convergence time by employing cost-aware, high-dimensional, preference-based, and expert-augmented BO (Huang et al., 2022, Lee et al., 2020, Antonov et al., 2022, Wang et al., 5 Feb 2024).
- Robustness to noise and model misspecification by ranking-based surrogates (PoPBO), classifier-based EI estimation (BORE), and preference-based active learning strategies (Wang et al., 5 Feb 2024, Tiao et al., 2021).
- Accelerated Pareto-frontier discovery for multi-objective design applications via multi-GP and scalarized acquisition frameworks (Tran et al., 2020).
- Scalability to hundreds of dimensions using surrogate replacements, embedding, and subspace optimization (Santoni et al., 2023, Ulmasov et al., 2015, Liu et al., 2022).
- Transferable algorithmic principles to domains such as additive manufacturing, hyperparameter tuning, experimental design, and recommendation systems (Zhang et al., 2021, Lee et al., 2020, Liu et al., 2022).
7. Challenges, Limitations, and Ongoing Research Directions
Key technical challenges include:
- Scalability in input dimension, sample count, and surrogate fidelity.
- Surrogate selection and model misspecification, especially as non-Gaussian, nonstationary, or heteroscedastic effects arise.
- Acquisition-optimization overhead for high-dimensional, non-convex, or mixed-variable spaces.
- Active incorporation of expert knowledge or preference feedback without biasing the search, as studied in Siamese BNN architectures (Huang et al., 2022).
- Unified frameworks capable of robustly handling multi-fidelity, multi-objective, constrained, and preference-based settings simultaneously.
Recent work emphasizes the principled integration of full Bayesian hyperparameter marginalization, flexible surrogate architectures, Pareto-/hypervolume-aware acquisitions, and efficient parallel/batch settings. There is growing interest in simulation-based and probabilistic-programming-driven BO, as well as in identifying multiple local optima in multi-modal landscapes (Neiswanger et al., 2019, Mei et al., 2022).
The field is progressing toward theoretically grounded, computationally tractable algorithms that retain sample efficiency and applicability to increasingly complex, noisy, and structurally varied optimization landscapes.