
Bayesian Optimization Review

Updated 15 November 2025
  • Bayesian Optimization is a probabilistic strategy for optimizing costly, black-box functions by leveraging surrogate models such as Gaussian processes.
  • It employs acquisition functions like Expected Improvement, Probability of Improvement, and Upper Confidence Bound to balance exploration and exploitation.
  • Recent advances extend BO to high-dimensional, cost-aware, multi-objective, and preference-based applications, enabling scalable, efficient optimization.

Bayesian Optimization (BO) is an advanced probabilistic approach for global optimization of black-box functions that are expensive or time-consuming to evaluate. It is grounded in the design of sequential decision procedures that leverage surrogate models—most commonly Gaussian processes (GPs)—to balance exploration and exploitation, thereby reducing the number of costly evaluations required to identify an optimum. Recent developments in BO encompass extensions to multi-objective, structured, cost-sensitive, high-dimensional, and preference-based settings, as well as novel integration of expert knowledge and alternative surrogates. This article provides an in-depth technical review of BO, its foundational principles, state-of-the-art methodological advances, and practical considerations in both academic and industrial applications.

1. Mathematical Formulation and Principal Surrogate Models

The canonical BO problem seeks

x^* = \arg\min_{x \in \mathcal{X}} f(x)

where $f: \mathcal{X} \rightarrow \mathbb{R}$ is an expensive, non-differentiable black-box objective. Evaluations yield $y_i = f(x_i) + \varepsilon_i$ with noise $\varepsilon_i \sim \mathcal{N}(0, \sigma_n^2)$.

BO places a Bayesian prior, typically a zero-mean GP, over the unknown $f$:

f(x) \sim \mathcal{GP}(0, k(x, x'))

Common kernel choices include the squared-exponential and Matérn families, often with Automatic Relevance Determination (ARD). After $n$ observations $D_n = \{(x_i, y_i)\}_{i=1}^n$, the GP posterior at $x$ yields

\mu(x) = k(x, X)\,[K + \sigma_n^2 I]^{-1} y, \qquad \sigma^2(x) = k(x, x) - k(x, X)\,[K + \sigma_n^2 I]^{-1} k(X, x)

where $K = k(X, X)$ is the kernel matrix over the observed inputs.
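The posterior equations above can be sketched directly in NumPy. This is a minimal illustration with a squared-exponential kernel and fixed hyperparameters, not a production implementation (the function names and default values are assumptions for the example):

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel matrix between row-vector inputs A and B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return variance * np.exp(-0.5 * sq / lengthscale**2)

def gp_posterior(X, y, Xstar, noise=1e-2):
    """Posterior mean and variance of a zero-mean GP at test points Xstar."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))   # K + sigma_n^2 I
    Ks = rbf_kernel(Xstar, X)                       # k(x, X)
    Kss = rbf_kernel(Xstar, Xstar)                  # k(x, x)
    mu = Ks @ np.linalg.solve(K, y)                 # k(x,X)[K + sigma_n^2 I]^{-1} y
    var = np.diag(Kss - Ks @ np.linalg.solve(K, Ks.T))
    return mu, var
```

At an observed input the posterior mean collapses toward the observed value and the variance drops well below the prior variance, exactly as the formulas predict.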

Surrogate selection is critical: extensions include Bayesian neural networks (BNNs), random forests, and process-based models for combinatorial or categorical domains (Naveiro et al., 19 Jan 2024, Neiswanger et al., 2019).

2. Acquisition Functions and Trade-Offs

Acquisition functions guide sampling by quantifying information gain or improvement:

  • Expected Improvement (EI), under the minimization convention:

\mathrm{EI}(x) = (y_\text{best} - \mu(x))\,\Phi(Z) + \sigma(x)\,\phi(Z), \quad Z = (y_\text{best} - \mu(x))/\sigma(x)

  • Probability of Improvement (PI):

\mathrm{PI}(x) = \Phi\bigl((y_\text{best} - \mu(x))/\sigma(x)\bigr)

  • Upper Confidence Bound (UCB); for minimization, the lower confidence bound $\mu(x) - \beta\,\sigma(x)$ is minimized instead:

\mathrm{UCB}(x) = \mu(x) + \beta\,\sigma(x)
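The three acquisitions take only a few lines each. A sketch under the minimization convention used above, with `scipy.stats.norm` supplying $\Phi$ and $\phi$:

```python
from scipy.stats import norm

def expected_improvement(mu, sigma, y_best):
    """EI for minimization: E[max(y_best - f(x), 0)] under the GP posterior."""
    z = (y_best - mu) / sigma
    return (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def probability_of_improvement(mu, sigma, y_best):
    """PI for minimization: posterior probability that f(x) < y_best."""
    return norm.cdf((y_best - mu) / sigma)

def lower_confidence_bound(mu, sigma, beta=2.0):
    """LCB (to be minimized): the minimization analogue of UCB."""
    return mu - beta * sigma
```

At a point whose posterior mean equals the incumbent ($\mu = y_\text{best}$, $Z = 0$), EI reduces to $\sigma\,\phi(0) \approx 0.399\,\sigma$ and PI to $0.5$, so both still reward high posterior uncertainty.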

These functions are optimized each iteration to propose new candidates. Marginalization of GP hyperparameters via fully Bayesian approaches (e.g., MCMC; FBBO) can improve performance, particularly when paired with EI and ARD kernels (Ath et al., 2021).

Recent advances address cost-aware objectives (Lee et al., 2020), batch sampling, multi-objective scalarizations (Tran et al., 2020), and mutual-information/entropy for preference or binary data (Fauvel et al., 2021).

3. Algorithmic Workflow and Implementation Strategies

The core BO loop follows:

1. Initialize dataset D₀ with n₀ samples.
2. For t = n₀+1,…,N:
    a. Fit surrogate model (GP or alternative) to D_{t-1}.
    b. Select x_t = argmax_x α(x | D_{t-1}) using chosen acquisition.
    c. Evaluate y_t = f(x_t) + noise.
    d. Update D_t = D_{t-1} ∪ {(x_t, y_t)}.
3. Return x yielding lowest/optimal observed y.
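The loop above can be sketched end-to-end on a toy one-dimensional objective. The objective `f`, the kernel length-scale, and the grid-search maximization of EI are illustrative assumptions chosen to keep the example self-contained:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def f(x):                                  # toy 1-D objective (assumed for illustration)
    return np.sin(3 * x) + 0.1 * x**2

def k(a, b, ell=0.5):                      # squared-exponential kernel
    return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)

def posterior(X, y, Xs, noise=1e-4):
    """Zero-mean GP posterior mean and std at candidate points Xs."""
    K = k(X, X) + noise * np.eye(len(X))
    Ks = k(Xs, X)
    mu = Ks @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mu, np.sqrt(np.clip(var, 1e-12, None))

# 1. Initialize dataset D0 with n0 samples.
X = rng.uniform(-2, 2, 3)
y = f(X)
grid = np.linspace(-2, 2, 400)             # acquisition maximized by grid search here

# 2. Sequential loop: fit surrogate, maximize EI, evaluate, update.
for t in range(20):
    mu, sigma = posterior(X, y, grid)
    z = (y.min() - mu) / sigma
    ei = (y.min() - mu) * norm.cdf(z) + sigma * norm.pdf(z)
    x_next = grid[np.argmax(ei)]
    X = np.append(X, x_next)
    y = np.append(y, f(x_next))

# 3. Return x yielding the lowest observed y.
x_best = X[np.argmin(y)]
```

With 3 initial points plus 20 sequential evaluations, the loop reliably locates the global minimum near $x \approx -0.52$ despite never seeing gradients, illustrating the sample efficiency the text describes.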

Hyperparameters (e.g., GP kernel, acquisition type, batch size) are inferred from data or periodically re-optimized. Model fitting and acquisition optimization are computational bottlenecks, scaling as $O(n^3)$ in the number of observations for exact GP inference and growing with the input dimension $d$ for acquisition optimization.

Key implementation points include parallelization (especially for batch/fidelity extensions), constrained optimization (for feasible regions), surrogate replacement (for discrete, categorical spaces), and multi-fidelity augmentation (Paulson et al., 29 Jan 2024, Neiswanger et al., 2019).

4. Extensions: High-Dimensional, Cost-Aware, Multi-Objective, Preference-Based BO

High-Dimensional BO

Standard GP-BO degrades rapidly for $d > 15$ due to the curse of dimensionality. Common remedies construct a low-dimensional latent space, e.g., linear and kernel PCA embeddings (PCA-BO, KPCA-BO), fitting the surrogate and optimizing the acquisition over the projection rather than the full input space.
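A linear-embedding step in the spirit of PCA-BO can be sketched as follows; the helper names and interface are hypothetical, and only the projection idea is taken from the cited line of work:

```python
import numpy as np

def fit_linear_embedding(X, r):
    """PCA-BO-style linear embedding (sketch): project evaluated points onto the
    top-r principal components; the surrogate and acquisition then operate in the
    r-dimensional latent space, with candidates mapped back for evaluation."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    W = Vt[:r].T                                 # (d, r) projection matrix
    to_latent = lambda x: (x - mean) @ W         # ambient -> latent
    to_ambient = lambda z: mean + z @ W.T        # latent -> ambient
    return to_latent, to_ambient
```

When the evaluated points truly lie near an $r$-dimensional subspace, the round trip `to_ambient(to_latent(X))` reconstructs them almost exactly, which is the regime in which these methods recover standard BO performance.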

Cost-aware BO

Accounting for variable evaluation costs (e.g., time, wall-clock budget, energy), CArBO integrates cost surrogates and decaying cost penalties into the acquisition (Lee et al., 2020). Its main components are a cost-efficient initial design and a cost-cooled acquisition whose cost penalty is annealed away as the budget is consumed.
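A minimal sketch of cost-cooling in this spirit follows; the exact schedule in CArBO differs, and the linear decay of the exponent here is an illustrative assumption:

```python
import numpy as np

def cost_cooled_acquisition(ei, cost_mu, spent, budget):
    """Cost-cooled acquisition (sketch): divide EI by the cost surrogate's
    predicted mean, raised to an exponent that decays from 1 to 0 as the
    budget is consumed. Early on, cheap points are strongly preferred;
    late in the run the acquisition reverts to plain EI."""
    gamma = max(0.0, (budget - spent) / budget)   # decays 1 -> 0
    return ei / np.power(cost_mu, gamma)
```

At the start of the run a point predicted to cost twice as much has its EI halved; once the budget is exhausted the cost term has no effect.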

Preference-based and Discrete Feedback

Preference-elicitation and ranking-based surrogates are increasingly deployed in settings where pairwise or ordinal feedback is more reliable or practical than direct measurements. Notable frameworks:

  • Siamese BNNs + Active Learning for expert integration (Huang et al., 2022).
  • Poisson Process BO (PoPBO) modeling ranks via nonhomogeneous Poisson process, with tailored acquisition functions (Rectified LCB, Expected Ranking Improvement), demonstrating superior noise robustness and scalability (Wang et al., 5 Feb 2024).
  • Mutual-information–based acquisition for binary/preferential data (Fauvel et al., 2021).

Multi-objective and Constrained Optimization

BO for multi-objective problems relies either on scalarizations (e.g., the regularized Tchebycheff scalarization of (Tran et al., 2020)) or on independent surrogate GPs for each objective combined with Pareto-diversity terms. Acquisition functions are extended to target hypervolume improvement and Pareto-frontier spread. Hidden and known constraints are modeled via separate feasibility surrogates or probabilistic classifiers.
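A regularized (augmented) Tchebycheff scalarization in the spirit of the cited approach can be sketched as follows; the weight handling and the `rho` value are illustrative assumptions:

```python
import numpy as np

def tchebycheff(F, weights, ideal, rho=0.05):
    """Regularized Tchebycheff scalarization (sketch).
    F: (n, m) objective values; weights: (m,) positive weights;
    ideal: (m,) component-wise ideal point. Minimizing this scalar for
    varied weight vectors traces out (weakly) Pareto-optimal points,
    including on non-convex fronts; the rho-weighted sum regularizes ties."""
    diff = weights * np.abs(F - ideal)
    return diff.max(axis=1) + rho * diff.sum(axis=1)
```

With equal weights, a balanced point such as $(0.4, 0.4)$ scores better than either extreme $(0, 1)$ or $(1, 0)$, which is the behavior that lets scalarized BO spread candidates along the Pareto front.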

5. Practical Surrogate and Acquisition Selection

Table: Surrogate Model & Acquisition Function Preference by Setting

| Scenario | Surrogate Model | Acquisition Function |
|---|---|---|
| Standard continuous | GP, ARD kernel | EI / UCB / PI |
| High-dimensional, low rank | PCA-BO, KPCA-BO | EI in latent z-space |
| Discrete/categorical | Random forest, BNN | MC-based EI / UCB |
| Preference/ranking | Siamese BNN, Poisson process | Mutual information, ERI, R-LCB |
| Cost-sensitive evaluation | Dual GP (cost/objective) | Cost-cooled EI / PU |
| Multi-objective/constrained | Per-objective GPs, MOGP | Composite, hypervolume |

Selection depends on the problem structure, scalable compute resources, regularization, interpretability constraints (e.g., sparsity), and feedback modality.

6. Empirical Evaluation and Impact

Empirical results across simulated and real-world benchmarks consistently demonstrate that BO reaches a given solution quality in substantially fewer evaluations than non-adaptive baselines such as random or grid search, and that the extensions surveyed above retain this sample efficiency in their respective settings.

7. Challenges, Limitations, and Ongoing Research Directions

Key technical challenges include:

  • Scalability in input dimension, sample count, and surrogate fidelity.
  • Surrogate selection and model misspecification, especially as non-Gaussian, nonstationary, or heteroscedastic effects arise.
  • Acquisition-optimization overhead for high-dimensional, non-convex, or mixed-variable spaces.
  • Active incorporation of expert knowledge or preference feedback without biasing the search, as studied in Siamese BNN architectures (Huang et al., 2022).
  • Unified frameworks capable of robustly handling multi-fidelity, multi-objective, constrained, and preference-based settings simultaneously.

Recent work emphasizes the principled integration of full Bayesian hyperparameter marginalization, flexible surrogate architectures, Pareto-/hypervolume-aware acquisitions, and efficient parallel/batch settings. There is growing interest in simulation-based, probabilistic programming-driven, and multi-modal local-optima identification (Neiswanger et al., 2019, Mei et al., 2022).

The field is progressing toward theoretically grounded, computationally tractable algorithms that retain sample efficiency and applicability to increasingly complex, noisy, and structurally varied optimization landscapes.
