
Acquisition Functions in Bayesian Optimization

Updated 31 January 2026
  • Acquisition functions are mappings that assign scores based on a surrogate's posterior means and variances, guiding the balance between exploration and exploitation in Bayesian optimization.
  • They determine the next query point by trading off regions of high uncertainty with areas known to yield promising improvements using methods like EI, PI, and UCB.
  • Recent advances include dynamic ensembles, meta-learned neural acquisition functions, and information-theoretic strategies that enhance optimization efficiency and scalability.

Acquisition functions are central elements in Bayesian optimization (BO), active learning, and Bayesian quadrature: they prescribe where to query next by formalizing the trade-off between exploration (sampling where the surrogate model is uncertain) and exploitation (refining regions known to be promising). Mathematically, an acquisition function is any mapping that assigns to each candidate x (given current data) a real score, typically derived from the posterior mean and variance of a probabilistic surrogate (e.g., a Gaussian process). Their correct design, efficient optimization, adaptation to task structure, and even meta-learning or programmatic synthesis remain active research frontiers. Recent years have seen the emergence of dynamic ensembles, meta-learned neural acquisitions, information-theoretic criteria, high-dimensional variants, and advanced strategies for acquisition maximization. This article provides a comprehensive, rigorously sourced overview.

1. Classical Acquisition Functions: Definitions and Taxonomy

Canonical acquisition functions for Gaussian process–based BO are parametrized by the posterior mean \mu(x) and variance \sigma^2(x) computed on a dataset D_t = \{(x_i, y_i)\}_{i=1}^t. The most widely used forms include:

  • Upper Confidence Bound (UCB):

a_{\text{UCB}}(x) = \mu(x) + \kappa\,\sigma(x)

where \kappa > 0 controls the exploration–exploitation trade-off (Iwata, 2021).

  • Probability of Improvement (PI):

a_{\text{PI}}(x) = \Phi\left(\frac{\mu(x) - y^+ - \xi}{\sigma(x)}\right)

with y^+ = \max_i y_i, \xi \geq 0 (jitter), and \Phi the standard normal CDF (Iwata, 2021).

  • Expected Improvement (EI):

a_{\text{EI}}(x) = (\mu(x) - y^+ - \xi)\,\Phi(z) + \sigma(x)\,\phi(z)

where z = \frac{\mu(x) - y^+ - \xi}{\sigma(x)} and \phi is the standard normal PDF (Wilson et al., 2018).

  • Mutual Information (MI):

a_{\text{MI}}(x) = I[y; f(x) \mid D] = H[y \mid D] - H[y \mid f(x), D]

For a Gaussian likelihood with noise variance \sigma_n^2, this admits the closed form \frac{1}{2}\log\left(1 + \sigma^2(x)/\sigma_n^2\right) (Iwata, 2021).

  • Parametric Generalizations: A continuous one-parameter family recovers PI and EI as special cases:

\alpha_p(x) = \mathbb{E}\left[(f(x) - y_*)_+^p\right] = \int_{y_*}^\infty (y - y_*)^p\, \mathcal{N}(y \mid \mu(x), \sigma^2(x))\, dy

Setting p = 0 recovers PI and p = 1 gives EI; for general p > 0, closed-form expressions involve confluent hypergeometric functions (Kanazawa, 2021).

  • Uncertainty-based AL (Active Learning) Acquisitions: In classification, one uses predictive entropy (MaxEnt), mean class probability standard deviation (MeanSTD), or mutual information between predictions and model weights (BALD) (Dossou, 2024).

Further developments include information-theoretic criteria such as Max-value Entropy Search (MES) (Wang et al., 15 Feb 2025), and acquisition Thompson sampling (ATS) for batch queries, which induces multiple acquisition functions by sampling the surrogate’s hyperparameters (Palma et al., 2019).
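
To make these definitions concrete, the following is a minimal NumPy/SciPy sketch of the closed forms above. Here mu and sigma are posterior mean and standard-deviation arrays from any GP surrogate, and the default kappa and xi values are illustrative choices, not recommendations from the cited papers.

```python
import numpy as np
from scipy.stats import norm

def ucb(mu, sigma, kappa=2.0):
    """Upper Confidence Bound: mu(x) + kappa * sigma(x)."""
    return mu + kappa * sigma

def pi(mu, sigma, y_best, xi=0.0):
    """Probability of Improvement: Phi((mu - y+ - xi) / sigma).
    Assumes sigma > 0 at all candidates."""
    return norm.cdf((mu - y_best - xi) / sigma)

def ei(mu, sigma, y_best, xi=0.0):
    """Expected Improvement, closed form (maximization convention)."""
    z = (mu - y_best - xi) / sigma
    return (mu - y_best - xi) * norm.cdf(z) + sigma * norm.pdf(z)

def mi_gaussian(sigma, noise_var):
    """Mutual-information gain under a Gaussian likelihood:
    0.5 * log(1 + sigma^2(x) / sigma_n^2)."""
    return 0.5 * np.log1p(sigma**2 / noise_var)
```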

2. Advanced Acquisition Optimization: Theory and Practice

The maximization of acquisition functions per iteration is itself a nonconvex global optimization challenge, and is critical for regret guarantees (Wilson et al., 2018, Kim et al., 2019, Zhao et al., 2023, Xie et al., 2024).

Key principles:

  • Bayes Decision Rule: Choosing x^* = \arg\max_x \alpha(x) constitutes the myopic Bayes action.
  • Gradient-Based Maximization: Most acquisition functions can be rewritten as expectations (or integrals) over Gaussians, facilitating reparameterization:

y = \mu(x) + \sigma(x)\,\epsilon, \quad \epsilon \sim \mathcal{N}(0, 1)

This enables unbiased stochastic gradient estimation and efficient use of optimizers such as L-BFGS or Adam (especially in parallel/batch settings) (Wilson et al., 2017, Wilson et al., 2018); a minimal sketch of this estimator, together with multi-start maximization, follows this list.

  • Submodularity and Greedy Selection: For batch (q-point) selections, many acquisition functions (PI, EI, UCB) are submodular when formulated as set utilities; sequential greedy maximization delivers a (11/e)(1-1/e) approximation to the global joint optimum (Wilson et al., 2018).
  • Local vs. Global Maximizers: Multi-start local optimization with a moderate number of restarts (N = 10–100) yields negligible additional regret compared to global solvers, providing strong empirical and theoretical justification for this practice (Kim et al., 2019).
  • Piecewise-Linear Kernel MIP: Mixed-integer programming with piecewise-linear kernel surrogates enables certifiably global AF maximization and regret bounds, outperforming multi-start heuristics on multimodal landscapes at moderate scale (Xie et al., 2024).
  • Initialization in High Dimension: Heuristic initializations (e.g., via CMA-ES, GA, or by leveraging historical data) can dramatically improve AF maximization in high-dimensions compared to random restarts. Poor initializations lead to pathological over-exploration (Zhao et al., 2023).
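
The reparameterization and multi-start ideas above admit a compact sketch. It assumes mu and sigma are callables returning the scalar GP posterior mean and standard deviation at a point x; the restart count and sample size are illustrative, and fixing the eps draws (common random numbers) is what makes the Monte Carlo estimate a smooth, deterministic target for L-BFGS-B.

```python
import numpy as np
from scipy.optimize import minimize

def make_mc_ei(mu, sigma, y_best, n_samples=256, seed=0):
    """Monte Carlo EI via the reparameterization y = mu(x) + sigma(x) * eps,
    eps ~ N(0, 1). The eps draws are fixed once, so the estimator is a
    deterministic (almost-everywhere smooth) function of x."""
    eps = np.random.default_rng(seed).standard_normal(n_samples)
    def acq(x):
        y = mu(x) + sigma(x) * eps                # reparameterized samples
        return np.maximum(y - y_best, 0.0).mean()
    return acq

def maximize_acquisition(acq, lo, hi, n_restarts=20, seed=0):
    """Multi-start local maximization with L-BFGS-B, the practice shown by
    Kim et al. (2019) to add negligible regret versus global solvers."""
    rng = np.random.default_rng(seed)
    best_x, best_val = None, -np.inf
    for x0 in rng.uniform(lo, hi, size=n_restarts):
        res = minimize(lambda x: -acq(float(x[0])), x0=[x0],
                       bounds=[(lo, hi)], method="L-BFGS-B")
        if -res.fun > best_val:
            best_x, best_val = float(res.x[0]), -res.fun
    return best_x, best_val
```

With the closed-form EI of Section 1 the Monte Carlo detour is unnecessary; it becomes essential for batch (q-point) and other acquisitions without closed forms, the setting addressed by Wilson et al. (2017, 2018).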

3. Learning and Adapting Acquisition Functions

While traditional acquisition functions are handcrafted, a spectrum of contemporary research seeks data-driven, transferable, or adaptive design:

  • Ensembles and Meta-Adaptation: Weighted or dynamically combined ensembles of EI, PI, LCB/UCB, with generator functions (random, cycling, meta-optimized weights), enhance robustness across tasks. Meta-optimization of AF weights as an outer-loop BO problem consistently reduces regret (Merchán et al., 2020, Chen et al., 2022).
  • Switching Schedules: Explicitly switching between explorative (EI) and exploitative (PI or MSP) AFs, either on a preset schedule (e.g., first 25% of steps EI, rest PI) or adaptively (switch on local convergence), delivers strong performance across diverse problems (Benjamins et al., 2022, Wang et al., 15 Feb 2025); a minimal sketch of such a schedule follows the table below.
  • Meta-Learned Neural AFs: Training neural acquisition policies via reinforcement learning over families of tasks yields parameterized AFs (e.g., f_\theta[\mu, \sigma, x, t, T]) that adapt to structural regularities, outperforming fixed EI/UCB in transfer and few-shot scenarios (Volpp et al., 2019, Iwata, 2021).
  • LLM-Guided and Programmatic Synthesis: Recent methods employ LLMs and symbolic program search to synthesize novel, interpretable AFs with empirically superior convergence (FunBO). These AFs blend and extend classical forms (EI, UCB, PI) using higher-order rational expressions, CDF shifts, PDF powers, and empirically tuned reweightings; they generalize robustly beyond their training distribution (Aglietti et al., 2024).

| Approach | Adaptivity | Relevant References |
|---|---|---|
| Weighted Ensemble | Static/Dynamic | Merchán et al., 2020; Chen et al., 2022 |
| Meta-Learned Neural AF | Task-Adaptive | Volpp et al., 2019; Iwata, 2021 |
| Switch Schedule | Stage-Adaptive | Benjamins et al., 2022; Wang et al., 15 Feb 2025 |
| LLM/Program Search | Task-Specific | Aglietti et al., 2024 |
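
A minimal sketch of the switch-schedule row above, reusing the ei and pi helpers from the Section 1 sketch. The 25% split mirrors the preset schedule described earlier; the exact switching trigger in (Benjamins et al., 2022) may differ in detail.

```python
def switched_acquisition(mu, sigma, y_best, t, T, switch_frac=0.25):
    """Explore-then-exploit schedule: EI for the first switch_frac of a
    budget of T steps, PI afterwards (cf. Benjamins et al., 2022)."""
    if t < switch_frac * T:
        return ei(mu, sigma, y_best)   # explorative early phase
    return pi(mu, sigma, y_best)       # exploitative late phase
```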

4. Multi-Objective, Likelihood-Free, and Domain-Specific AFs

Acquisition function innovation extends into multi-objective, likelihood-free, and domain-specific regimes:

  • Dynamic Multi-objective Ensembles: At each BO iteration, DMEA identifies a triple of best-performing acquisition functions (from a pool of EI, PI, LCBs) based on penalties reflecting their historical success. Batch candidates are then selected by Pareto-optimal evolutionary search and layered preference scores, balancing diversity and expected utility (Chen et al., 2022).
  • Active Learning and Uncertainty: In deep active learning (e.g., for medical imaging), uncertainty measures such as BALD (predictive information gain), maximal entropy, and mean STD are key. Empirical studies confirm BALD’s stability but reveal all such AFs can be myopic under heavy class imbalance (Dossou, 2024).
  • Likelihood-Free AFs: In structured domains (e.g., molecular optimization), density-ratio classifiers replace surrogate-based AFs. Tree-based partitioning with local acquisition functions and LLM/chemistry foundation model priors enables scalable, sample-efficient search over vast, structured spaces (Chen et al., 15 Dec 2025). A generic sketch of the density-ratio idea follows this list.
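
As a generic illustration of the density-ratio idea (not the specific method of Chen et al., 15 Dec 2025): label the top-gamma quantile of observed outcomes as "good", fit a classifier, and use its predicted probability as the acquisition score. The classifier family and the value of gamma below are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def density_ratio_acquisition(X_obs, y_obs, X_cand, gamma=0.2):
    """Likelihood-free acquisition: score candidates by the probability of
    landing in the top-gamma quantile of outcomes (maximization convention).
    Assumes enough data that both classes are present."""
    threshold = np.quantile(y_obs, 1.0 - gamma)
    labels = (y_obs >= threshold).astype(int)       # 1 = "good" points
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X_obs, labels)
    return clf.predict_proba(X_cand)[:, 1]          # acquisition scores
```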

5. Information-Theoretic and Bayesian Quadrature Acquisitions

Beyond improvement and confidence-based AFs, information-theoretic approaches play a central role:

  • Mutual Information (MI) and Max-value Entropy Search (MES): GP-MI and MES directly optimize for expected information gain about the location or value of the maximum; they are especially effective when exploration of epistemic uncertainty is crucial (Iwata, 2021, Wang et al., 15 Feb 2025). Adaptive switching between exploitation (MSP) and exploration (MES) phases yields superior performance on high-fidelity, costly problems (Wang et al., 15 Feb 2025).
  • Bayesian Quadrature (BQ) AFs: In model evidence estimation, one-step or prospective AFs either maximize pointwise variance (PUQ) or aim to reduce posterior or evidence variance contributions (PVC, PLUR, PEUR), with closed-form or efficiently estimated objectives. Empirical benchmarks show that PEUR is generally the most sample-efficient for evidence estimation, while PLUR excels in capturing secondary modes (Song et al., 10 Oct 2025). A variance-based sketch follows this list.
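
As a variance-based illustration in the spirit of PUQ (an assumption on our part, not necessarily the exact objective of Song et al., 10 Oct 2025): weight the GP posterior variance on the integrand by the squared prior density, targeting each point's contribution to the variance of the evidence estimate.

```python
import numpy as np

def bq_variance_acquisition(gp_var, prior_pdf):
    """Prior-weighted pointwise variance for evidence estimation
    Z = integral of f(x) * pi(x) dx, with a GP posterior on the integrand f:
    score(x) = pi(x)^2 * Var[f(x)]."""
    return prior_pdf**2 * gp_var

# Usage sketch: x_next = X_cand[np.argmax(bq_variance_acquisition(v, p))]
```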

6. Empirical Insights, Recommendations, and Limitations

Experimental comparisons across this literature (representative results are collected in Section 7 below) consistently show that adaptive, ensemble, and learned acquisition strategies outperform any single fixed criterion.

Open challenges include the high computational cost of meta-optimization and programmatic AF discovery, the overhead of ensemble or neural AF selection, and the lack of unified theoretical regret bounds for complex, adaptive AF policies. In extremely high dimensions or with heavy-tailed priors, additional research is needed to reconcile practical speed with optimal global search.

7. Notable Empirical Results Across Domains

  • Meta-learned neural AFs reduced median simple regret by 1–2 orders of magnitude relative to standard EI/UCB on function families and transfer tasks (Volpp et al., 2019).
  • Dynamic ensembles and meta-optimized weights halved the simple regret on Branin and real HPO benchmarks relative to static EI/PI/LCB (Merchán et al., 2020, Chen et al., 2022).
  • LLM-synthesized FunBO AFs consistently dominated classical and neural baselines, converging 2–3× faster on out-of-distribution and high-multimodality benchmarks (Aglietti et al., 2024).
  • Likelihood-free, LLM-informed local AFs achieved ≈80% optimality (measured by GAP or regret) in 20 rounds on challenging chemical property optimization, outperforming Laplace-BNN or GP surrogates even with generic features (Chen et al., 15 Dec 2025).
  • Switch-scheduled AFs (EI then PI) realized the best overall regret on the COCO benchmark suite, with explore-then-exploit schedules universally dominating frequent switches or fixed-function baselines (Benjamins et al., 2022).
  • Global mixed-integer solvers (PK-MIQP) found lower minima for acquisition functions in 1–5D and achieved better accuracy on constrained BO than widely used multi-start local optimizers (Xie et al., 2024).


This literature demonstrates that acquisition function design, selection, and optimization now constitute an independent—and fast-evolving—discipline at the interface of statistical modeling, learning theory, and real-world experimental design.
