Acquisition Functions in Bayesian Optimization
- Acquisition functions are mappings that assign scores based on a surrogate's posterior means and variances, guiding the balance between exploration and exploitation in Bayesian optimization.
- They determine the next query point by trading off exploration of high-uncertainty regions against exploitation of areas the surrogate predicts to be promising, via criteria such as EI, PI, and UCB.
- Recent advances include dynamic ensembles, meta-learned neural acquisition functions, and information-theoretic strategies that enhance optimization efficiency and scalability.
Acquisition functions are central elements in Bayesian optimization (BO), active learning, and Bayesian quadrature: they prescribe where to query next by formalizing the trade-off between exploration (sampling where the surrogate model is uncertain) and exploitation (refining regions known to be promising). Mathematically, an acquisition function is any mapping that assigns to each candidate $x$ (given the current data $\mathcal{D}_n$) a real score $\alpha(x; \mathcal{D}_n)$, typically derived from the posterior mean and variance of a probabilistic surrogate (e.g., a Gaussian process). Their correct design, efficient optimization, adaptation to task structure, and even meta-learning or programmatic synthesis remain active research frontiers. Recent years have seen the emergence of dynamic ensembles, meta-learned neural acquisitions, information-theoretic criteria, high-dimensional variants, and advanced strategies for acquisition maximization. This article provides a comprehensive, rigorously sourced overview.
1. Classical Acquisition Functions: Definitions and Taxonomy
Canonical acquisition functions for Gaussian process–based BO are parametrized by the posterior mean $\mu_n(x)$ and variance $\sigma_n^2(x)$ computed from the current dataset $\mathcal{D}_n = \{(x_i, y_i)\}_{i=1}^n$. The most widely used forms include:
- Upper Confidence Bound (UCB): $\alpha_{\mathrm{UCB}}(x) = \mu_n(x) + \sqrt{\beta}\,\sigma_n(x)$, where $\beta > 0$ controls the exploration–exploitation trade-off (Iwata, 2021).
- Probability of Improvement (PI): $\alpha_{\mathrm{PI}}(x) = \Phi\!\left(\frac{\mu_n(x) - y^{\ast} - \xi}{\sigma_n(x)}\right)$, with $y^{\ast} = \max_i y_i$ the incumbent best value, $\xi \ge 0$ a jitter parameter, and $\Phi$ the standard normal CDF (Iwata, 2021).
- Expected Improvement (EI): $\alpha_{\mathrm{EI}}(x) = \sigma_n(x)\left[z\,\Phi(z) + \phi(z)\right]$, where $z = \frac{\mu_n(x) - y^{\ast}}{\sigma_n(x)}$ and $\phi$ is the standard normal PDF (Wilson et al., 2018); a short code sketch of UCB, PI, and EI appears after this list.
- Mutual Information (MI): the expected information gain about $f(x)$ from observing $y$ at $x$. For a Gaussian likelihood $y = f(x) + \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, \sigma_{\mathrm{noise}}^2)$, this admits the form $\alpha_{\mathrm{MI}}(x) = \tfrac{1}{2}\log\!\left(1 + \sigma_n^2(x)/\sigma_{\mathrm{noise}}^2\right)$ (Iwata, 2021).
- Parametric Generalizations: A continuous one-parameter family recovers PI and EI as special cases, e.g., fractional moments of the improvement, $\alpha_{\nu}(x) = \mathbb{E}\!\left[\max\left(f(x) - y^{\ast}, 0\right)^{\nu}\right]$: the limit $\nu \to 0$ recovers PI, and $\nu = 1$ gives EI. For general $\nu$, closed-form expressions involve confluent hypergeometric functions (Kanazawa, 2021).
- Uncertainty-based AL (Active Learning) Acquisitions: In classification, one uses the predictive entropy (MaxEnt), the mean per-class standard deviation of predicted probabilities (MeanSTD), or the mutual information between predictions and model weights (BALD) (Dossou, 2024); a sketch of these scores also appears below.
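To make the classical criteria concrete, here is a minimal sketch (not taken from any of the cited papers) that evaluates UCB, PI, and EI from a GP posterior mean and standard deviation; the toy posterior values and the jitter default are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def classical_acquisitions(mu, sigma, y_best, beta=2.0, xi=0.01):
    """Evaluate UCB, PI, and EI from GP posterior mean/std at candidate points.

    mu, sigma : arrays of posterior means and standard deviations
    y_best    : incumbent best observed value (maximization convention)
    beta      : UCB exploration weight
    xi        : optional jitter encouraging exploration in PI/EI
    """
    sigma = np.maximum(sigma, 1e-12)              # guard against zero variance
    ucb = mu + np.sqrt(beta) * sigma
    z = (mu - y_best - xi) / sigma
    pi = norm.cdf(z)
    ei = sigma * (z * norm.cdf(z) + norm.pdf(z))
    return ucb, pi, ei

# Toy usage with a fabricated posterior over 5 candidates
mu = np.array([0.2, 0.5, 0.1, 0.9, 0.4])
sigma = np.array([0.3, 0.1, 0.6, 0.05, 0.2])
ucb, pi, ei = classical_acquisitions(mu, sigma, y_best=0.8)
print("next point by EI:", np.argmax(ei))
```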
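Similarly, a hedged sketch of the classification-oriented scores: assuming predictive probabilities from T stochastic forward passes (e.g., MC dropout) are available as an array of shape (T, N, C), MaxEnt, MeanSTD, and BALD can be estimated as below; the Dirichlet toy data and function name are illustrative, not the exact protocol of Dossou (2024).

```python
import numpy as np

def al_uncertainty_scores(probs, eps=1e-12):
    """Uncertainty-based AL scores from T stochastic forward passes.

    probs : array of shape (T, N, C) with predicted class probabilities
            (e.g., from MC dropout) for N candidates and C classes.
    Returns per-candidate MaxEnt, MeanSTD, and BALD scores.
    """
    mean_p = probs.mean(axis=0)                                 # (N, C)
    # Predictive entropy of the averaged distribution (MaxEnt)
    max_ent = -(mean_p * np.log(mean_p + eps)).sum(axis=1)
    # Mean over classes of the std. dev. across passes (MeanSTD)
    mean_std = probs.std(axis=0).mean(axis=1)
    # BALD: predictive entropy minus expected per-pass entropy
    per_pass_ent = -(probs * np.log(probs + eps)).sum(axis=2)   # (T, N)
    bald = max_ent - per_pass_ent.mean(axis=0)
    return max_ent, mean_std, bald

# Toy usage: 8 MC-dropout passes, 4 candidates, 3 classes
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(3), size=(8, 4))
max_ent, mean_std, bald = al_uncertainty_scores(probs)
print("query next (BALD):", np.argmax(bald))
```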
Further developments include information-theoretic criteria such as Max-value Entropy Search (MES) (Wang et al., 15 Feb 2025), and acquisition Thompson sampling (ATS) for batch queries, which induces multiple acquisition functions by sampling the surrogate’s hyperparameters (Palma et al., 2019).
2. Advanced Acquisition Optimization: Theory and Practice
Maximizing the acquisition function at each iteration is itself a nonconvex global optimization problem, and doing it well is critical for regret guarantees (Wilson et al., 2018, Kim et al., 2019, Zhao et al., 2023, Xie et al., 2024).
Key principles:
- Bayes Decision Rule: Choosing the next query as $x_{n+1} \in \arg\max_{x} \alpha(x; \mathcal{D}_n)$ constitutes the myopic (one-step) Bayes action.
- Gradient-Based Maximization: Most acquisition functions can be rewritten as expectations (or integrals) over Gaussians, facilitating reparameterization: $\alpha(X) = \mathbb{E}_{z \sim \mathcal{N}(0, I)}\!\left[\ell\!\left(\mu(X) + L(X)\,z\right)\right]$, where $L(X)$ is a Cholesky factor of the posterior covariance and $\ell$ the relevant utility (e.g., improvement). This enables unbiased stochastic gradient estimation and efficient use of optimizers such as L-BFGS or Adam, especially in parallel/batch settings (Wilson et al., 2017, Wilson et al., 2018); see the Monte Carlo sketch after this list.
- Submodularity and Greedy Selection: For batch (q-point) selection, many acquisition functions (PI, EI, UCB) are submodular when formulated as set utilities; sequential greedy maximization then delivers a $(1 - 1/e)$ approximation to the global joint optimum (Wilson et al., 2018).
- Local vs. Global Maximizers: Multi-start local optimization with a moderate number of restarts (up to roughly 100) yields negligible additional regret compared to global solvers, providing strong empirical and theoretical justification for this practice (Kim et al., 2019); see the multi-start sketch after this list.
- Piecewise-Linear Kernel MIP: Mixed-integer programming with piecewise-linear kernel surrogates enables certifiably global AF maximization and regret bounds, outperforming multi-start heuristics on multimodal landscapes at moderate scale (Xie et al., 2024).
- Initialization in High Dimension: Heuristic initializations (e.g., via CMA-ES, GA, or by leveraging historical data) can dramatically improve AF maximization in high dimensions compared to random restarts; poor initializations lead to pathological over-exploration (Zhao et al., 2023).
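As an illustration of the reparameterization point above, the following sketch estimates a joint (batch, q-point) EI by Monte Carlo: posterior samples at the q candidates are drawn as $\mu + Lz$ with $z \sim \mathcal{N}(0, I)$. The 3-point covariance is a fabricated stand-in for a real GP posterior; in an autodiff framework the same estimator yields the unbiased stochastic gradients used with L-BFGS or Adam.

```python
import numpy as np

def mc_batch_ei(mu, cov, y_best, n_samples=4096, seed=0):
    """Monte Carlo estimate of joint (q-point) Expected Improvement.

    mu    : (q,) posterior mean at the q batch candidates
    cov   : (q, q) posterior covariance at the candidates
    y_best: incumbent best observed value (maximization)
    """
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(cov + 1e-9 * np.eye(len(mu)))  # jitter for stability
    z = rng.standard_normal((n_samples, len(mu)))
    f = mu + z @ L.T                    # reparameterized joint posterior samples
    improvement = np.maximum(f.max(axis=1) - y_best, 0.0)
    return improvement.mean()

# Toy usage: a fabricated 3-point batch posterior
mu = np.array([0.4, 0.6, 0.5])
cov = np.array([[0.10, 0.02, 0.01],
                [0.02, 0.08, 0.03],
                [0.01, 0.03, 0.12]])
print("MC qEI:", mc_batch_ei(mu, cov, y_best=0.55))
```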
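For the multi-start practice, a minimal sketch under stated assumptions (a toy 1-D posterior standing in for a real surrogate, and EI as in Section 1) that maximizes the acquisition with several L-BFGS-B restarts via SciPy:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Toy stand-ins for a GP posterior on [0, 1]; a real surrogate would go here.
def posterior_mean(x):  return np.sin(6.0 * x)
def posterior_std(x):   return 0.2 + 0.3 * np.abs(np.cos(3.0 * x))

def neg_ei(x, y_best):
    """Negative EI at a 1-D location x (negated so we can minimize)."""
    x = np.atleast_1d(x)[0]
    mu, sigma = posterior_mean(x), max(posterior_std(x), 1e-12)
    z = (mu - y_best) / sigma
    return -(sigma * (z * norm.cdf(z) + norm.pdf(z)))

def maximize_acquisition(y_best, n_restarts=20, seed=0):
    """Multi-start L-BFGS-B maximization of EI over [0, 1]."""
    rng = np.random.default_rng(seed)
    best_x, best_val = None, np.inf
    for x0 in rng.uniform(0.0, 1.0, size=n_restarts):
        res = minimize(neg_ei, x0=[x0], args=(y_best,),
                       method="L-BFGS-B", bounds=[(0.0, 1.0)])
        if res.fun < best_val:
            best_x, best_val = float(res.x[0]), float(res.fun)
    return best_x, -best_val

x_next, ei_val = maximize_acquisition(y_best=0.9)
print(f"next query x = {x_next:.3f}, EI = {ei_val:.4f}")
```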
3. Learning and Adapting Acquisition Functions
While traditional acquisition functions are handcrafted, a spectrum of contemporary research seeks data-driven, transferable, or adaptive design:
- Ensembles and Meta-Adaptation: Weighted or dynamically combined ensembles of EI, PI, and LCB/UCB, with weight-generator schemes (random, cycling, or meta-optimized weights), enhance robustness across tasks. Meta-optimizing the AF weights as an outer-loop BO problem consistently reduces regret (Merchán et al., 2020, Chen et al., 2022).
- Switching Schedules: Explicitly switching between explorative (EI) and exploitative (PI or MSP) AFs, either on a preset schedule (e.g., EI for the first 25% of steps, PI thereafter) or adaptively (switching upon local convergence), delivers strong performance across diverse problems (Benjamins et al., 2022, Wang et al., 15 Feb 2025); a minimal schedule sketch appears after the table below.
- Meta-Learned Neural AFs: Training neural acquisition policies via reinforcement learning over families of tasks yields parameterized AFs (e.g., a neural network $\alpha_\theta$ evaluated on posterior statistics such as $\mu(x)$ and $\sigma(x)$) that adapt to structural regularities, outperforming fixed EI/UCB in transfer and few-shot scenarios (Volpp et al., 2019, Iwata, 2021).
- LLM-Guided and Programmatic Synthesis: Recent methods employ LLMs and symbolic program search to synthesize novel, interpretable AFs with empirically superior convergence (FunBO). These AFs blend and extend classical forms (EI, UCB, PI) using higher-order rational expressions, CDF shifts, PDF powers, and empirically tuned reweightings; they generalize robustly beyond their training distribution (Aglietti et al., 2024).
| Approach | Adaptivity | Relevant References |
|---|---|---|
| Weighted Ensemble | Static/Dynamic | (Merchán et al., 2020, Chen et al., 2022) |
| Meta-Learned Neural AF | Task-Adaptive | (Volpp et al., 2019, Iwata, 2021) |
| Switch Schedule | Stage-Adaptive | (Benjamins et al., 2022, Wang et al., 15 Feb 2025) |
| LLM/Program Search | Task-Specific | (Aglietti et al., 2024) |
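As a deliberately simplified illustration of the switch-schedule idea, the sketch below uses EI for the first 25% of the budget and PI afterwards; the switch fraction, grid-based candidate scoring, and toy posterior values are illustrative assumptions, not the exact procedure of Benjamins et al. (2022).

```python
import numpy as np
from scipy.stats import norm

def scheduled_acquisition(mu, sigma, y_best, step, budget, switch_frac=0.25, xi=0.01):
    """EI for the first switch_frac of the budget, PI afterwards."""
    sigma = np.maximum(sigma, 1e-12)
    z = (mu - y_best - xi) / sigma
    if step < switch_frac * budget:               # explorative phase: EI
        return sigma * (z * norm.cdf(z) + norm.pdf(z))
    return norm.cdf(z)                            # exploitative phase: PI

# Toy usage: score 100 grid candidates at step 30 of a 100-step budget
mu = np.random.default_rng(1).uniform(0, 1, 100)
sigma = np.full(100, 0.2)
scores = scheduled_acquisition(mu, sigma, y_best=0.9, step=30, budget=100)
print("next candidate index:", int(np.argmax(scores)))
```

The same wrapper accommodates adaptive variants by replacing the step-count condition with a convergence test on recent incumbents.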
4. Multi-Objective, Likelihood-Free, and Domain-Specific AFs
Acquisition function innovation extends into multi-objective, likelihood-free, and domain-specific regimes:
- Dynamic Multi-objective Ensembles: At each BO iteration, DMEA identifies a triple of best-performing acquisition functions (from a pool of EI, PI, LCBs) based on penalties reflecting their historical success. Batch candidates are then selected by Pareto-optimal evolutionary search and layered preference scores, balancing diversity and expected utility (Chen et al., 2022).
- Active Learning and Uncertainty: In deep active learning (e.g., for medical imaging), uncertainty measures such as BALD (predictive information gain), maximal entropy, and mean STD are key. Empirical studies confirm BALD’s stability but reveal all such AFs can be myopic under heavy class imbalance (Dossou, 2024).
- Likelihood-Free AFs: In structured domains (e.g., molecular optimization), density-ratio classifiers replace surrogate-based AFs. Tree-based partitioning with local acquisition functions and priors from LLMs or chemistry foundation models enables scalable, sample-efficient search over vast, structured spaces (Chen et al., 15 Dec 2025); a generic density-ratio sketch follows this list.
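To illustrate the generic density-ratio idea (a hedged sketch of the general recipe, not the specific foundation-model pipeline of Chen et al., 15 Dec 2025): label the top-$\gamma$ fraction of evaluated points as "good", fit a probabilistic classifier, and use its predicted probability of "good" as the acquisition score over candidates.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def density_ratio_acquisition(X_obs, y_obs, X_cand, gamma=0.25, seed=0):
    """Score candidates by a classifier separating top-gamma points from the rest.

    X_obs, y_obs : evaluated inputs and objective values (maximization)
    X_cand       : candidate inputs to score
    gamma        : fraction of observations labeled as 'good'
    """
    threshold = np.quantile(y_obs, 1.0 - gamma)
    labels = (y_obs >= threshold).astype(int)        # 1 = 'good', 0 = 'rest'
    clf = RandomForestClassifier(n_estimators=200, random_state=seed)
    clf.fit(X_obs, labels)
    return clf.predict_proba(X_cand)[:, 1]           # acquisition = P('good' | x)

# Toy usage on a 2-D problem with random observations and candidates
rng = np.random.default_rng(0)
X_obs = rng.uniform(-1, 1, size=(40, 2))
y_obs = -np.sum(X_obs**2, axis=1)                    # maximize: peak at the origin
X_cand = rng.uniform(-1, 1, size=(500, 2))
scores = density_ratio_acquisition(X_obs, y_obs, X_cand)
print("next query:", X_cand[np.argmax(scores)])
```

The random forest here is an arbitrary choice; any calibrated probabilistic classifier can play the role of the density-ratio estimator, and structured inputs can be handled by swapping in domain-appropriate features.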
5. Information-Theoretic and Bayesian Quadrature Acquisitions
Beyond improvement and confidence-based AFs, information-theoretic approaches play a central role:
- Mutual Information (MI) and Max-value Entropy Search (MES): GP-MI and MES directly optimize for expected information gain about the location or value of the maximum; they are especially effective when exploration of epistemic uncertainty is crucial (Iwata, 2021, Wang et al., 15 Feb 2025). Adaptive switching between exploitation (MSP) and exploration (MES) phases yields superior performance on high-fidelity, costly problems (Wang et al., 15 Feb 2025).
- Bayesian Quadrature (BQ) AFs: In model evidence estimation, one-step or prospective AFs either maximize pointwise uncertainty (PUQ) or target reductions of posterior/evidence variance contributions (PVC, PLUR, PEUR), with closed-form or efficiently estimated objectives. Empirical benchmarks show that PEUR is generally the most sample-efficient for evidence estimation, while PLUR excels at capturing secondary modes (Song et al., 10 Oct 2025).
6. Empirical Insights, Recommendations, and Limitations
Experimental comparisons consistently show:
- No single acquisition function is universally optimal. Ensemble, adaptive, or meta-learned AFs consistently outperform static baselines across benchmarks (Merchán et al., 2020, Chen et al., 2022, Aglietti et al., 2024).
- The quality of AF maximization largely determines asymptotic regret: poor AF maximization (e.g., random search or single-start local methods in high dimensions) can erase the theoretical guarantees of BO (Kim et al., 2019, Zhao et al., 2023).
- Switching schemes and meta-adaptation align AF mode to problem phase (exploration when the surrogate is rough, exploitation on local refinement) (Benjamins et al., 2022, Wang et al., 15 Feb 2025).
- Tree-structured local AFs and surrogate-free classifiers significantly improve scalability in combinatorial/structured domains (Chen et al., 15 Dec 2025).
Open challenges include the high computational cost of meta-optimization and programmatic AF discovery, balancing the overhead of ensemble or neural AF selection, and the lack of unified theoretical regret bounds for complex, adaptive AF policies. In extremely high dimensions or with heavy-tailed priors, additional research is needed to reconcile practical speed with optimal global search.
7. Notable Empirical Results Across Domains
- Meta-learned neural AFs reduced median simple regret by 1–2 orders of magnitude relative to standard EI/UCB on function families and transfer tasks (Volpp et al., 2019).
- Dynamic ensembles and meta-optimized weights halved the simple regret on Branin and real HPO benchmarks relative to static EI/PI/LCB (Merchán et al., 2020, Chen et al., 2022).
- LLM-synthesized FunBO AFs consistently dominated classical and neural baselines, converging 2–3× faster on out-of-distribution and high-multimodality benchmarks (Aglietti et al., 2024).
- Likelihood-free, LLM-informed local AFs achieved 80% optimality (measured by GAP or regret) in 20 rounds on challenging chemical property optimization, outperforming Laplace-BNN or GP surrogates even with generic features (Chen et al., 15 Dec 2025).
- Switch-scheduled AFs (EI then PI) realized the best overall regret on the COCO benchmark suite, with explore-then-exploit schedules universally dominating frequent switches or fixed-function baselines (Benjamins et al., 2022).
- Global mixed-integer solvers (PK-MIQP) found lower minima for acquisition functions in 1–5D and achieved better accuracy on constrained BO than widely used multi-start local optimizers (Xie et al., 2024).
References
- End-to-End Learning of Deep Kernel Acquisition Functions for Bayesian Optimization (Iwata, 2021)
- Dynamic Multi-objective Ensemble of Acquisition Functions in Batch Bayesian Optimization (Chen et al., 2022)
- A Study of Acquisition Functions for Medical Imaging Deep Active Learning (Dossou, 2024)
- One-parameter family of acquisition functions for efficient global optimization (Kanazawa, 2021)
- Maximizing acquisition functions for Bayesian optimization (Wilson et al., 2018)
- On Local Optimizers of Acquisition Functions in Bayesian Optimization (Kim et al., 2019)
- Optimizing Bayesian acquisition functions in Gaussian Processes (Pawar et al., 2021)
- PI is back! Switching Acquisition Functions in Bayesian Optimization (Benjamins et al., 2022)
- Inverse Bayesian Optimization: Learning Human Acquisition Functions in an Exploration vs Exploitation Search Task (Sandholtz et al., 2021)
- Towards Automatic Bayesian Optimization: A first step involving acquisition functions (Merchán et al., 2020)
- Sampling Acquisition Functions for Batch Bayesian Optimization (Palma et al., 2019)
- An adaptive switch strategy for acquisition functions in Bayesian optimization of wind farm layout (Wang et al., 15 Feb 2025)
- Meta-Learning Acquisition Functions for Transfer Learning in Bayesian Optimization (Volpp et al., 2019)
- The reparameterization trick for acquisition functions (Wilson et al., 2017)
- Unleashing the Potential of Acquisition Functions in High-Dimensional Bayesian Optimization (Zhao et al., 2023)
- Global Optimization of Gaussian Process Acquisition Functions Using a Piecewise-Linear Kernel Approximation (Xie et al., 2024)
- Bayesian Model Inference using Bayesian Quadrature: the Art of Acquisition Functions and Beyond (Song et al., 10 Oct 2025)
- Bayesian Optimization for Enhanced LLMs: Optimizing Acquisition Functions (Bao et al., 22 May 2025)
- Informing Acquisition Functions via Foundation Models for Molecular Discovery (Chen et al., 15 Dec 2025)
- FunBO: Discovering Acquisition Functions for Bayesian Optimization with FunSearch (Aglietti et al., 2024)
This literature demonstrates that acquisition function design, selection, and optimization now constitute an independent—and fast-evolving—discipline at the interface of statistical modeling, learning theory, and real-world experimental design.