Bayesian Optimization Variants
- Bayesian optimization variants are advanced methods that extend the classic GP framework to handle high-dimensionality, non-Gaussian noise, constraints, and unbounded domains.
- They employ enhanced surrogate models, innovative acquisition functions, and tailored kernel structures to improve sample efficiency, scalability, and robustness.
- These methods integrate domain-specific priors, gradient information, and parallel execution strategies to provide practical, efficient solutions, in several cases with provable convergence guarantees.
Bayesian optimisation (BO) variants constitute a diverse family of methodologies extending the classical GP-based BO framework to address challenges such as high-dimensionality, constraints (including decoupled or black-box evaluation), heteroscedastic and non-Gaussian noise, mixed-variable and variable-size spaces, unbounded or unknown domains, expert priors, limited computational resources, batch/asynchronous parallelism, and formal verification requirements. These approaches develop new surrogate models, acquisition strategies, kernel structures, or execution paradigms to improve sample efficiency, robustness, scalability, or applicability to more complex real-world optimisation settings.
1. Variants Leveraging Surrogate Model Enhancements
Several BO variants extend the Gaussian process surrogate to handle fundamental modelling challenges:
a. Heteroscedastic and Non-stationary Models:
- Heteroscedastic Treed Bayesian Optimisation (HTBO) partitions the domain using a variance-minimising CART structure, assigning an independent GP with leaf-specific hyperparameters (kernel + noise) to each region and sharing statistical strength up the ancestor chain via a weighted pseudo-likelihood, enabling local adaptation to non-stationarity and noise heterogeneity (Assael et al., 2014).
- Aleatoric Uncertainty-Robust BO fits a two-level GP hierarchy to infer input-dependent noise via maximum-likelihood heteroscedastic GPs (MLHGP) and incorporates the noise predictions into acquisition via noise-sensitive expected improvement (HAEI) or scalarised penalisation (ANPEI), steering suggestions toward robust, low-variance optima (Griffiths et al., 2019); a minimal sketch of this penalisation idea appears after this list.
- Two-GP Quantile/Expectile Surrogates model the risk-averse (quantile or expectile) objective with one GP and the local scale parameter with a second GP, leveraging variational inference and asymmetric likelihoods (pinball or asymmetric Laplace/Gaussian), and enabling batch/generative acquisitions that optimise quantile/expectile reward directly in heteroscedastic, non-Gaussian environments (Picheny et al., 2020).
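The following is a minimal sketch of a noise-penalised expected improvement in the spirit of HAEI/ANPEI, assuming a heteroscedastic surrogate that returns a posterior mean, an epistemic standard deviation, and a predicted aleatoric noise level at each candidate; the function name and the exact penalty form are illustrative rather than the authors' formulation.

```python
import numpy as np
from scipy.stats import norm

def noise_penalised_ei(mu, sigma, noise_std, best, gamma=1.0, eps=1e-12):
    """Expected improvement damped where predicted aleatoric noise is high.

    mu, sigma : surrogate posterior mean and epistemic std at candidate points
    noise_std : predicted input-dependent observation noise at the same points
    best      : incumbent objective value (maximisation assumed)
    gamma     : strength of the illustrative noise penalty
    """
    z = (mu - best) / np.maximum(sigma, eps)
    ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)   # standard EI
    penalty = sigma / (sigma + gamma * noise_std)          # tends to 1 when noise is negligible
    return ei * penalty
```

The multiplicative factor shrinks toward zero where predicted observation noise dominates the epistemic uncertainty, so candidates with comparable EI but lower noise are preferred.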
b. Mixed and Variable-Size Search Spaces:
- Latent Variable Surrogates for Mixed Variables map discrete/categorical variables to a continuous latent embedding, enabling the use of standard kernels and acquisition functions over an unconstrained continuous space, with pre-image recovery at proposal time (see the sketch after this list); augmented Lagrangian (ALV) variants enforce compatibility constraints during acquisition optimisation for strict discrete-level adherence (Cuesta-Ramirez et al., 2021).
- Variable-Size Design-Space Kernels (VSDK): kernel structures composed to handle samples with overlapping or differing active dimensions, either via subproblem-wise (SPW) or dimensional-variable-wise (DVW) decompositions, enable a single GP to fuse information and direct infill across the union of all possible variable-activation patterns (Pelamatti et al., 2020).
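A toy sketch of the latent-embedding idea: each categorical level is assigned a free continuous latent vector, so a standard kernel can operate on the augmented continuous input, with pre-image recovery snapping proposals back to the nearest level. In the cited approach the latent coordinates are learned jointly with the surrogate (and ALV variants add compatibility constraints); here they are only stored, and the class name is hypothetical.

```python
import numpy as np

class LatentCategoricalEncoder:
    """Maps each categorical level to a free continuous latent coordinate."""

    def __init__(self, levels, latent_dim=1, rng=None):
        rng = np.random.default_rng(rng)
        self.levels = list(levels)
        # one latent vector per categorical level, initialised randomly;
        # in practice these would be tuned with the GP hyperparameters
        self.z = rng.normal(size=(len(self.levels), latent_dim))

    def encode(self, level):
        return self.z[self.levels.index(level)]

    def decode(self, z_query):
        # pre-image recovery: snap a continuous proposal back to the nearest level
        d = np.linalg.norm(self.z - np.asarray(z_query), axis=1)
        return self.levels[int(np.argmin(d))]

# usage: embed a mixed (continuous + categorical) point for a standard GP kernel
enc = LatentCategoricalEncoder(["steel", "aluminium", "titanium"], latent_dim=2, rng=0)
x_cont = np.array([0.3, 0.7])
x_full = np.concatenate([x_cont, enc.encode("aluminium")])   # continuous surrogate input
print(enc.decode(enc.encode("titanium") + 0.05))             # -> 'titanium'
```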
c. Dimensionality Reduction Metamodels:
- Probabilistic Partial Least Squares-BO (PPLS-BO): introduces a Bayesian generative model linking high-dimensional inputs and outputs through low-dimensional latent variables, alternates variational EM for PPLS with GP regression in the latent space, and uses MC integration over posterior uncertainty to deliver robust performance and faster convergence in high-dimensional engineering design contexts (Archbold et al., 2 Jan 2025).
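A rough stand-in for this pipeline, using scikit-learn's deterministic PLSRegression in place of the probabilistic PLS with variational EM and MC integration over posterior uncertainty; the data, dimensions, and objective below are invented for illustration.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(40, 50))            # 50-D design space, 40 evaluations
y = np.sin(3 * X[:, 0]) + 0.5 * X[:, 1] ** 2     # objective depends on a few directions

# 1) learn a low-dimensional projection that is predictive of y
pls = PLSRegression(n_components=2).fit(X, y)
Z = pls.transform(X)                             # latent coordinates of the data

# 2) fit the GP surrogate in the latent space instead of the full 50-D space
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(Z, y)

# 3) score new candidates by projecting them into the latent space first
X_new = rng.uniform(-1, 1, size=(5, 50))
mu, std = gp.predict(pls.transform(X_new), return_std=True)
```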
d. Incorporation of Domain Knowledge:
- Space-Warped BO: uses an expert-defined density over the likely optimiser location to warp the search space (CDF transforms), injecting this prior into the GP kernel and yielding an acquisition-agnostic bias toward regions of high prior probability, especially beneficial in cold-start and large-scale search regimes (Ramachandran et al., 2020).
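A minimal sketch of the warping idea, assuming the expert prior is an axis-aligned Gaussian over the optimum's location; the precise warping and how it enters the kernel follow the cited paper, and the numbers below are purely illustrative.

```python
import numpy as np
from scipy.stats import norm

def warp(x, prior_mean, prior_std):
    """Map the search space through the expert prior's CDF.

    Regions the prior considers likely occupy more of the warped unit cube,
    so an ordinary stationary kernel devotes more of its lengthscale to them.
    """
    return norm.cdf(x, loc=prior_mean, scale=prior_std)

# assumed 2-D problem where an expert believes the optimum is near (0.2, 0.8)
prior_mean = np.array([0.2, 0.8])
prior_std = np.array([0.1, 0.15])

X = np.random.default_rng(1).uniform(0, 1, size=(10, 2))
X_warped = warp(X, prior_mean, prior_std)   # feed X_warped, not X, to the GP kernel
```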
2. Acquisition Function Extensions and Asynchronous/Batch Designs
Modifications to acquisition functions and parallel optimisation execution include:
a. Local Penalisation for Asynchronous/Batch BO:
- PLAyBOOK: generalises local penalisation to asynchronous batch settings, carving exclusion zones around busy points via Lipschitz-informed, high-confidence radii and penalising the acquisition in their neighbourhoods (using either hard or smoothed decays), supporting robust parallelisation and often delivering both better hardware utilisation and superior sample efficiency (Alvi et al., 2019); a sketch of the hard penaliser appears after this list.
- Swarm BO (SMBO & SMBO-Dec): extends batch/asynchronous ideas to swarm robotics, using prior performance maps as a GP mean surface, local penalisation for batch diversity, and fully decentralised per-fault group GPs to enable rapid distributed adaptation to environmental or agent perturbations (Bossens et al., 2020).
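Below is a simplified sketch of the hard local penaliser, assuming point estimates of the objective's maximum M and Lipschitz constant L are available; the cited methods instead use high-confidence radii that also account for the GP posterior variance at the busy points, and offer smooth decays as an alternative.

```python
import numpy as np

def hard_local_penaliser(X_cand, X_busy, mu_busy, M, L):
    """Multiplicative penalty that zeroes the acquisition inside exclusion balls.

    X_cand  : (n, d) candidate locations
    X_busy  : (b, d) points currently under evaluation by other workers
    mu_busy : (b,)   GP posterior mean at the busy points
    M       : estimate of the global maximum of the objective
    L       : estimated Lipschitz constant of the objective
    """
    penalty = np.ones(len(X_cand))
    for xj, mj in zip(X_busy, mu_busy):
        r = max(M - mj, 0.0) / L                     # exclusion radius around xj
        dist = np.linalg.norm(X_cand - xj, axis=1)
        penalty *= (dist > r).astype(float)          # hard penaliser; smooth decays also possible
    return penalty

# usage: multiply the raw acquisition by the penalty before picking the next point
# acq_penalised = acq_raw * hard_local_penaliser(X_cand, X_busy, mu_busy, M, L)
```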
b. Robust/Noise-Averse and Quantile-Focused Acquisitions:
- HAEI/ANPEI: integrate noise-predictive penalisation terms directly into EI to prioritise robust points under aleatoric noise (Griffiths et al., 2019).
- Quantile/Expectile Batch Strategies: optimise mutual information about the (quantile/expectile) global maximum (Q-GIBBON) or use Thompson-sampling with pseudo-samples from the quantile GP in batch mode (Picheny et al., 2020).
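As background for these quantile-focused strategies, here is a minimal sketch of the pinball (quantile) loss, whose minimiser is the tau-quantile; the quantile surrogates in the cited work embed an asymmetric likelihood of this flavour inside a variational GP rather than minimising an empirical loss directly.

```python
import numpy as np

def pinball_loss(y, q_pred, tau=0.9):
    """Asymmetric (pinball) loss: under-predictions weighted by tau,
    over-predictions by (1 - tau), so its minimiser is the tau-quantile."""
    r = y - q_pred
    return np.mean(np.maximum(tau * r, (tau - 1.0) * r))
```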
c. Constraint-Handling and Decoupled Black-Box Constraints:
- Constrained Knowledge Gradient (cKG): incorporates the probability of feasibility, models joint uncertainty over constraints and objective in the KG framework, and, in its decoupled extension (dcKG), allows selective evaluation of only those constraints or objectives with the maximum marginal value-for-cost (expected utility per evaluation cost), focusing budget on binding constraints or high-value sources (Lin et al., 19 Dec 2025; Ungredda et al., 2021); a simpler feasibility-weighted acquisition is sketched after this list.
- Hybrid BO+SMT Formalism: combines standard GP-based BO with SMT-based verification to guarantee correctness, stability, and feasibility of returned solutions, integrating BO as an efficient search subroutine within a logical proof engine (Brauße et al., 2021).
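The sketch below shows a simpler feasibility-weighted expected improvement, used here as a stand-in rather than cKG itself: cKG measures the expected gain in the posterior feasible optimum after the next evaluation, but both share the idea of discounting an acquisition by the probability that a modelled black-box constraint c(x) <= 0 is satisfied.

```python
import numpy as np
from scipy.stats import norm

def constrained_ei(mu_f, sigma_f, mu_c, sigma_c, best_feasible, eps=1e-12):
    """Expected improvement weighted by the probability of feasibility.

    mu_f, sigma_f : GP posterior for the objective at candidate points
    mu_c, sigma_c : GP posterior for the constraint (feasible iff c(x) <= 0)
    best_feasible : best feasible objective value seen so far (minimisation)
    """
    z = (best_feasible - mu_f) / np.maximum(sigma_f, eps)
    ei = (best_feasible - mu_f) * norm.cdf(z) + sigma_f * norm.pdf(z)
    pof = norm.cdf(-mu_c / np.maximum(sigma_c, eps))   # P[c(x) <= 0]
    return ei * pof
```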
d. Hyperparameter Uncertainty and Marginalisation:
- Fully Bayesian BO (FBBO): propagates GP hyperparameter uncertainty (lengthscales, amplitudes, noise) throughout the entire BO loop using MCMC or variational inference. Empirically, FBBO with EI and ARD kernels yields improved performance in low-noise and complex landscapes, while for UCB or isotropic kernels, marginal improvements over MAP are less pronounced. Over-exploration can occur when further marginalising under UCB (Ath et al., 2021).
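A minimal sketch of the marginalised acquisition, assuming a set of GP hyperparameter samples (e.g. from MCMC) and a prediction callable are already available; both names are placeholders for whatever the surrounding BO loop provides.

```python
import numpy as np
from scipy.stats import norm

def marginalised_ei(X_cand, hyper_samples, gp_predict, best, eps=1e-12):
    """Average EI over posterior samples of the GP hyperparameters.

    hyper_samples : list of hyperparameter settings drawn from the posterior
    gp_predict    : callable (X, theta) -> (mu, sigma) under hyperparameters theta
    best          : incumbent objective value (maximisation assumed)
    """
    acq = np.zeros(len(X_cand))
    for theta in hyper_samples:
        mu, sigma = gp_predict(X_cand, theta)
        z = (mu - best) / np.maximum(sigma, eps)
        acq += (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)
    return acq / len(hyper_samples)
```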
e. Handling Common Random Numbers and CRN-Optimized KG:
- KG-CRN: models the dependence of output on both design and random seed; the acquisition function jointly optimises the next input and seed, trading off between CRN reuse (to reduce local variance through comparisons under shared randomness) and global exploration with new seeds, and achieves superior sample efficiency in stochastic simulators (Pearce et al., 2019).
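A toy illustration of why seed reuse helps (not the KG-CRN acquisition itself): with a common seed the shared noise cancels when two designs are compared, whereas independent seeds add two independent noise draws to the comparison.

```python
import numpy as np

def simulator(x, seed):
    """Toy stochastic simulator whose noise is driven entirely by the seed."""
    rng = np.random.default_rng(seed)
    return -(x - 0.3) ** 2 + 0.5 * rng.normal()

x1, x2 = 0.25, 0.35
# common random numbers: same seed, so the noise cancels in the difference
diff_crn = simulator(x1, seed=7) - simulator(x2, seed=7)
# independent seeds: the difference also carries two independent noise draws
diff_iid = simulator(x1, seed=7) - simulator(x2, seed=11)
```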
3. Variants for Unbounded, Unknown, or Multi-Scale Search Spaces
For scenarios where the true optimiser may lie outside a pre-specified region, or where no bounded region is known in advance, several families emerge:
| Variant | Expansion Principle | Acquisition Regularization | Key Property |
|---|---|---|---|
| Volume-Doubling (EI-V) | Isotropic box doubling | None | Simple box expansion heuristic |
| EI-Q/EI-H (regularized) | None | Quadratic/hinge penalizer on EI | Acquisition decays at infinity |
| HuBO, HD-HuBO (Tran-The et al., 2020) | Controlled hyperharmonic | None (search space expansion only) | Sublinear regret with unbounded expansion |
| Bayesian Multi-Scale Optimistic Opt. | Shrinking relevant set | None | Dyadic refinement with UCB pruning |
- Unbounded BO with Volume Expansion or Regularisation: enlarges the effective search region either by box doubling (EI-V) or by regularising the acquisition (EI-Q, EI-H) so that it decays to zero outside an initially plausible region; the regularisation integrates directly into the GP prior mean or acquisition target, whereas volume doubling physically expands the search box (Shahriari et al., 2015). A sketch of a regularised EI follows this list.
- HuBO/HD-HuBO: expands the search cube according to a hyperharmonic progression, recenters toward the best incumbent, and restricts subregion search via random subcubes in high dimension, admitting provable sublinear regret (for appropriate expansion rates) even when the optimum lies arbitrarily far outside the initial domain (Tran-The et al., 2020).
- Bayesian Multi-Scale Optimistic Optimisation (BMSOO): replaces aggressive global acquisition optimisation with an adaptive dyadic grid refinement scheme leveraging GP-UCB confidence bounds, systematically eliminating regions via multi-scale bounding and achieving exponential simple-regret decay in deterministic, low-dimensional settings (Wang et al., 2014).
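A rough sketch of the regularised-acquisition idea: EI is damped outside an initially plausible ball so the acquisition decays to zero far from the data instead of chasing unbounded posterior uncertainty. The hinge-style divisor below is an illustrative choice; the cited work implements the penalty through the GP prior mean or a hinge/quadratic term on EI.

```python
import numpy as np
from scipy.stats import norm

def hinge_regularised_ei(mu, sigma, best, X_cand, centre, radius, beta=1.0, eps=1e-12):
    """EI damped with distance outside an initially plausible ball.

    centre, radius : centre and radius of the initial plausible region
    beta           : how quickly the acquisition decays outside that region
    """
    z = (mu - best) / np.maximum(sigma, eps)
    ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)
    dist = np.linalg.norm(X_cand - centre, axis=1)
    excess = np.maximum(dist - radius, 0.0)          # hinge: zero inside the ball
    return ei / (1.0 + beta * excess)                # decays with distance outside it
```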
4. Extensions for Mixed, Variable, and Conditional Variable Spaces
- Latent Variable Embeddings: continuous latent “images” are assigned to discrete values, allowing surrogates to operate in an augmented continuous space. Recovery of the actual discrete values is achieved via nearest pre-image or post-hoc maximization, with augmented Lagrangian constraints enforcing strict compatibility if so desired. Global and local dual-updating strategies manage the augmented Lagrangian multipliers (Cuesta-Ramirez et al., 2021).
- Variable-Size Spaces (VSDK): kernel design enables joint GP modelling over collections of subproblems of differing dimension or type, facilitating information transfer and improving convergence compared to “budget splitting” strategies that treat each subproblem independently and rely on discard/allocate search (Pelamatti et al., 2020).
5. Specialised Parallel, Budgeted, and Gradient-Enhanced Optimisation
- Gradient-Enhanced Bayesian Optimisation: constructs a joint GP over both function values and gradients (with an analytical derivative covariance structure), uses local active-set selection for computational efficiency, and incorporates a probabilistic trust region (based on GP posterior variance) to constrain the acquisition optimisation efficiently, matching or outperforming quasi-Newton optimisers, especially in the presence of noisy gradients (Marchildon et al., 12 Apr 2025); the joint covariance structure is sketched after this list.
- SMBO/SMBO-Dec in Swarm Applications: leverages discrete behaviour archives (MAP-Elites) as structured priors, uses decentralised group-specific GPs, and applies local penalisation batch acquisition, enabling asynchronous adaptation in large-scale, rapidly perturbed robot swarms (Bossens et al., 2020).
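A 1-D sketch of the joint covariance over function values and derivatives for an RBF kernel; the analytical blocks follow from differentiating the kernel. The cited method works in many dimensions and adds active-set selection and a probabilistic trust region on top of this structure; hyperparameter values below are illustrative.

```python
import numpy as np

def joint_rbf_cov(x, xp, ell=0.5, var=1.0):
    """Covariance blocks of a GP over (f, f') with an RBF kernel, 1-D inputs.

    Returns the stacked matrix [[K_ff, K_fg], [K_gf, K_gg]] where g = f'.
    """
    d = x[:, None] - xp[None, :]                  # pairwise differences
    k = var * np.exp(-0.5 * d ** 2 / ell ** 2)    # cov(f(x), f(x'))
    k_fg = k * d / ell ** 2                       # cov(f(x), f'(x'))
    k_gf = -k * d / ell ** 2                      # cov(f'(x), f(x'))
    k_gg = k * (1.0 / ell ** 2 - d ** 2 / ell ** 4)
    return np.block([[k, k_fg], [k_gf, k_gg]])

# joint posterior inference then proceeds exactly as for a standard GP, using the
# stacked observation vector [y; dy] and this larger covariance matrix
x = np.linspace(0, 1, 5)
K = joint_rbf_cov(x, x) + 1e-8 * np.eye(10)      # jitter for numerical stability
```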
6. Theoretical Guarantees and Practical Guidelines
- Convergence and Consistency:
- HD-HuBO: sublinear regret under appropriately controlled (hyperharmonic) domain expansion rates (Tran-The et al., 2020).
- BMSOO: exponential simple regret decay, bounded cumulative regret (Wang et al., 2014).
- dcKG: consistency for decoupled constraints under finite candidate sets (Lin et al., 19 Dec 2025).
- cKG: asymptotic optimality under infinite-budget and finite domains (Ungredda et al., 2021).
- BO+SMT: finite termination with explicit, verifiable safety/stability certificates (Brauße et al., 2021).
- Empirical Positioning and Guidelines:
- Use latent variable or VSDK kernels for mixed/variable/conditional spaces.
- Apply asynchronous batch penalisation (PLAyBOOK) for high-throughput parallel environments.
- Adopt fully Bayesian (MCMC-marginalised) hyperparameters for exploitative acquisitions in moderately noisy, multi-modal tasks.
- For safety-critical or formally specified objectives, use BO+SMT to guarantee certified solutions.
- For risk-aware objectives, deploy quantile/expectile-optimising surrogates with entropy or TS-based batch selection.
7. Open Questions and Ongoing Developments
- Scalability and High-dimensional Optimisation: PPLS-BO and other multi-view dimension reduction methods address the scalability limitations of GPs but require further validation in noisy, highly correlated design spaces (Archbold et al., 2 Jan 2025).
- Automated Prior Learning/Adaptation: While space-warping boosts early-stage exploitation of domain knowledge, principled methods for learning or adapting such priors remain an open frontier (Ramachandran et al., 2020).
- Integration of Gradient and Constraint Information: Extensions of gradient-enhanced BO to the constrained or mixed-variable regime are in development (Marchildon et al., 12 Apr 2025).
- Acquisition Function Robustness: The development of acquisition functions robust to heavy-tailed or multi-modal hyperparameter posteriors, nonstationary surrogates, and complex non-Gaussian/noisy observations remains an area of theoretical and algorithmic focus (Ath et al., 2021).
These methodological variants collectively endow Bayesian optimisation with robustness, flexibility, and practical power across an expanded range of optimisation scenarios, enabling safe, scalable, and sample-efficient black-box search in both theoretical and applied domains.