DeltaBO: Efficient Transfer Bayesian Optimization
- DeltaBO is a Bayesian optimization algorithm that transfers historical source data by explicitly modeling the difference between the target and source functions, allowing each to lie in a distinct RKHS.
- It explicitly quantifies uncertainty using an additive model, leading to provably faster regret rates when the source data is abundant and the discrepancy is smooth.
- The method employs an upper confidence bound rule and demonstrates superior performance in hyperparameter tuning and benchmark applications compared to conventional GP-UCB.
DeltaBO is a Bayesian optimization (BO) algorithm designed to accelerate search on a new (target) black-box function by transferring historical data from a related source task. Unlike prior transfer-BO approaches, DeltaBO quantifies uncertainty by explicitly modeling the difference function between the target and source tasks, allowing each to belong to a different reproducing kernel Hilbert space (RKHS). Under mild regularity assumptions, DeltaBO achieves provably faster regret rates than conventional GP-based BO, particularly when a large sample of source data is available and the source-target discrepancy is smooth or simple.
1. Problem Setting and Notation
Consider a compact input domain $\mathcal{X} \subset \mathbb{R}^d$. The goal is to maximize an unknown target function $f: \mathcal{X} \to \mathbb{R}$, given access to historical observations from a source function $g: \mathcal{X} \to \mathbb{R}$ and sequential noisy evaluations of $f$.
Let the dataset of source evaluations be
$$D_g = \{(x_i^g, y_i^g)\}_{i=1}^{n}, \qquad y_i^g = g(x_i^g) + \epsilon_i^g, \quad \epsilon_i^g \sim \mathcal{N}(0, \sigma_g^2).$$
DeltaBO posits an additive model
$$f(x) = g(x) + \delta(x),$$
where $\delta$ represents the difference (or "delta") function. Both $g$ and $\delta$ are modeled as independent draws from zero-mean GPs, $g \sim \mathcal{GP}(0, k_g)$ and $\delta \sim \mathcal{GP}(0, k_\delta)$, with positive semi-definite, uniformly bounded kernels $k_g$, $k_\delta$. This implies $g \in \mathcal{H}_{k_g}$ and $\delta \in \mathcal{H}_{k_\delta}$ (their respective RKHSs) with controlled norms. At each BO iteration $t$, the target evaluation is observed as
$$y_t = f(x_t) + \epsilon_t, \qquad \epsilon_t \sim \mathcal{N}(0, \sigma^2).$$
The maximum information gain from $T$ noisy observations of a GP with kernel $k$ is defined as
$$\gamma_T(k) = \max_{A \subset \mathcal{X},\, |A| = T} I(y_A; f_A) = \max_{A \subset \mathcal{X},\, |A| = T} \tfrac{1}{2} \log\det\!\big(I + \sigma^{-2} K_A\big),$$
where $K_A = [k(x, x')]_{x, x' \in A}$. We write $\gamma_n^g = \gamma_n(k_g)$ for the source kernel and $\gamma_T^\delta = \gamma_T(k_\delta)$ for the difference kernel.
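As a quick illustration, the log-determinant form can be evaluated directly for a chosen set of points; the following is a minimal NumPy sketch (the kernel matrix `K` and noise variance are assumed to be supplied by the caller, and the maximization over size-$T$ subsets is usually approximated greedily):

```python
import numpy as np

def information_gain(K: np.ndarray, noise_var: float) -> float:
    """0.5 * log det(I + sigma^{-2} K_A) for the kernel matrix K_A of a chosen set A."""
    T = K.shape[0]
    _, logdet = np.linalg.slogdet(np.eye(T) + K / noise_var)  # slogdet is numerically stable
    return 0.5 * logdet
```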
2. Posterior Inference on Source and Difference Functions
DeltaBO leverages access to the source data and the additive model to decompose the BO task efficiently.
2.1 Posterior on Source Function
Given the source dataset $D_g$, the GP regression posterior for $g$ is available in closed form:
$$\mu_g(x) = k_g(x)^\top \big(K_g + \sigma_g^2 I\big)^{-1} y^g, \qquad \sigma_g^2(x) = k_g(x, x) - k_g(x)^\top \big(K_g + \sigma_g^2 I\big)^{-1} k_g(x),$$
where $K_g = [k_g(x_i^g, x_j^g)]_{i,j=1}^{n}$, $k_g(x) = [k_g(x, x_i^g)]_{i=1}^{n}$, and $y^g = (y_1^g, \dots, y_n^g)^\top$.
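For concreteness, the closed-form posterior can be computed in a few lines of NumPy. This is a sketch, not the authors' implementation; `k` is assumed to be a callable returning a kernel matrix (e.g., a scikit-learn kernel object), and `noise_var` may be a scalar or a per-point vector, which is reused for the $\delta$-update below:

```python
import numpy as np

def gp_posterior(k, X_train, y_train, X_test, noise_var):
    """Closed-form GP regression: posterior mean and variance at X_test."""
    n = len(X_train)
    K = k(X_train, X_train) + np.diag(np.broadcast_to(noise_var, n).astype(float))
    K_s = k(X_train, X_test)                      # n x m cross-covariance
    L = np.linalg.cholesky(K)                     # Cholesky for numerical stability
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = k(X_test, X_test).diagonal() - np.sum(v * v, axis=0)
    return mean, np.maximum(var, 0.0)             # clip tiny negative values
```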
2.2 Residual Observations and Posterior on $\delta$
At each BO iteration $t$, upon evaluation $y_t = f(x_t) + \epsilon_t$, the source mean prediction is subtracted, producing a residual
$$r_t = y_t - \mu_g(x_t) = \delta(x_t) + \big(g(x_t) - \mu_g(x_t)\big) + \epsilon_t,$$
where $\epsilon_t$ is zero-mean Gaussian noise, $\epsilon_t \sim \mathcal{N}(0, \sigma^2)$. The residuals serve as unbiased observations of $\delta(x_t)$ with effective noise variance $\sigma^2 + \sigma_g^2(x_t)$.
Let $r_{1:t-1} = (r_1, \dots, r_{t-1})^\top$ denote all previous residuals. Define $K_{\delta,t} = [k_\delta(x_i, x_j)]_{i,j=1}^{t-1}$ and $k_{\delta,t}(x) = [k_\delta(x, x_i)]_{i=1}^{t-1}$; then the GP posterior for $\delta$ is
$$\mu_{\delta,t}(x) = k_{\delta,t}(x)^\top \big(K_{\delta,t} + \Sigma_t\big)^{-1} r_{1:t-1}, \qquad \sigma_{\delta,t}^2(x) = k_\delta(x, x) - k_{\delta,t}(x)^\top \big(K_{\delta,t} + \Sigma_t\big)^{-1} k_{\delta,t}(x),$$
with diagonal noise matrix $\Sigma_t = \mathrm{diag}\big(\sigma^2 + \sigma_g^2(x_1), \dots, \sigma^2 + \sigma_g^2(x_{t-1})\big)$.
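In code, the residuals and the diagonal noise matrix $\Sigma_t$ can be fed to the same posterior routine; the snippet below is a usage sketch reusing the hypothetical `gp_posterior` helper above, with illustrative array names (`X_src`, `y_src`, `X_queried`, `y_queried`, `X_cand`):

```python
# Source posterior evaluated at the points queried so far.
mu_g_q, var_g_q = gp_posterior(k_g, X_src, y_src, X_queried, sigma_g2)

# Residuals r_i = y_i - mu_g(x_i) are noisy observations of delta with
# heteroscedastic noise sigma^2 + sigma_g^2(x_i), i.e. the diagonal of Sigma_t.
residuals = y_queried - mu_g_q
noise_vec = sigma2 + var_g_q

mu_delta, var_delta = gp_posterior(k_delta, X_queried, residuals, X_cand, noise_vec)
```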
3. Acquisition Function and Algorithmic Structure
The posterior mean and variance for the target $f$ at round $t$ are
$$\mu_t(x) = \mu_g(x) + \mu_{\delta,t}(x), \qquad \sigma_t^2(x) = \sigma_g^2(x) + \sigma_{\delta,t}^2(x).$$
DeltaBO employs an upper confidence bound (UCB) acquisition rule. At each of $T$ rounds, with the fixed source posterior and the updatable residual GP, the next query point is chosen as
$$x_t = \arg\max_{x \in \mathcal{X}} \; \mu_t(x) + \beta_t^{1/2} \sigma_t(x),$$
where, for confidence level $1 - \alpha$ and discrete $\mathcal{X}$,
$$\beta_t = 2 \log\!\big(|\mathcal{X}|\, t^2 \pi^2 / (6\alpha)\big).$$
DeltaBO Algorithm Pseudocode
| Step | Description |
|---|---|
| 1 | Compute the source GP posterior $(\mu_g, \sigma_g^2)$ from $D_g$ |
| 2 | Initialize the $\delta$-GP with mean $\mu_{\delta,1} \equiv 0$ and variance $\sigma_{\delta,1}^2(x) = k_\delta(x, x)$ |
| 3 | For $t = 1, \dots, T$: |
| 3a | Set $\beta_t$ as above |
| 3b | Select $x_t = \arg\max_{x \in \mathcal{X}} \big(\mu_g(x) + \mu_{\delta,t}(x)\big) + \beta_t^{1/2}\sigma_t(x)$ |
| 3c | Query $y_t = f(x_t) + \epsilon_t$ |
| 3d | Compute residual $r_t = y_t - \mu_g(x_t)$ |
| 3e | Update the $\delta$-GP with $(x_t, r_t)$ and noise variance $\sigma^2 + \sigma_g^2(x_t)$ |
| 4 | Return the best observed $x_t$ or sample uniformly from $\{x_1, \dots, x_T\}$ |
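The table above translates into a short loop. The following is a minimal sketch under the assumptions of this summary (a discrete candidate set `X_cand` given as a 2-D array, the hypothetical `gp_posterior` helper from Section 2, and the discrete-domain $\beta_t$ schedule); it is not the reference implementation:

```python
import numpy as np

def delta_bo(f, X_cand, k_g, k_delta, X_src, y_src, T,
             sigma2=1e-2, sigma_g2=1e-2, alpha=0.05):
    """UCB-based transfer BO over a discrete candidate set (sketch)."""
    # Step 1: source GP posterior, computed once on the candidate grid.
    mu_g, var_g = gp_posterior(k_g, X_src, y_src, X_cand, sigma_g2)

    X_q, y_q, r_q, nz_q = [], [], [], []   # queries, observations, residuals, noise vars
    for t in range(1, T + 1):
        # Steps 2/3a: delta-GP posterior from residuals (prior if no data yet).
        if X_q:
            mu_d, var_d = gp_posterior(k_delta, np.array(X_q), np.array(r_q),
                                       X_cand, np.array(nz_q))
        else:
            mu_d = np.zeros(len(X_cand))
            var_d = k_delta(X_cand, X_cand).diagonal()
        beta_t = 2.0 * np.log(len(X_cand) * t**2 * np.pi**2 / (6.0 * alpha))
        # Step 3b: UCB over the additive posterior f = g + delta.
        ucb = (mu_g + mu_d) + np.sqrt(beta_t) * np.sqrt(var_g + var_d)
        i = int(np.argmax(ucb))
        # Steps 3c-3e: query, form the residual, record its noise variance.
        y_t = f(X_cand[i])
        X_q.append(X_cand[i])
        y_q.append(y_t)
        r_q.append(y_t - mu_g[i])
        nz_q.append(sigma2 + var_g[i])
    # Step 4: return the queried point with the best observed value.
    return X_q[int(np.argmax(y_q))]
```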
4. Regret Analysis and Theoretical Guarantees
The cumulative regret after $T$ rounds is $R_T = \sum_{t=1}^{T} \big(f(x^\star) - f(x_t)\big)$ for $x^\star = \arg\max_{x \in \mathcal{X}} f(x)$. The information gains $\gamma_n^g$ and $\gamma_T^\delta$ reflect the GP information contraction from the source and difference processes respectively; $\bar{\sigma}_g^2 = \max_{x \in \mathcal{X}} \sigma_g^2(x)$ is the maximal source posterior variance over $\mathcal{X}$.
4.1 Main Regret Bound
With high probability (at least $1 - \alpha$), DeltaBO satisfies a regret bound of order
$$R_T = \tilde{O}\!\Big(\beta_T^{1/2}\big(\sqrt{T\,\gamma_T^{\delta}} + T\,\bar{\sigma}_g\big)\Big),$$
up to absolute constants depending only on the noise level.
4.2 Asymptotic and Comparative Results
When $\beta_T = O\!\big(\log(T|\mathcal{X}|/\alpha)\big)$ and the source sample is large enough that the source term $T\,\bar{\sigma}_g$ is of lower order (e.g., $\bar{\sigma}_g^2$ decays with $n$), the bound simplifies to
$$R_T = \tilde{O}\!\big(\sqrt{T\,\gamma_T^{\delta}}\big).$$
Standard GP-UCB regret scales as $\tilde{O}\!\big(\sqrt{T\,\gamma_T^{f}}\big)$, where $\gamma_T^{f}$ is the information gain of the target kernel. When $\gamma_T^{\delta} \ll \gamma_T^{f}$ and $n$ is large (i.e., $\delta$ is "simpler" than $f$), DeltaBO provides provable acceleration over conventional BO.
4.3 Sketch of Proof Structure
- A high-probability confidence argument bounds the deviation $|f(x) - \mu_t(x)|$ via $\beta_t^{1/2}\sigma_t(x)$.
- Instantaneous regret is upper bounded by $r_t \le 2\beta_t^{1/2}\sigma_t(x_t)$.
- Summed variance contributions from $g$ and $\delta$ are controlled by the information gains $\gamma_n^g$ and $\gamma_T^\delta$ via Lemmas A.4 and A.6–A.7.
- The Cauchy–Schwarz inequality yields the total cumulative regret rate.
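Putting these steps together, the chain of inequalities reads, up to constants, roughly as follows (a sketch consistent with the bound in Section 4.1; $C_1$ denotes an absolute constant from the information-gain lemma):
$$
\begin{aligned}
r_t &= f(x^\star) - f(x_t) \;\le\; 2\beta_t^{1/2}\,\sigma_t(x_t),\\
R_T &= \sum_{t=1}^{T} r_t \;\le\; 2\beta_T^{1/2}\sum_{t=1}^{T}\big(\sigma_g(x_t) + \sigma_{\delta,t}(x_t)\big)\\
&\le\; 2\beta_T^{1/2}\Big(T\,\bar{\sigma}_g + \sqrt{T\,\textstyle\sum_{t}\sigma_{\delta,t}^2(x_t)}\Big)
\;\le\; 2\beta_T^{1/2}\Big(T\,\bar{\sigma}_g + \sqrt{C_1\,T\,\gamma_T^{\delta}}\Big).
\end{aligned}
$$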
5. Practical Guidance and Experimental Findings
5.1 Kernel Selection
- Source GP ($k_g$): a Matérn kernel is typically used when moderate smoothness is expected in $g$.
- Difference GP ($k_\delta$): smoother kernels, such as squared-exponential (SE) or Matérn with a long length scale, model $\delta$ as a simple, low-complexity function. A small amplitude for $k_\delta$ further reduces $\gamma_T^\delta$.
- Noise variances: $\sigma^2$ and $\sigma_g^2$ should be set from replicate noise estimates.
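As an illustration of these choices, the two priors could be instantiated with scikit-learn kernels (a sketch; the length scales and amplitudes shown are placeholders to be tuned, and the resulting kernel objects are callable, so they plug directly into the posterior sketch of Section 2):

```python
from sklearn.gaussian_process.kernels import ConstantKernel, Matern, RBF

# Source prior: Matern-5/2 captures moderate smoothness in g.
k_g = ConstantKernel(1.0) * Matern(length_scale=0.2, nu=2.5)

# Difference prior: a smooth SE kernel with a long length scale and a small
# amplitude encodes the assumption that delta is simple and low-energy,
# which keeps gamma_T^delta small.
k_delta = ConstantKernel(0.1) * RBF(length_scale=1.0)
```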
5.2 Choice of $\beta_t$
- In continuous domains, a discretization argument is required, with $\beta_t$ increasing logarithmically in the discretization size.
- Empirically, a constant $\beta$ (tuned via cross-validation) suffices in many settings.
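For reference, the theoretical discrete-domain schedule versus a constant surrogate (illustrative values; $\alpha$ and the grid size are user choices):

```python
import numpy as np

def beta_schedule(t: int, n_candidates: int, alpha: float = 0.05) -> float:
    """GP-UCB-style schedule for a discretized domain of size |X| = n_candidates."""
    return 2.0 * np.log(n_candidates * t**2 * np.pi**2 / (6.0 * alpha))

BETA_CONSTANT = 4.0   # fixed value tuned empirically, e.g. by cross-validation
```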
5.3 Empirical Applications
- Hyperparameter Tuning (AutoML): examined on UCI Breast-Cancer classification with gradient boosting (11 hyperparameters) and an MLP (8 hyperparameters). Using Matérn kernels (for $g$ and $f$) and an SE kernel (for $\delta$), DeltaBO achieves lower cumulative regret than GP-UCB, GP-EI/PI/TS, Env-GP, and Diff-GP.
- Synthetic Benchmarks: on shifted Gaussians (SE kernels), Bohachevsky functions evaluated on a discretized grid, and a ground-truth additive construction, DeltaBO demonstrates rapid regret decay as $n$ increases. Competing baselines either do not fully leverage large $n$ or require the same kernel for the source and target functions.
5.4 Recommendations
- Collect a large source sample (large $n$), as the theoretical regret bound improves with $n$.
- Model $\delta$ with a smooth kernel and low amplitude to minimize $\gamma_T^\delta$.
- Tune $\beta_t$ conservatively to maintain valid confidence intervals without resorting to over-exploration.
6. Implications and Context Within Transfer Bayesian Optimization
DeltaBO formalizes a principled and computationally efficient framework to combine existing source GP data with sequential target evaluations, explicitly quantifying the informativeness and complexity of both the source and difference functions. The explicit dependence of the regret on $n$ and $\gamma_T^{\delta}$ enables sharp guidance on when and how transfer learning is beneficial in BO. Empirical results indicate that DeltaBO consistently outperforms established classical and transfer-BO methods, particularly when source-target alignment is strong, the source dataset is considerably larger than the target, and the difference function is well-modeled by a simple GP.
This suggests that in practical Bayesian optimization regimes where related source data is abundant and the transfer gap is small in complexity, DeltaBO should be favored for provably rapid convergence and effective knowledge transfer.