DeltaBO: Efficient Transfer Bayesian Optimization

Updated 9 November 2025
  • DeltaBO is a Bayesian optimization algorithm that transfers historical source data by modeling the difference between target and source functions in distinct RKHSs.
  • It explicitly quantifies uncertainty using an additive model, leading to provably faster regret rates when the source data is abundant and the discrepancy is smooth.
  • The method employs an upper confidence bound rule and demonstrates superior performance in hyperparameter tuning and benchmark applications compared to conventional GP-UCB.

DeltaBO is a Bayesian optimization (BO) algorithm designed to accelerate search on a new (target) black-box function by transferring historical data from a related source task. Unlike prior transfer Bayesian optimization approaches, DeltaBO quantifies uncertainty by explicitly modeling the difference function between the target and source tasks, allowing each to belong to a different reproducing kernel Hilbert space (RKHS). Under mild regularity assumptions, DeltaBO achieves provably faster regret rates than conventional GP-based BO, particularly when a large sample of source data is available and the source-target discrepancy is smooth or simple.

1. Problem Setting and Notation

Consider a compact input domain $\mathcal D \subset \mathbb R^d$. The goal is to maximize an unknown target function $f:\mathcal D \to \mathbb R$, given access to $N$ historical observations from a source function $g:\mathcal D \to \mathbb R$ and sequential noisy evaluations of $f$.

Let the dataset of source evaluations be

$$\mathcal S^{(0)} = \left\{ \bigl(x_i^{(0)}, y_i^{(0)}\bigr) \right\}_{i=1}^N, \quad y_i^{(0)} = g\bigl(x_i^{(0)}\bigr) + \varepsilon_i^{(0)}, \quad \varepsilon_i^{(0)} \sim \mathcal N(0, \sigma_0^2).$$

DeltaBO posits an additive model

$$f(x) = g(x) + \delta(x),$$

where $\delta(x) = f(x) - g(x)$ denotes the difference (or "delta") function. Both $g$ and $\delta$ are modeled as independent draws from zero-mean GPs, $g \sim \mathcal{GP}(0, k_g)$ and $\delta \sim \mathcal{GP}(0, k_\delta)$, with positive semi-definite, uniformly bounded kernels $k_g(\cdot,\cdot)$ and $k_\delta(\cdot,\cdot)$. This implies $g \in \mathcal H_g$ and $\delta \in \mathcal H_\delta$ (their respective RKHSs) with controlled norms. At each BO iteration $t$, the target evaluation is observed as

$$y_t = f(x_t) + \varepsilon_t, \quad \varepsilon_t \sim \mathcal N(0, \sigma^2).$$

The maximum information gain from $M$ noisy observations of a GP with kernel $k_f$ is defined as

$$\gamma_{f, M} = \max_{A \subset \mathcal D,\, |A| = M} I(y_A; f_A).$$
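For a GP, the mutual information in this definition has the closed form $I(y_A; f_A) = \tfrac{1}{2}\log\det\bigl(I + \sigma^{-2} K_A\bigr)$, where $K_A$ is the kernel matrix of the set $A$; $\gamma_{f,M}$ maximizes this quantity over all size-$M$ subsets. As a minimal illustration (plain NumPy, with an assumed RBF kernel and arbitrary example points, not taken from the paper), the snippet below evaluates the information gain of a single candidate set:

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between two sets of points."""
    sq_dists = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq_dists / lengthscale**2)

def information_gain(X_A, noise_var, kernel=rbf_kernel):
    """I(y_A; f_A) = 0.5 * log det(I + K_A / noise_var) for a GP with the given kernel."""
    K_A = kernel(X_A, X_A)
    _, logdet = np.linalg.slogdet(np.eye(len(X_A)) + K_A / noise_var)
    return 0.5 * logdet

# Illustrative usage: information gain of 20 random points in [0, 1]^2.
X_A = np.random.default_rng(0).uniform(size=(20, 2))
print(information_gain(X_A, noise_var=0.1))
```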

2. Posterior Inference on Source and Difference Functions

DeltaBO leverages access to the source data and the additive model to decompose the BO task efficiently.

2.1 Posterior on Source Function $g$

Given the $N$ source points, the GP regression posterior for $g$ is available in closed form. With
$$K_{g,N} = \bigl[k_g(x_i^{(0)}, x_j^{(0)})\bigr]_{i,j=1}^N, \qquad k_{g,N}(x) = \bigl[k_g(x_i^{(0)}, x)\bigr]_{i=1}^N,$$
the posterior mean and variance are

$$\mu_{g,N}(x) = k_{g,N}(x)^\top \left( K_{g,N} + \sigma_0^2 I_N \right)^{-1} y^{(0)},$$

$$\sigma^2_{g,N}(x) = k_g(x, x) - k_{g,N}(x)^\top \left( K_{g,N} + \sigma_0^2 I_N \right)^{-1} k_{g,N}(x).$$
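This source posterior is plain GP regression and, since $\mathcal S^{(0)}$ is fixed, it only needs to be fitted once. The sketch below is a minimal NumPy/SciPy implementation under assumed names (`X_src`, `y_src`, and a callable `kernel_g` returning cross-covariance matrices); it caches the Cholesky factor of $K_{g,N} + \sigma_0^2 I_N$ so that later evaluations of $\mu_{g,N}$ and $\sigma^2_{g,N}$ are cheap.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def fit_source_posterior(X_src, y_src, kernel_g, noise_var0):
    """Precompute the closed-form GP posterior of g from the N source observations."""
    K = kernel_g(X_src, X_src) + noise_var0 * np.eye(len(X_src))
    chol = cho_factor(K, lower=True)        # Cholesky of K_{g,N} + sigma_0^2 I_N
    alpha = cho_solve(chol, y_src)          # (K_{g,N} + sigma_0^2 I_N)^{-1} y^{(0)}
    return X_src, chol, alpha

def source_posterior(x, fit, kernel_g):
    """Posterior mean mu_{g,N}(x) and variance sigma^2_{g,N}(x) at query points x."""
    X_src, chol, alpha = fit
    k_x = kernel_g(X_src, x)                # N x m cross-covariances k_{g,N}(x)
    mean = k_x.T @ alpha
    var = np.diag(kernel_g(x, x)) - np.sum(k_x * cho_solve(chol, k_x), axis=0)
    return mean, var
```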

2.2 Residual Observations and Posterior on $\delta$

At each BO iteration, upon evaluating $y_t = f(x_t) + \varepsilon_t = g(x_t) + \delta(x_t) + \varepsilon_t$, the source mean prediction $\mu_{g,N}(x_t)$ is subtracted, producing the residual

$$\tilde y_t = y_t - \mu_{g,N}(x_t) = \delta(x_t) + \eta_t,$$

where $\eta_t$ is zero-mean Gaussian noise with $\eta_t \sim \mathcal N\bigl(0, \sigma^2_{g,N}(x_t) + \sigma^2\bigr)$. The residuals therefore serve as unbiased observations of $\delta(x)$ with their own variance structure.

Let $(x_1, \tilde y_1), \dots, (x_{t-1}, \tilde y_{t-1})$ denote all previous residual observations. Define
$$K_{\delta, t-1} = \bigl[k_\delta(x_i, x_j)\bigr]_{i,j=1}^{t-1}, \qquad k_{\delta, t-1}(x) = \bigl[k_\delta(x_i, x)\bigr]_{i=1}^{t-1}.$$
Then the GP posterior for $\delta$ is
$$\mu_{\delta, t-1}(x) = k_{\delta, t-1}(x)^\top \left( K_{\delta, t-1} + \Sigma_{t-1} \right)^{-1} \tilde y_{1:t-1},$$

$$\sigma^2_{\delta, t-1}(x) = k_\delta(x, x) - k_{\delta, t-1}(x)^\top \left( K_{\delta, t-1} + \Sigma_{t-1} \right)^{-1} k_{\delta, t-1}(x),$$

with diagonal noise matrix $\Sigma_{t-1} = \operatorname{diag}\bigl(\sigma^2_{g,N}(x_1) + \sigma^2, \dots, \sigma^2_{g,N}(x_{t-1}) + \sigma^2\bigr)$.
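In code, the only change relative to standard GP regression is that the noise matrix $\Sigma_{t-1}$ is diagonal but not constant. A minimal sketch (with assumed names `X_obs`, `resid`, and `noise_vars` for the query points, the residuals $\tilde y_{1:t-1}$, and the per-point noise levels $\sigma^2_{g,N}(x_i) + \sigma^2$) is:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def delta_posterior(x, X_obs, resid, noise_vars, kernel_d):
    """GP posterior for delta from residual observations with heteroscedastic noise."""
    if len(X_obs) == 0:
        # Prior over delta before any target data (Section 2.2 formulas with t - 1 = 0).
        return np.zeros(len(x)), np.diag(kernel_d(x, x))
    K = kernel_d(X_obs, X_obs) + np.diag(noise_vars)   # K_{delta,t-1} + Sigma_{t-1}
    chol = cho_factor(K, lower=True)
    k_x = kernel_d(X_obs, x)                           # (t-1) x m cross-covariances
    mean = k_x.T @ cho_solve(chol, resid)
    var = np.diag(kernel_d(x, x)) - np.sum(k_x * cho_solve(chol, k_x), axis=0)
    return mean, var
```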

3. Acquisition Function and Algorithmic Structure

The posterior mean and variance for the target $f$ at round $t$ are
$$\mu_t(x) = \mu_{g,N}(x) + \mu_{\delta, t-1}(x), \qquad \sigma_t^2(x) = \sigma^2_{g,N}(x) + \sigma^2_{\delta, t-1}(x).$$

DeltaBO employs an upper confidence bound (UCB) acquisition rule. At each of $T$ rounds, with the source posterior fixed and the residual GP updated online, the next query point is chosen as
$$x_t = \arg\max_{x \in \mathcal D} \left\{ \mu_t(x) + \sqrt{\beta_t}\,\sigma_t(x) \right\},$$
where, for confidence level $1-\rho$ and discrete $\mathcal D$,

$$\beta_t = 2 \log\bigl(|\mathcal D|\, \pi^2 t^2 / (6\rho)\bigr).$$

DeltaBO Algorithm Pseudocode

Step 1: Compute the source GP posterior $(\mu_{g,N}, \sigma^2_{g,N})$ from $\mathcal S^{(0)}$.
Step 2: Initialize the $\delta$-GP with mean $\mu_{\delta,0}(x) = 0$ and variance $\sigma^2_{\delta,0}(x) = \sigma^2_{g,N}(x) + \sigma^2$.
Step 3: For $t = 1, \dots, T$:
Step 3a: Set $\beta_t$ as above.
Step 3b: Select $x_t = \arg\max_{x \in \mathcal D} \bigl\{ \mu_{g,N}(x) + \mu_{\delta, t-1}(x) + \sqrt{\beta_t}\sqrt{\sigma^2_{g,N}(x) + \sigma^2_{\delta, t-1}(x)} \bigr\}$.
Step 3c: Query $y_t = f(x_t) + \varepsilon_t$.
Step 3d: Compute the residual $\tilde y_t = y_t - \mu_{g,N}(x_t)$.
Step 3e: Update the $\delta$-GP with $(x_t, \tilde y_t)$ and observation-noise variance $\sigma^2_{g,N}(x_t) + \sigma^2$.
Step 4: Return the best $x$ found or sample uniformly from $\{x_1, \dots, x_T\}$.
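The following sketch stitches the loop above together over a discrete candidate set, reusing the `source_posterior` and `delta_posterior` helpers sketched in Sections 2.1 and 2.2. It is an illustrative implementation, not the authors' reference code; the objective `f`, the candidate grid `X_cand`, and the kernels are assumptions supplied by the caller.

```python
import numpy as np

def delta_bo(f, X_cand, X_src, y_src, kernel_g, kernel_d,
             noise_var0, noise_var, T, rho=0.05, seed=0):
    """Sketch of DeltaBO: UCB on mu_{g,N} + mu_{delta,t-1} with combined variance."""
    rng = np.random.default_rng(seed)
    fit_g = fit_source_posterior(X_src, y_src, kernel_g, noise_var0)
    mu_g, var_g = source_posterior(X_cand, fit_g, kernel_g)   # fixed source posterior

    X_obs, resid, noise_vars, queries = [], [], [], []
    for t in range(1, T + 1):
        beta_t = 2.0 * np.log(len(X_cand) * np.pi**2 * t**2 / (6.0 * rho))
        mu_d, var_d = delta_posterior(
            X_cand, np.array(X_obs).reshape(-1, X_cand.shape[1]),
            np.array(resid), np.array(noise_vars), kernel_d)
        ucb = mu_g + mu_d + np.sqrt(beta_t) * np.sqrt(np.maximum(var_g + var_d, 0.0))
        i = int(np.argmax(ucb))                                # step 3b
        x_t = X_cand[i]
        y_t = f(x_t) + rng.normal(scale=np.sqrt(noise_var))    # step 3c: noisy query
        X_obs.append(x_t)
        resid.append(y_t - mu_g[i])                            # step 3d: residual
        noise_vars.append(var_g[i] + noise_var)                # step 3e: noise level
        queries.append((x_t, y_t))
    return max(queries, key=lambda q: q[1])[0]                 # best observed point
```

Here `X_cand` is an $(n, d)$ array enumerating the discrete domain $\mathcal D$, and any callable kernels that return cross-covariance matrices (such as the `rbf_kernel` sketched in Section 1) can be plugged in.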

4. Regret Analysis and Theoretical Guarantees

The cumulative regret after $T$ rounds is $R_T = \sum_{t=1}^T [f(x^*) - f(x_t)]$, where $x^* = \arg\max_{x \in \mathcal D} f(x)$. The information gains $\gamma_{g,N}$ and $\gamma_{\delta,T}$ reflect the GP information contraction from the source and difference processes, respectively, and $\tau^2 = \sup_x k_\delta(x,x)$ is the maximal prior variance of $\delta$.

4.1 Main Regret Bound

With high probability (at least $1-\rho$), DeltaBO satisfies
$$R_T \le \sqrt{8\, T \beta_T \left( \frac{T \gamma_{g,N}\, \sigma_0^2}{N - 2\gamma_{g,N}} + C_2\, \gamma_{\delta,T} \left( \frac{2\gamma_{g,N}\, \sigma_0^2}{N - 2\gamma_{g,N}} + \sigma^2 \right) \right)},$$
where $C_2 = (\tau^2/\sigma^2)/\log(1+\tau^2/\sigma^2) \le 1+\tau^2/\sigma^2$.

4.2 Asymptotic and Comparative Results

If $\gamma_{g,N} = o(N)$, $\gamma_{\delta,T} = O(T)$, $\tau^2 = O(\sigma^2)$, and $\gamma_{g,N}/N = O(\gamma_{\delta,T}/T)$, the bound simplifies to

$$R_T = O\!\left(\sqrt{T \beta_T\, \gamma_{\delta,T}}\right) = \widetilde O\!\left(\sqrt{T\,(T/N + \gamma_{\delta,T})}\right).$$
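A brief, informal sketch of why the simplification holds (not the paper's full argument): $\gamma_{g,N} = o(N)$ gives $N - 2\gamma_{g,N} \ge N/2$ for large $N$, and $\tau^2 = O(\sigma^2)$ gives $C_2 = O(1)$, so

$$\frac{T\,\gamma_{g,N}\,\sigma_0^2}{N - 2\gamma_{g,N}} \le \frac{2\sigma_0^2\, T\,\gamma_{g,N}}{N} = O(\gamma_{\delta,T}), \qquad C_2\,\gamma_{\delta,T}\!\left(\frac{2\gamma_{g,N}\,\sigma_0^2}{N - 2\gamma_{g,N}} + \sigma^2\right) = O(\gamma_{\delta,T}),$$

using $\gamma_{g,N}/N = O(\gamma_{\delta,T}/T)$ and treating $\sigma_0^2, \sigma^2$ as constants; the bracketed term in the bound of Section 4.1 is therefore $O(\gamma_{\delta,T})$.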

Standard GP-UCB regret scales as $O(\sqrt{T \beta_T \gamma_{f,T}})$. For $N \gg T$ and $\gamma_{\delta,T} \ll \gamma_{f,T}$ (i.e., when $\delta$ is "simpler" than $f$), DeltaBO therefore provides provable acceleration over conventional BO.

4.3 Sketch of Proof Structure

  • A high-probability confidence argument bounds the deviation $|f(x) - \mu_{g,N}(x) - \mu_{\delta, t-1}(x)|$ by $\sqrt{\beta_t}\,\sigma_t(x)$.
  • The instantaneous regret is then upper bounded by $2\sqrt{\beta_t}\,\sigma_t(x_t)$.
  • The summed variance contributions from $\sigma^2_{g,N}(x_t)$ and $\sigma^2_{\delta, t-1}(x_t)$ are controlled by the information gains $\gamma_{g,N}$ and $\gamma_{\delta,T}$ via Lemmas A.4 and A.6–A.7.
  • The Cauchy–Schwarz inequality yields the total cumulative regret rate.
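Concretely, the regret decomposition and the Cauchy–Schwarz step combine as follows (a standard sketch, using that $\beta_t$ is nondecreasing in $t$):

$$R_T = \sum_{t=1}^T \bigl(f(x^*) - f(x_t)\bigr) \le \sum_{t=1}^T 2\sqrt{\beta_t}\,\sigma_t(x_t) \le 2\sqrt{\beta_T}\sqrt{T \sum_{t=1}^T \sigma_t^2(x_t)},$$

after which the summed predictive variances are bounded in terms of $\gamma_{g,N}$ and $\gamma_{\delta,T}$, yielding the bound of Section 4.1.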

5. Practical Guidance and Experimental Findings

5.1 Kernel Selection

  • Source GP ($g$): A Matérn kernel is typically used when moderate smoothness is expected in $g$.
  • Difference GP ($\delta$): Smoother kernels, such as the squared-exponential (SE) kernel or a Matérn kernel with a long length scale, model $\delta$ as a simple, low-complexity function; a small amplitude $\tau^2$ for $k_\delta$ further reduces $\gamma_{\delta,T}$ (see the sketch after this list).
  • Noise variances: $\sigma_0$ and $\sigma$ should be set from replicate noise estimates.
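As a hedged illustration, these choices map directly onto standard GP libraries. The snippet below builds such kernels with scikit-learn kernel objects; the library, length scales, and amplitudes are assumptions for the sketch, not values prescribed by DeltaBO.

```python
from sklearn.gaussian_process.kernels import ConstantKernel, Matern, RBF

# Source GP: Matern kernel for moderate smoothness in g.
kernel_g = ConstantKernel(1.0) * Matern(length_scale=0.5, nu=2.5)

# Difference GP: smooth SE (RBF) kernel with a long length scale and a small
# amplitude tau^2, keeping delta simple and gamma_{delta,T} small.
kernel_d = ConstantKernel(0.1) * RBF(length_scale=2.0)

# Kernel objects are callable: kernel_g(X1, X2) returns the cross-covariance
# matrix, so they can be plugged into the GP helpers sketched in Section 2.
```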

5.2 Choice of βt\beta_t

  • In continuous domains, a discretization argument is required, which increases $\beta_t$ logarithmically with the discretization size.
  • Empirically, a constant $\beta_t$ (tuned via cross-validation) suffices in many settings.

5.3 Empirical Applications

  • Hyperparameter Tuning (AutoML): Examined on UCI Breast-Cancer classification with Gradient Boosting (11 hyperparameters) and an MLP (8 hyperparameters). With $N \approx 90$, $T = 30$, Matérn kernels for $g$ and $f$, and an SE kernel for $\delta$, DeltaBO achieves lower cumulative regret than GP-UCB, GP-EI/PI/TS, Env-GP, and Diff-GP.
  • Synthetic Benchmarks: On shifted Gaussians (SE kernels), Bohachevsky functions (a $120 \times 120$ grid), and a ground-truth additive construction, DeltaBO demonstrates rapid regret decay with increasing $N$. Competing baselines either fail to fully leverage large $N$ or require the same kernel for $g$ and $\delta$.

5.4 Recommendations

  • Collect a large source sample ($N \gg T$), since the theoretical regret bound improves with $N$.
  • Model $\delta$ with a smooth, low-amplitude kernel to keep $\gamma_{\delta,T}$ small.
  • Tune $\beta_t$ conservatively so that confidence intervals remain valid without over-exploration.

6. Implications and Context Within Transfer Bayesian Optimization

DeltaBO formalizes a principled and computationally efficient framework for combining existing source GP data with sequential target evaluations, explicitly quantifying the informativeness and complexity of both the source and difference functions. The explicit dependence of the regret on $N$ and $\gamma_\delta$ gives sharp guidance on when and how transfer learning is beneficial in BO. Empirical results indicate that DeltaBO consistently outperforms established classical and transfer-BO methods, particularly when source-target alignment is strong, the source dataset is considerably larger than the target, and the difference function is well modeled by a simple GP.

This suggests that in practical Bayesian optimization regimes where related source data is abundant and the transfer gap is small in complexity, DeltaBO should be favored for provably rapid convergence and effective knowledge transfer.
