Cumulative Regret Bounds in GP Bandits
- Cumulative Regret Bounds are performance metrics that quantify the total excess loss in sequential decision algorithms compared to an optimal strategy.
- They establish both lower and upper performance limits in GP bandit optimization, highlighting the role of kernel smoothness and dimensionality.
- Refined analyses using local information gain have nearly closed the gap between theoretical bounds and practical performance in algorithms like GP-UCB.
Cumulative regret bounds quantify the performance limitation and achievable efficiency of sequential decision algorithms—such as those used in bandit optimization and reinforcement learning—by measuring the total excess loss suffered over time compared to an optimal strategy. In the context of Gaussian process (GP) bandit optimization, cumulative regret bounds are central to both the theoretical understanding and practical design of algorithms for black-box and Bayesian optimization.
1. Mathematical Foundations of Cumulative Regret Bounds
Definition of Cumulative Regret
In sequential optimization, for an unknown function $f$ defined on a compact domain $\mathcal{X} \subset \mathbb{R}^d$ and a sequence of queries $x_1, \dots, x_T$, the cumulative regret up to round $T$ is
$$R_T = \sum_{t=1}^{T} \bigl( f(x^*) - f(x_t) \bigr), \qquad x^* \in \arg\max_{x \in \mathcal{X}} f(x).$$
This quantity summarizes the total penalty for not always choosing the global maximizer.
Cumulative regret is often considered in expectation, $\mathbb{E}[R_T]$, where the expectation is over the noise in observations and potentially over algorithmic randomness.
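To make the definition concrete, here is a minimal numpy sketch that accumulates $R_T$ for a toy one-dimensional objective; the objective, noise level, and the uniform-random query rule are illustrative assumptions only, standing in for an actual bandit policy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D objective on a discretized compact domain (illustrative choice).
domain = np.linspace(0.0, 1.0, 500)
f = np.sin(6 * domain) + 0.5 * np.cos(2 * domain)
f_star = f.max()              # value at the global maximizer
noise_std = 0.1               # observation noise standard deviation
T = 200

instant_regret = []
for t in range(T):
    # Placeholder policy: uniform random queries. A real bandit algorithm
    # (e.g., GP-UCB) would choose x_t from the posterior instead.
    idx = rng.integers(len(domain))
    y_t = f[idx] + noise_std * rng.normal()   # noisy observation; regret itself uses the true f
    instant_regret.append(f_star - f[idx])

cumulative_regret = np.cumsum(instant_regret)  # R_t for t = 1, ..., T
print(f"R_T after {T} rounds: {cumulative_regret[-1]:.2f}")
```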
Role of Kernel Regularity
The kernel function $k$ (e.g., squared exponential or Matérn) determines the smoothness and complexity of the function class, and hence fundamentally shapes the achievable regret rates. The kernel regularity parameter (smoothness $\nu$ for the Matérn family) and the dimension $d$ dictate both upper and lower bounds on regret.
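For reference, the sketch below implements the two kernel families named above (SE and a Matérn with $\nu = 5/2$) for scalar inputs; the lengthscale and variance values are arbitrary illustrative choices.

```python
import numpy as np

def se_kernel(x1, x2, lengthscale=0.2, variance=1.0):
    """Squared-exponential (RBF) kernel: infinitely differentiable sample paths."""
    sq_dist = np.subtract.outer(x1, x2) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def matern52_kernel(x1, x2, lengthscale=0.2, variance=1.0):
    """Matern kernel with smoothness nu = 5/2: twice-differentiable sample paths."""
    r = np.abs(np.subtract.outer(x1, x2)) / lengthscale
    return variance * (1.0 + np.sqrt(5) * r + 5.0 * r**2 / 3.0) * np.exp(-np.sqrt(5) * r)

x = np.linspace(0.0, 1.0, 5)
print(np.round(se_kernel(x, x), 3))
print(np.round(matern52_kernel(x, x), 3))
```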
2. Lower Bounds on Cumulative Regret
"Lower bounds" specify the performance limit for any algorithm, independent of its structure.
- For the isotropic squared-exponential (SE) kernel, the cumulative regret of any algorithm satisfies $R_T = \Omega\!\big(\sqrt{T\,(\log T)^{d/2}}\big)$,
where the hidden constant depends on the RKHS norm bound $B$ and the noise variance $\sigma^2$.
- For the Matérn-$\nu$ kernel: $R_T = \Omega\!\big(T^{\frac{\nu+d}{2\nu+d}}\big)$.
These bounds are algorithm-independent and reflect the intrinsic statistical hardness posed by the function class and noise.
This suggests that for less smooth (small $\nu$), high-dimensional functions, the regret is unavoidably larger. Smoother processes and lower dimensionality permit faster sublinear rates.
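As a quick sanity check on how these rates behave, the following snippet (assuming the Matérn lower-bound exponent $\frac{\nu+d}{2\nu+d}$ quoted above) tabulates the exponent for a few smoothness/dimension combinations.

```python
def matern_lower_bound_exponent(nu: float, d: int) -> float:
    """Exponent alpha in the Omega(T**alpha) Matern lower bound quoted above."""
    return (nu + d) / (2 * nu + d)

for nu in (0.5, 1.5, 2.5, 10.0):
    for d in (1, 4, 10):
        alpha = matern_lower_bound_exponent(nu, d)
        print(f"nu = {nu:4}, d = {d:2}:  regret grows at least like T^{alpha:.3f}")
# Small nu (rough functions) and large d push the exponent toward 1 (near-linear
# regret); large nu pushes it toward 1/2, the parametric sqrt(T) rate.
```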
3. Cumulative Regret Upper Bounds and Their Advances
Classical Upper Bounds
For the GP-UCB algorithm, the best known prior cumulative regret upper bound scaled as $\tilde{O}\!\big(\sqrt{T\,\gamma_T}\big)$ in terms of the maximum information gain $\gamma_T$ (defined in Section 4); for the SE kernel this amounts to $\sqrt{T}$ times polylogarithmic factors, and for the Matérn kernel to $\tilde{O}\!\big(T^{\frac{\nu+d}{2\nu+d}}\big)$,
where the $\tilde{O}(\cdot)$ notation hides lower-order log terms.
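For orientation, here is a minimal, self-contained sketch of the GP-UCB rule on a discretized one-dimensional domain with an SE prior. The toy objective, lengthscale, noise level, and constant exploration weight $\beta$ are illustrative simplifications; the formal analyses use carefully chosen, slowly growing $\beta_t$ schedules.

```python
import numpy as np

rng = np.random.default_rng(1)

def se_kernel(a, b, lengthscale=0.15):
    return np.exp(-0.5 * np.subtract.outer(a, b) ** 2 / lengthscale**2)

# Discretized 1-D domain and an arbitrary toy objective (illustrative only).
domain = np.linspace(0.0, 1.0, 300)
f = np.sin(7 * domain) * np.exp(-domain)
noise_var = 0.01
T = 100
beta = 4.0   # constant exploration weight; the theory uses a slowly growing schedule

X_idx, y = [], []
for t in range(T):
    if X_idx:
        K = se_kernel(domain[X_idx], domain[X_idx]) + noise_var * np.eye(len(X_idx))
        Ks = se_kernel(domain, domain[X_idx])
        mu = Ks @ np.linalg.solve(K, np.array(y))                        # posterior mean
        var = 1.0 - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))  # posterior variance
        var = np.clip(var, 1e-12, None)
    else:
        mu, var = np.zeros_like(domain), np.ones_like(domain)            # prior (unit variance)
    ucb = mu + np.sqrt(beta * var)       # GP-UCB acquisition rule
    idx = int(np.argmax(ucb))
    X_idx.append(idx)
    y.append(f[idx] + np.sqrt(noise_var) * rng.normal())                 # noisy observation

cumulative_regret = np.cumsum(f.max() - f[np.array(X_idx)])
print(f"cumulative regret after {T} rounds: {cumulative_regret[-1]:.3f}")
```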
Recent Progress: Tight and Improved Bounds
Recent research closed several key gaps:
- For the squared exponential kernel, advanced analyses demonstrate that, with high probability, GP-UCB attains cumulative regret within a logarithmic factor of the lower bound (see "Improved Regret Bounds for Gaussian Process Upper Confidence Bound in Bayesian Optimization" (Iwazaki, 2 Jun 2025)).
- For the Matérn kernel, the regret scaling likewise reaches a near-optimal rate with high probability, provided the smoothness $\nu$ is sufficiently large relative to the dimension $d$.
Comparison with Algorithm-Independent Lower Bounds
For SE kernels, upper and lower bounds are now separated only by a polylogarithmic factor, implying near-optimality. For Matérn, these improved analyses reduce the previously substantial gap to the information-theoretic lower bound.
4. Key Proof Techniques: Information Gain and Input Sequence Concentration
Maximum Information Gain, $\gamma_T$
A central tool is the maximum information gain
$$\gamma_T := \max_{A \subset \mathcal{X},\, |A| = T} I(y_A; f_A),$$
where $I(\cdot\,;\cdot)$ denotes mutual information, $f_A$ the function values at the points in $A$, and $y_A$ the corresponding noisy observations. The regret bounds traditionally scale as $\tilde{O}\!\big(\sqrt{T\,\gamma_T}\big)$.
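For Gaussian observation noise the mutual information has the closed form $I(y_A; f_A) = \tfrac{1}{2}\log\det\!\big(I + \sigma^{-2} K_A\big)$, and $\gamma_T$ is commonly estimated by greedy selection, which is near-optimal by submodularity. The sketch below illustrates this for an SE kernel on a discretized domain; the lengthscale, noise variance, and budget are illustrative choices.

```python
import numpy as np

def se_kernel(a, b, lengthscale=0.2):
    return np.exp(-0.5 * np.subtract.outer(a, b) ** 2 / lengthscale**2)

def information_gain(K, noise_var):
    """I(y_A; f_A) = 0.5 * logdet(I + K_A / noise_var) for Gaussian noise."""
    n = K.shape[0]
    return 0.5 * np.linalg.slogdet(np.eye(n) + K / noise_var)[1]

domain = np.linspace(0.0, 1.0, 200)
K_full = se_kernel(domain, domain)
noise_var = 0.1

# Greedy selection is the standard way to approximate gamma_T; by submodularity
# of the information gain, the greedy value is within a (1 - 1/e) factor.
selected = []
for _ in range(30):
    best_i, best_gain = None, -np.inf
    for i in range(len(domain)):
        cand = selected + [i]
        gain = information_gain(K_full[np.ix_(cand, cand)], noise_var)
        if gain > best_gain:
            best_i, best_gain = i, gain
    selected.append(best_i)
print(f"greedy estimate of gamma_T for T = 30: {best_gain:.3f}")
```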
The improvement in regret rates arises from a finer-grained analysis of the realized input sequence, showing that points selected by GP-UCB eventually cluster near the function's maximizer. This localized behavior means the realized information gain is far smaller than the worst-case global value $\gamma_T$, hence leading to tighter regret bounds.
Sample Path Regularity and Local Information Gain
Advanced proofs demonstrate that—due to sample path regularity and the exploitation-exploration trade-off—the algorithm naturally focuses queries in a shrinking ball around the maximizer, and the regret is then controlled by the local behavior of the information gain.
This approach replaces previous worst-case global bounds with bounds adaptively concentrated near the optimum, formally justifying observed empirical performance and matching specialized theoretical lower bounds (see "Tight Regret Bounds for Bayesian Optimization in One Dimension" (Scarlett, 2018) and (Iwazaki, 2 Jun 2025)).
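A small numerical illustration of this local-versus-global distinction: greedily maximized information gain over a small ball around a (hypothetical) maximizer location is far smaller than over the whole domain, which is exactly the slack the refined analyses exploit. Kernel, ball radius, and budget below are illustrative assumptions.

```python
import numpy as np

def se_kernel(a, b, lengthscale=0.2):
    return np.exp(-0.5 * np.subtract.outer(a, b) ** 2 / lengthscale**2)

def greedy_info_gain(points, T, noise_var=0.1):
    """Greedy approximation of max over |A| = T of 0.5 * logdet(I + K_A / noise_var)."""
    selected = []
    for _ in range(T):
        best_i, best_v = None, -np.inf
        for i in range(len(points)):
            cand = points[selected + [i]]
            K = se_kernel(cand, cand)
            v = 0.5 * np.linalg.slogdet(np.eye(len(cand)) + K / noise_var)[1]
            if v > best_v:
                best_i, best_v = i, v
        selected.append(best_i)
    return best_v

domain = np.linspace(0.0, 1.0, 200)
x_star = 0.42                                    # hypothetical maximizer location
ball = domain[np.abs(domain - x_star) < 0.05]    # small neighbourhood around x*

print(f"global information gain (T = 25): {greedy_info_gain(domain, 25):.2f}")
print(f"local  information gain (T = 25): {greedy_info_gain(ball, 25):.2f}")
```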
5. Practical Implications and Applications
Implications for Algorithm Choice
- For practitioners using the SE kernel in Bayesian optimization, standard GP-UCB achieves nearly optimal performance, with only a marginally suboptimal logarithmic factor.
- For Matérn kernels (particularly with higher smoothness), recent analyses certify near-optimality of GP-UCB, bridging the theoretical gap, which is significant for practical choices in hyperparameter tuning and experimental design.
Extension to Alternative Acquisition Strategies and Constraint Handling
- These regret analyses also inform the design and tuning of other acquisition strategies (e.g., Thompson Sampling, Expected Improvement).
- For online and distributed convex optimization with constraints, analogous cumulative regret bounds (typically $O(\sqrt{T})$ for convex losses or $O(\log T)$ for strongly convex losses) are achievable, with recent algorithms ensuring that regret and cumulative constraint violation are jointly bounded, a key requirement in safety-critical and resource-limited applications; a toy illustration follows below.
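As a toy illustration of that constrained online setting, the sketch below runs online projected gradient descent on a drifting quadratic loss and tracks both regret against the best fixed feasible point and the cumulative violation of a long-term constraint; the loss sequence, constraint, and step size are illustrative assumptions, not any specific published algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy constrained online convex problem on [0, 1]: losses f_t(x) = (x - theta_t)^2
# with drifting targets theta_t, and a long-term constraint g(x) = x - 0.6 <= 0
# that the learner is allowed to violate temporarily.
T = 1000
thetas = 0.5 + 0.3 * rng.random(T)
eta = 1.0 / np.sqrt(T)    # O(1/sqrt(T)) step size, the classical choice for O(sqrt(T)) regret

x = 0.0
losses, violations = [], []
for t in range(T):
    losses.append((x - thetas[t]) ** 2)
    violations.append(max(x - 0.6, 0.0))          # instantaneous constraint violation
    grad = 2.0 * (x - thetas[t])
    x = min(max(x - eta * grad, 0.0), 1.0)        # projected gradient step onto [0, 1]

# Regret against the best fixed feasible point in hindsight (grid search over [0, 0.6]).
grid = np.linspace(0.0, 0.6, 301)
best_fixed = min(float(((g - thetas) ** 2).sum()) for g in grid)
print(f"regret               : {sum(losses) - best_fixed:.2f}")
print(f"cumulative violation : {sum(violations):.2f}")
```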
Table: Summary of Cumulative Regret Bounds for GP Bandits
Kernel | Lower Bound | State-of-the-art Upper Bound | Algorithm Matching |
---|---|---|---|
Squared Exponential | $\Omega\!\big(\sqrt{T\,(\log T)^{d/2}}\big)$ | within a logarithmic factor of the lower bound | GP-UCB (Iwazaki, 2 Jun 2025) |
Matérn ($\nu$) | $\Omega\!\big(T^{\frac{\nu+d}{2\nu+d}}\big)$ | near-optimal (if sufficiently smooth) | GP-UCB (Iwazaki, 2 Jun 2025) |
1D, general stationary | $\Omega(\sqrt{T})$ | $\tilde{O}(\sqrt{T})$ | Scarlett (2018) |
This table reflects results for the standard Bayesian setting with Gaussian noise.
6. Current Open Directions and Recommendations
- Further closing the log-factor gap in the SE kernel case remains an open theoretical challenge, as does extending these techniques for Matérn kernels to very high dimensions or lower smoothness.
- Algorithm-independent lower bounds emphasize the irremovable effect of dimensionality and smoothness on regret, guiding practitioners to select kernel functions matched to problem complexity and data regime.
- For practitioners, these results justify using standard GP-UCB in typical Bayesian optimization settings, with parameter tuning informed by the dimension and smoothness properties of the kernel.
In summary, cumulative regret bounds for GP bandit optimization have recently been sharpened to nearly match the universal lower bounds, certifying that simple, practical algorithms such as GP-UCB attain optimal exploitation-exploration tradeoffs for a broad range of kernel classes and problem settings. These advances equip both theorists and practitioners with precise guidance on the achievable performance and optimality of sequential optimization under noise.