Batched Stochastic Bandit for Nondegenerate Functions (2405.05733v2)
Abstract: This paper studies batched bandit learning for nondegenerate functions. We introduce an algorithm, called Geometric Narrowing (GN), that solves the batched bandit problem for nondegenerate functions near-optimally: its regret bound is of order $\widetilde{\mathcal{O}} ( A_{+}^{d} \sqrt{T} )$, and it needs only $\mathcal{O} ( \log \log T )$ batches to achieve this regret. We complement this with a lower bound analysis. More specifically, we prove that over some (compact) doubling metric space of doubling dimension $d$: 1. for any policy $\pi$, there exists a problem instance on which $\pi$ incurs a regret of order $\Omega ( A_{-}^{d} \sqrt{T} )$; 2. no policy can achieve a regret of order $A_{-}^{d} \sqrt{T}$ over all problem instances using fewer than $\Omega ( \log \log T )$ rounds of communication. Together, these bounds show that the GN algorithm achieves near-optimal regret with a minimal number of batches.
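To make the batched protocol concrete, below is a minimal Python sketch of a batched narrowing strategy in one dimension. The objective `f`, the Gaussian noise level, the grid size, the shrink factor, and the doubly exponential batch schedule are all illustrative assumptions; this is not the paper's exact Geometric Narrowing (GN) algorithm, only a toy instance of the same batched narrow-and-eliminate idea, with a batch count that scales as $\mathcal{O}(\log \log T)$.

```python
import random

# A minimal, hedged sketch of a batched "narrowing" strategy on [0, 1]:
# each batch plays a uniform grid of arms inside the current active
# interval, then shrinks the interval around the best empirical mean.
# The batch-length schedule T_i ~ T^(1 - 2^{-i}) uses O(log log T)
# batches in total. All constants here are illustrative assumptions,
# not the paper's exact Geometric Narrowing (GN) algorithm.

def batched_narrowing(f, T, num_arms=8, shrink=0.5, noise=0.1, seed=0):
    rng = random.Random(seed)
    lo, hi = 0.0, 1.0        # current active interval
    spent, batch = 0, 1      # queries used so far, batch index
    best_x = 0.5
    while spent < T:
        # doubly exponential batch lengths: T^(1/2), T^(3/4), T^(7/8), ...
        budget = min(T - spent, max(num_arms, int(T ** (1.0 - 2.0 ** -batch))))
        pulls = max(budget // num_arms, 1)
        arms = [lo + (hi - lo) * (k + 0.5) / num_arms for k in range(num_arms)]
        # pull every active arm equally often; rewards are f(x) plus noise
        means = [
            sum(f(x) + rng.gauss(0.0, noise) for _ in range(pulls)) / pulls
            for x in arms
        ]
        spent += pulls * num_arms
        best_x = arms[max(range(num_arms), key=means.__getitem__)]
        # narrow the active interval geometrically around the empirical best
        half = (hi - lo) * shrink / 2.0
        lo, hi = max(0.0, best_x - half), min(1.0, best_x + half)
        batch += 1
    return best_x

if __name__ == "__main__":
    # maximize a smooth objective with a unique maximizer at x = 0.3
    x_hat = batched_narrowing(lambda x: -(x - 0.3) ** 2, T=100_000)
    print(f"estimated maximizer: {x_hat:.3f}")
```

The key design point this sketch illustrates is that arm choices within a batch depend only on data from earlier batches, so the learner adapts only $\mathcal{O}(\log \log T)$ times over the horizon $T$.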