An Improved Relaxation for Oracle-Efficient Adversarial Contextual Bandits (2310.19025v2)
Abstract: We present an oracle-efficient relaxation for the adversarial contextual bandits problem, where the contexts are sequentially drawn i.i.d. from a known distribution and the cost sequence is chosen by an online adversary. Our algorithm has a regret bound of $O(T^{\frac{2}{3}}(K\log(|\Pi|))^{\frac{1}{3}})$ and makes at most $O(K)$ calls per round to an offline optimization oracle, where $K$ denotes the number of actions, $T$ denotes the number of rounds, and $\Pi$ denotes the set of policies. This is the first result to improve the prior best bound of $O((TK)^{\frac{2}{3}}(\log(|\Pi|))^{\frac{1}{3}})$, obtained by Syrgkanis et al. at NeurIPS 2016, and the first to match the original bound of Langford and Zhang at NeurIPS 2007, which was obtained for the stochastic case.
- Kiarash Banihashem (15 papers)
- MohammadTaghi Hajiaghayi (104 papers)
- Suho Shin (15 papers)
- Max Springer (11 papers)
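
The two regret bounds quoted in the abstract differ only in how they depend on $K$: dividing the prior bound $(TK)^{2/3}(\log|\Pi|)^{1/3}$ by the new bound $T^{2/3}(K\log|\Pi|)^{1/3}$ leaves a factor of $K^{1/3}$, independent of $T$ and $|\Pi|$. The short sketch below checks this numerically. It is an illustration only: the function names are hypothetical, and the constants hidden by the $O(\cdot)$ notation are ignored, so the returned values show scaling rather than actual regret.

```python
import math


def improved_bound(T: int, K: int, num_policies: int) -> float:
    """Scaling of the new bound, T^(2/3) * (K * log|Pi|)^(1/3), constants omitted."""
    return T ** (2 / 3) * (K * math.log(num_policies)) ** (1 / 3)


def prior_bound(T: int, K: int, num_policies: int) -> float:
    """Scaling of the prior bound, (T*K)^(2/3) * (log|Pi|)^(1/3), constants omitted."""
    return (T * K) ** (2 / 3) * math.log(num_policies) ** (1 / 3)


def improvement_factor(T: int, K: int, num_policies: int) -> float:
    """Ratio prior / improved; algebraically this equals K^(1/3)."""
    return prior_bound(T, K, num_policies) / improved_bound(T, K, num_policies)


if __name__ == "__main__":
    # Illustrative values only (assumed, not taken from the paper).
    for K in (2, 8, 64):
        ratio = improvement_factor(T=10**6, K=K, num_policies=1000)
        print(f"K={K:3d}  prior/improved = {ratio:.4f}  K^(1/3) = {K ** (1 / 3):.4f}")
```

Because the ratio depends on $K$ alone, the gain over the prior bound grows with the number of actions and is the same for every horizon $T$ and policy class size $|\Pi|$.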