
OSOM: A simultaneously optimal algorithm for multi-armed and linear contextual bandits (1905.10040v4)

Published 24 May 2019 in stat.ML and cs.LG

Abstract: We consider the stochastic linear (multi-armed) contextual bandit problem with the possibility of hidden simple multi-armed bandit structure in which the rewards are independent of the contextual information. Algorithms that are designed solely for one of the regimes are known to be sub-optimal for the alternate regime. We design a single computationally efficient algorithm that simultaneously obtains problem-dependent optimal regret rates in the simple multi-armed bandit regime and minimax optimal regret rates in the linear contextual bandit regime, without knowing a priori which of the two models generates the rewards. These results are proved under the condition of stochasticity of contextual information over multiple rounds. Our results should be viewed as a step towards principled data-dependent policy class selection for contextual bandits.
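As a rough illustration of the two regimes the abstract contrasts, the sketch below simulates rewards either from a simple multi-armed bandit (means independent of context) or from a linear contextual model, and runs a plain UCB learner on both. It shows why an algorithm built for one regime can be sub-optimal in the other; it is not the OSOM algorithm, and the dimensions, noise level, and horizon are illustrative assumptions only.

```python
import numpy as np

# Hypothetical sketch of the two reward regimes contrasted in the abstract.
# NOT the OSOM algorithm: the setup below is an illustrative assumption.

rng = np.random.default_rng(0)
K, d, T = 5, 3, 2000          # arms, context dimension, horizon

# Regime 1: simple multi-armed bandit -- mean rewards ignore the context.
mab_means = rng.uniform(0.2, 0.8, size=K)

# Regime 2: linear contextual bandit -- mean reward of arm a is <theta, x_{t,a}>.
theta = rng.normal(size=d)
theta /= np.linalg.norm(theta)

def draw_round(regime):
    """Sample per-arm contexts, noisy rewards, and true means for one round."""
    contexts = rng.normal(size=(K, d))
    means = mab_means if regime == "simple" else contexts @ theta
    rewards = means + 0.1 * rng.normal(size=K)
    return contexts, rewards, means

def ucb_index(counts, sums, t):
    """Standard UCB1 indices; unpulled arms get +inf so they are tried first."""
    with np.errstate(divide="ignore", invalid="ignore"):
        idx = sums / counts + np.sqrt(2 * np.log(t + 1) / counts)
    idx[counts == 0] = np.inf
    return idx

def run(regime):
    """Run context-blind UCB for T rounds and return its cumulative regret."""
    counts, sums, regret = np.zeros(K), np.zeros(K), 0.0
    for t in range(T):
        _, rewards, means = draw_round(regime)
        a = int(np.argmax(ucb_index(counts, sums, t)))
        counts[a] += 1
        sums[a] += rewards[a]
        regret += means.max() - means[a]
    return regret

print("UCB regret, simple MAB regime:      ", round(run("simple"), 1))
print("UCB regret, linear contextual regime:", round(run("contextual"), 1))
```

Because the UCB learner never looks at the contexts, its regret stays small in the simple regime but grows linearly in the contextual regime; the paper's contribution is a single efficient algorithm that is optimal in both regimes without knowing in advance which one generates the rewards.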

Authors (3)
  1. Niladri S. Chatterji (21 papers)
  2. Vidya Muthukumar (33 papers)
  3. Peter L. Bartlett (86 papers)
Citations (43)
