
OSOM: A simultaneously optimal algorithm for multi-armed and linear contextual bandits (1905.10040v4)

Published 24 May 2019 in stat.ML and cs.LG

Abstract: We consider the stochastic linear (multi-armed) contextual bandit problem with the possibility of hidden simple multi-armed bandit structure in which the rewards are independent of the contextual information. Algorithms that are designed solely for one of the regimes are known to be sub-optimal for the alternate regime. We design a single computationally efficient algorithm that simultaneously obtains problem-dependent optimal regret rates in the simple multi-armed bandit regime and minimax optimal regret rates in the linear contextual bandit regime, without knowing a priori which of the two models generates the rewards. These results are proved under the condition of stochasticity of contextual information over multiple rounds. Our results should be viewed as a step towards principled data-dependent policy class selection for contextual bandits.
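As a rough illustration of the two regimes the abstract contrasts, the sketch below simulates rewards either from a simple multi-armed bandit (means independent of context) or from a linear contextual model, and runs a plain UCB learner on both. It shows why an algorithm built for one regime can be sub-optimal in the other; it is not the OSOM algorithm, and the dimensions, noise level, and horizon are illustrative assumptions only.

```python
import numpy as np

# Hypothetical sketch of the two reward regimes contrasted in the abstract.
# NOT the OSOM algorithm: the setup below is an illustrative assumption.

rng = np.random.default_rng(0)
K, d, T = 5, 3, 2000          # arms, context dimension, horizon

# Regime 1: simple multi-armed bandit -- mean rewards ignore the context.
mab_means = rng.uniform(0.2, 0.8, size=K)

# Regime 2: linear contextual bandit -- mean reward of arm a is <theta, x_{t,a}>.
theta = rng.normal(size=d)
theta /= np.linalg.norm(theta)

def draw_round(regime):
    """Sample per-arm contexts, noisy rewards, and true means for one round."""
    contexts = rng.normal(size=(K, d))
    means = mab_means if regime == "simple" else contexts @ theta
    rewards = means + 0.1 * rng.normal(size=K)
    return contexts, rewards, means

def ucb_index(counts, sums, t):
    """Standard UCB1 indices; unpulled arms get +inf so they are tried first."""
    with np.errstate(divide="ignore", invalid="ignore"):
        idx = sums / counts + np.sqrt(2 * np.log(t + 1) / counts)
    idx[counts == 0] = np.inf
    return idx

def run(regime):
    """Run context-blind UCB for T rounds and return its cumulative regret."""
    counts, sums, regret = np.zeros(K), np.zeros(K), 0.0
    for t in range(T):
        _, rewards, means = draw_round(regime)
        a = int(np.argmax(ucb_index(counts, sums, t)))
        counts[a] += 1
        sums[a] += rewards[a]
        regret += means.max() - means[a]
    return regret

print("UCB regret, simple MAB regime:      ", round(run("simple"), 1))
print("UCB regret, linear contextual regime:", round(run("contextual"), 1))
```

Because the UCB learner never looks at the contexts, its regret stays small in the simple regime but grows linearly in the contextual regime; the paper's contribution is a single efficient algorithm that is optimal in both regimes without knowing in advance which one generates the rewards.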

Authors (3)
  1. Niladri S. Chatterji (21 papers)
  2. Vidya Muthukumar (33 papers)
  3. Peter L. Bartlett (86 papers)
Citations (43)
