Efficient Contextual Bandits in Non-stationary Worlds (1708.01799v4)

Published 5 Aug 2017 in cs.LG and stat.ML

Abstract: Most contextual bandit algorithms minimize regret against the best fixed policy, a questionable benchmark for the non-stationary environments that are ubiquitous in applications. In this work, we develop several efficient contextual bandit algorithms for non-stationary environments by equipping existing methods for i.i.d. problems with sophisticated statistical tests so as to dynamically adapt to a change in distribution. We analyze various standard notions of regret suited to non-stationary environments for these algorithms, including interval regret, switching regret, and dynamic regret. When competing with the best policy at each time, one of our algorithms achieves regret $\mathcal{O}(\sqrt{ST})$ if there are $T$ rounds with $S$ stationary periods, or more generally $\mathcal{O}(\Delta^{1/3}T^{2/3})$ where $\Delta$ is some non-stationarity measure. These results almost match the optimal guarantees achieved by an inefficient baseline that is a variant of the classic Exp4 algorithm. The dynamic regret result is also the first for an efficient algorithm in the fully adversarial contextual bandit setting. Furthermore, while the results above require tuning a parameter based on the unknown quantity $S$ or $\Delta$, we also develop a parameter-free algorithm achieving regret $\min\{S^{1/4}T^{3/4}, \Delta^{1/5}T^{4/5}\}$. This improves and generalizes the best existing result, $\Delta^{0.18}T^{0.82}$ by Karnin and Anava (2016), which holds only for the two-armed bandit problem.
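
The abstract's recipe, running an i.i.d. contextual bandit algorithm and restarting it when a statistical test flags a change in distribution, can be illustrated with a rough sketch. The snippet below is a hypothetical simplification, not the paper's actual procedure: the `RestartOnChangeBandit` wrapper, its `window` and `threshold` parameters, the two-sample mean-shift test, and the assumed `act`/`update` interface of the base learner are all illustrative choices.

```python
import numpy as np

class RestartOnChangeBandit:
    """Hypothetical sketch: wrap an i.i.d. contextual bandit learner and
    restart it when a simple statistical test suggests the reward
    distribution has shifted."""

    def __init__(self, make_base_learner, window=200, threshold=3.0):
        self.make_base_learner = make_base_learner  # factory for the i.i.d. algorithm
        self.window = window          # sliding-window size used by the test
        self.threshold = threshold    # z-score at which a change is declared
        self.reset()

    def reset(self):
        self.base = self.make_base_learner()
        self.old_rewards = []   # rewards from the (presumed) stationary past
        self.new_rewards = []   # most recent window of rewards

    def act(self, context):
        # Delegate action selection to the current base learner.
        return self.base.act(context)

    def update(self, context, action, reward):
        self.base.update(context, action, reward)
        self.new_rewards.append(reward)
        if len(self.new_rewards) > self.window:
            self.old_rewards.append(self.new_rewards.pop(0))
        if len(self.old_rewards) >= self.window and self._change_detected():
            self.reset()  # distribution shift suspected: start over

    def _change_detected(self):
        # Two-sample z-test on mean reward, old window vs. recent window.
        old = np.asarray(self.old_rewards[-self.window:])
        new = np.asarray(self.new_rewards)
        se = np.sqrt(old.var() / len(old) + new.var() / len(new)) + 1e-12
        return abs(old.mean() - new.mean()) / se > self.threshold
```

A full restart throws away all past data, which is what a crude scheme like this pays for in the regret rate; the paper's actual tests are more refined than this mean-shift check, which is how it obtains the bounds stated above.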

Authors (4)
  1. Haipeng Luo (99 papers)
  2. Chen-Yu Wei (46 papers)
  3. Alekh Agarwal (99 papers)
  4. John Langford (94 papers)
Citations (125)