A New Algorithm for Non-stationary Contextual Bandits: Efficient, Optimal, and Parameter-free (1902.00980v3)

Published 3 Feb 2019 in cs.LG and stat.ML

Abstract: We propose the first contextual bandit algorithm that is parameter-free, efficient, and optimal in terms of dynamic regret. Specifically, our algorithm achieves dynamic regret $\mathcal{O}(\min\{\sqrt{ST}, \Delta^{\frac{1}{3}}T^{\frac{2}{3}}\})$ for a contextual bandit problem with $T$ rounds, $S$ switches and $\Delta$ total variation in data distributions. Importantly, our algorithm is adaptive and does not need to know $S$ or $\Delta$ ahead of time, and can be implemented efficiently assuming access to an ERM oracle. Our results strictly improve the $\mathcal{O}(\min\{S^{\frac{1}{4}}T^{\frac{3}{4}}, \Delta^{\frac{1}{5}}T^{\frac{4}{5}}\})$ bound of (Luo et al., 2018), and greatly generalize and improve the $\mathcal{O}(\sqrt{ST})$ result of (Auer et al., 2018) that holds only for the two-armed bandit problem without contextual information. The key novelty of our algorithm is to introduce replay phases, in which the algorithm acts according to its previous decisions for a certain amount of time in order to detect non-stationarity while maintaining a good balance between exploration and exploitation.
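To get a feel for how the two rates in the abstract compare, the following minimal sketch evaluates both expressions numerically. It is only an illustration: it drops all constants and logarithmic factors hidden by the $\mathcal{O}(\cdot)$ notation, and the function names and the sample values of $T$, $S$ and $\Delta$ are assumptions chosen for this example, not taken from the paper.

```python
import math


def new_bound(T: int, S: int, Delta: float) -> float:
    """Rate stated in the abstract: min{ sqrt(S*T), Delta^(1/3) * T^(2/3) }.

    Constants and log factors are ignored (illustrative only).
    """
    return min(math.sqrt(S * T), Delta ** (1 / 3) * T ** (2 / 3))


def prior_bound(T: int, S: int, Delta: float) -> float:
    """Rate attributed to Luo et al. (2018) in the abstract:
    min{ S^(1/4) * T^(3/4), Delta^(1/5) * T^(4/5) }.

    Constants and log factors are ignored (illustrative only).
    """
    return min(S ** (1 / 4) * T ** (3 / 4), Delta ** (1 / 5) * T ** (4 / 5))


if __name__ == "__main__":
    # Hypothetical problem sizes, picked only to show the gap between the rates.
    T, S, Delta = 10**6, 10, 100.0
    print(f"new rate:   {new_bound(T, S, Delta):.1f}")
    print(f"prior rate: {prior_bound(T, S, Delta):.1f}")
```

With these sample values the switch-based term $\sqrt{ST}$ is the smaller of the two terms in the new rate, and the new rate is well below the prior one; other choices of $S$ and $\Delta$ can make the variation-based term the active one instead.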

Authors (4)
  1. Yifang Chen (31 papers)
  2. Chung-Wei Lee (19 papers)
  3. Haipeng Luo (99 papers)
  4. Chen-Yu Wei (46 papers)
Citations (128)