
Learning to Optimize under Non-Stationarity (1810.03024v6)

Published 6 Oct 2018 in cs.LG and stat.ML

Abstract: We introduce algorithms that achieve state-of-the-art dynamic regret bounds for the non-stationary linear stochastic bandit setting. This setting captures natural applications such as dynamic pricing and ad allocation in a changing environment. We show how the difficulty posed by non-stationarity can be overcome by a novel marriage between stochastic and adversarial bandit learning algorithms. Defining $d$, $B_T$, and $T$ as the problem dimension, the variation budget, and the total time horizon, respectively, our main contributions are the tuned Sliding Window UCB (SW-UCB) algorithm with optimal $\widetilde{O}(d^{2/3}(B_T+1)^{1/3}T^{2/3})$ dynamic regret, and the tuning-free bandit-over-bandit (BOB) framework, built on top of the SW-UCB algorithm, with best $\widetilde{O}(d^{2/3}(B_T+1)^{1/4}T^{3/4})$ dynamic regret.
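To make the gap between the two stated rates concrete, here is a minimal Python sketch that evaluates both expressions from the abstract with constants and logarithmic factors dropped. The function names (`sw_ucb_rate`, `bob_rate`, `rate_ratio`) and the example values are illustrative assumptions, not taken from the paper. Dividing the BOB rate by the SW-UCB rate cancels the $d^{2/3}$ term and leaves $(T/(B_T+1))^{1/12}$, which is the multiplicative price of not tuning the window to $B_T$.

```python
def sw_ucb_rate(d, b_t, t):
    """Leading term of the tuned SW-UCB bound: d^(2/3) (B_T+1)^(1/3) T^(2/3).

    Constants and log factors hidden by the O-tilde notation are ignored.
    """
    return d ** (2 / 3) * (b_t + 1) ** (1 / 3) * t ** (2 / 3)


def bob_rate(d, b_t, t):
    """Leading term of the BOB bound: d^(2/3) (B_T+1)^(1/4) T^(3/4)."""
    return d ** (2 / 3) * (b_t + 1) ** (1 / 4) * t ** (3 / 4)


def rate_ratio(b_t, t):
    """BOB rate divided by SW-UCB rate, simplified to (T / (B_T+1))^(1/12)."""
    return (t / (b_t + 1)) ** (1 / 12)


if __name__ == "__main__":
    # Hypothetical problem sizes chosen only for illustration.
    d, b_t, t = 10, 99, 10 ** 6
    print("SW-UCB leading term:", sw_ucb_rate(d, b_t, t))
    print("BOB leading term:   ", bob_rate(d, b_t, t))
    print("ratio BOB / SW-UCB: ", rate_ratio(b_t, t))
```

Because the ratio grows only as the twelfth root of $T/(B_T+1)$, the tuning-free rate stays within a small factor of the tuned one over a wide range of horizons, under the stated simplification that hidden constants and log factors are ignored.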

Authors (3)
  1. Wang Chi Cheung (19 papers)
  2. David Simchi-Levi (50 papers)
  3. Ruihao Zhu (19 papers)
Citations (123)