Non-stationary Bandits and Meta-Learning with a Small Set of Optimal Arms (2202.13001v6)

Published 25 Feb 2022 in cs.LG and stat.ML

Abstract: We study a sequential decision problem where the learner faces a sequence of $K$-armed bandit tasks. The task boundaries might be known (the bandit meta-learning setting), or unknown (the non-stationary bandit setting). For a given integer $M\le K$, the learner aims to compete with the best subset of arms of size $M$. We design an algorithm based on a reduction to bandit submodular maximization, and show that, for $T$ rounds comprised of $N$ tasks, in the regime of large number of tasks and small number of optimal arms $M$, its regret in both settings is smaller than the simple baseline of $\tilde{O}(\sqrt{KNT})$ that can be obtained by using standard algorithms designed for non-stationary bandit problems. For the bandit meta-learning problem with fixed task length $\tau$, we show that the regret of the algorithm is bounded as $\tilde{O}(NM\sqrt{M \tau}+N^{2/3}M\tau)$. Under additional assumptions on the identifiability of the optimal arms in each task, we show a bandit meta-learning algorithm with an improved $\tilde{O}(N\sqrt{M \tau}+N^{1/2}\sqrt{M K \tau})$ regret.
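
For orientation, the sketch below illustrates only the simple $\tilde{O}(\sqrt{KNT})$ baseline mentioned in the abstract for the meta-learning setting with known task boundaries: restart a standard UCB1 learner at the start of each task. It is not the paper's algorithm (which relies on a reduction to bandit submodular maximization); the `UCB1` and `restart_baseline` names, the Bernoulli reward environment, and all parameter values are illustrative assumptions.

```python
# Minimal sketch of the restart baseline for bandit meta-learning with known
# task boundaries: run a fresh UCB1 instance on each of the N tasks of length tau.
# Environment and names are assumptions for illustration, not from the paper.
import math
import random


class UCB1:
    """Standard UCB1 for a K-armed stochastic bandit."""

    def __init__(self, k: int):
        self.k = k
        self.counts = [0] * k
        self.means = [0.0] * k
        self.t = 0

    def select(self) -> int:
        self.t += 1
        # Pull each arm once before using confidence bounds.
        for a in range(self.k):
            if self.counts[a] == 0:
                return a
        return max(
            range(self.k),
            key=lambda a: self.means[a] + math.sqrt(2 * math.log(self.t) / self.counts[a]),
        )

    def update(self, arm: int, reward: float) -> None:
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def restart_baseline(task_means, tau: int, seed: int = 0) -> float:
    """Run UCB1 from scratch on each task (boundaries known); return total reward."""
    rng = random.Random(seed)
    total = 0.0
    for means in task_means:          # one entry per task: list of K Bernoulli means
        learner = UCB1(len(means))
        for _ in range(tau):          # fixed task length tau
            arm = learner.select()
            reward = 1.0 if rng.random() < means[arm] else 0.0
            learner.update(arm, reward)
            total += reward
    return total


if __name__ == "__main__":
    # N = 3 tasks, K = 4 arms; each task's optimal arm comes from a small set (M = 2),
    # the structure the paper's algorithm is designed to exploit.
    tasks = [[0.9, 0.5, 0.4, 0.3], [0.4, 0.5, 0.9, 0.3], [0.9, 0.4, 0.5, 0.3]]
    print("total reward:", restart_baseline(tasks, tau=500))
```

Because this baseline relearns all $K$ arms in every task, its regret scales as $\tilde{O}(\sqrt{KNT})$; the paper's approach improves on this when the number of optimal arms $M$ is small relative to $K$.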

Authors (6)
  1. MohammadJavad Azizi (3 papers)
  2. Thang Duong (3 papers)
  3. Yasin Abbasi-Yadkori (35 papers)
  4. Claire Vernade (31 papers)
  5. Mohammad Ghavamzadeh (97 papers)
  6. András György (46 papers)
Citations (7)
