
Understanding Memory-Regret Trade-Off for Streaming Stochastic Multi-Armed Bandits (2405.19752v2)

Published 30 May 2024 in cs.LG, cs.DS, and stat.ML

Abstract: We study the stochastic multi-armed bandit problem in the $P$-pass streaming model. In this problem, the $n$ arms are present in a stream and at most $m<n$ arms and their statistics can be stored in the memory. We give a complete characterization of the optimal regret in terms of $m, n$ and $P$. Specifically, we design an algorithm with $\tilde O\left((n-m)^{1+\frac{2^{P}-2}{2^{P+1}-1}} n^{\frac{2-2^{P+1}}{2^{P+1}-1}} T^{\frac{2^{P}}{2^{P+1}-1}}\right)$ regret and complement it with an $\tilde \Omega\left((n-m)^{1+\frac{2^{P}-2}{2^{P+1}-1}} n^{\frac{2-2^{P+1}}{2^{P+1}-1}} T^{\frac{2^{P}}{2^{P+1}-1}}\right)$ lower bound when the number of rounds $T$ is sufficiently large. Our results are tight up to a logarithmic factor in $n$ and $P$.
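To make the bound concrete, the sketch below evaluates the three exponents for a given number of passes $P$. It assumes each exponent is built from powers of two (for example, $2^{P}/(2^{P+1}-1)$ on $T$). The function names `regret_exponents` and `regret_rate` are illustrative and do not come from the paper. The computed rate ignores constants and the logarithmic factors hidden by the tilde notation, so it shows only the scaling, not an actual regret value.

```python
from fractions import Fraction


def regret_exponents(P: int):
    """Return the exponents of (n - m), n, and T in the stated bound.

    Assumes the exponents are powers of two as read from the abstract;
    P is the number of passes over the stream.
    """
    if P < 1:
        raise ValueError("P must be a positive integer")
    denom = 2 ** (P + 1) - 1
    exp_n_minus_m = 1 + Fraction(2 ** P - 2, denom)
    exp_n = Fraction(2 - 2 ** (P + 1), denom)
    exp_T = Fraction(2 ** P, denom)
    return exp_n_minus_m, exp_n, exp_T


def regret_rate(n: int, m: int, T: int, P: int) -> float:
    """Polynomial part of the bound only (no constants, no log factors)."""
    if not 0 <= m < n:
        raise ValueError("memory size m must satisfy 0 <= m < n")
    e_nm, e_n, e_T = regret_exponents(P)
    return (n - m) ** float(e_nm) * n ** float(e_n) * T ** float(e_T)


if __name__ == "__main__":
    for passes in (1, 2, 3, 10):
        print(passes, regret_exponents(passes))
```

For $P=1$ the exponents are $1$, $-2/3$, and $2/3$, so the rate is $(n-m)\,n^{-2/3}\,T^{2/3}$. As $P$ grows, the exponent on $T$ decreases toward $1/2$.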

Authors (3)
  1. Yuchen He (53 papers)
  2. Zichun Ye (3 papers)
  3. Chihao Zhang (29 papers)
Citations (1)
