
The Sliding Regret in Stochastic Bandits: Discriminating Index and Randomized Policies (2311.18437v1)

Published 30 Nov 2023 in cs.LG, cs.SY, eess.SY, math.OC, and stat.ML

Abstract: This paper studies the one-shot behavior of no-regret algorithms for stochastic bandits. Although many algorithms are known to be asymptotically optimal with respect to the expected regret, over a single run their pseudo-regret seems to follow one of two tendencies: it is either smooth or bumpy. To measure this tendency, we introduce a new notion, the sliding regret, which measures the worst pseudo-regret over a time window of fixed length sliding to infinity. We show that randomized methods (e.g., Thompson Sampling and MED) have optimal sliding regret, while index policies, although possibly asymptotically optimal for the expected regret, have the worst possible sliding regret under regularity conditions on their index (e.g., UCB, UCB-V, KL-UCB, MOSS, IMED). We further analyze the average bumpiness of the pseudo-regret of index policies via the regret of exploration, which we show to be suboptimal as well.
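The abstract describes the sliding regret informally as the worst pseudo-regret accrued over a time window of fixed length as that window slides to infinity. The sketch below shows one way to compute an empirical version of this quantity from a single pseudo-regret trajectory; the helper `sliding_regret`, the window length of 100, the two-armed Bernoulli instance, and the UCB1-style index policy are illustrative assumptions for a toy run, not the paper's formal definition or experimental setup.

```python
import numpy as np

def sliding_regret(pseudo_regret, window):
    """Worst increase of the cumulative pseudo-regret over any window of
    fixed length (an illustrative reading of the abstract's description,
    not the paper's formal definition)."""
    pr = np.asarray(pseudo_regret, dtype=float)
    return max(pr[t + window] - pr[t] for t in range(len(pr) - window))

# Toy single run: a UCB1-style index policy on a two-armed Bernoulli bandit.
rng = np.random.default_rng(0)
means = np.array([0.6, 0.5])           # assumed toy arm means
gaps = means.max() - means             # suboptimality gap of each arm
horizon, n_arms = 10_000, len(means)

counts = np.zeros(n_arms)
sums = np.zeros(n_arms)
pseudo_regret = np.zeros(horizon + 1)

for t in range(horizon):
    if t < n_arms:
        arm = t                        # play each arm once to initialize
    else:
        index = sums / counts + np.sqrt(2 * np.log(t + 1) / counts)
        arm = int(np.argmax(index))    # index policy: play the largest index
    counts[arm] += 1
    sums[arm] += rng.binomial(1, means[arm])
    pseudo_regret[t + 1] = pseudo_regret[t] + gaps[arm]

print("empirical sliding regret (window=100):",
      sliding_regret(pseudo_regret, 100))
```

Running the same loop with a randomized policy such as Thompson Sampling in place of the index rule is the kind of one-shot comparison the abstract alludes to: the windowed worst-case increase, rather than the expected regret, is what distinguishes smooth from bumpy trajectories.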

