Rotting Infinitely Many-armed Bandits (2201.12975v3)

Published 31 Jan 2022 in cs.LG, cs.DS, math.OC, and stat.ML

Abstract: We consider the infinitely many-armed bandit problem with rotting rewards, where the mean reward of an arm decreases at each pull of the arm according to an arbitrary trend with maximum rotting rate $\varrho=o(1)$. We show that this learning problem has an $\Omega(\max\{\varrho^{1/3}T,\sqrt{T}\})$ worst-case regret lower bound, where $T$ is the horizon time. We show that a matching upper bound $\tilde{O}(\max\{\varrho^{1/3}T,\sqrt{T}\})$, up to a poly-logarithmic factor, can be achieved by an algorithm that uses a UCB index for each arm and a threshold value to decide whether to continue pulling an arm or remove the arm from further consideration, when the algorithm knows the value of the maximum rotting rate $\varrho$. We also show that an $\tilde{O}(\max\{\varrho^{1/3}T,T^{3/4}\})$ regret upper bound can be achieved by an algorithm that does not know the value of $\varrho$, by using an adaptive UCB index along with an adaptive threshold value.
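
The algorithmic idea in the abstract — keep pulling the current arm while its optimistic (UCB) index stays above a threshold, otherwise discard it and sample a fresh arm from the infinite reservoir — can be sketched as follows. This is a minimal illustrative sketch, not the paper's algorithm: `sample_arm`, `pull`, the sliding window, the confidence width, and `threshold` are all placeholder assumptions. In the paper, the index and threshold are set as functions of the rotting rate $\varrho$ and horizon $T$ when $\varrho$ is known, and chosen adaptively when it is not, which is what yields the stated regret bounds.

```python
import math
from collections import deque


def ucb_threshold_policy(sample_arm, pull, horizon, threshold, window=50, confidence=2.0):
    """Illustrative UCB-index-with-threshold policy for rotting infinite-armed bandits.

    `sample_arm()` draws a fresh arm from the infinite reservoir; `pull(arm)` returns a
    noisy reward whose mean may decrease with each pull of that arm. The sliding window,
    confidence width, and `threshold` are placeholder choices, not the paper's settings.
    """
    total_reward = 0.0
    arm = sample_arm()
    recent = deque(maxlen=window)  # recent rewards of the currently held arm

    for _ in range(horizon):
        r = pull(arm)
        recent.append(r)
        total_reward += r

        n = len(recent)
        mean = sum(recent) / n                                  # recent empirical mean
        bonus = math.sqrt(confidence * math.log(horizon) / n)   # optimism term
        index = mean + bonus                                    # UCB index of the arm

        # If even the optimistic index falls below the threshold, the arm is judged
        # too rotten to keep pulling: discard it and sample a fresh arm.
        if index < threshold:
            arm = sample_arm()
            recent.clear()

    return total_reward
```

The sliding window here is only a crude way to keep the index responsive to rotting; the paper's algorithms instead tie the index construction and the removal threshold to $\varrho$ (or to adaptive estimates of it) to match the lower bound up to poly-logarithmic factors.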

Authors (3)
  1. Jung-hun Kim (9 papers)
  2. Milan Vojnovic (25 papers)
  3. Se-Young Yun (114 papers)
Citations (5)