Nearly-tight Approximation Guarantees for the Improving Multi-Armed Bandits Problem (2404.01198v1)

Published 1 Apr 2024 in cs.LG, cs.DS, and stat.ML

Abstract: We give nearly-tight upper and lower bounds for the improving multi-armed bandits problem. An instance of this problem has $k$ arms, each of whose reward functions is a concave, increasing function of the number of times that arm has been pulled so far. We show that for any randomized online algorithm, there exists an instance on which it must suffer at least an $\Omega(\sqrt{k})$ approximation factor relative to the optimal reward. We then provide a randomized online algorithm that guarantees an $O(\sqrt{k})$ approximation factor if it is told the maximum reward achievable by the optimal arm in advance. Finally, we show how to remove this assumption at the cost of an extra $O(\log k)$ approximation factor, achieving an overall $O(\sqrt{k} \log k)$ approximation relative to optimal.
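To make the setting concrete, below is a minimal Python sketch of an improving-bandits instance. Everything here is illustrative, not from the paper: the saturating reward shape and the names make_concave_arm and run_explore_then_commit are assumptions, and the explore-then-commit baseline is deliberately not the paper's $O(\sqrt{k})$-approximation algorithm. One point worth spelling out: because each per-pull reward $f_i(t)$ is increasing in $t$, the cumulative reward from $n$ pulls of arm $i$ has increasing marginal returns, i.e. it is convex in $n$, so the offline optimum commits to a single arm for the entire horizon. That is why the benchmark in the abstract is the best single ("optimal") arm, and also why naive strategies struggle: early pulls systematically underestimate an arm's eventual value.

```python
import random

def make_concave_arm(cap, rate):
    """Reward for the t-th pull (1-indexed) of this arm.
    The curve cap * (1 - (1 - rate)**t) is increasing in t with decreasing
    increments, hence concave, as the problem requires. (This specific
    saturating shape is an illustrative choice, not from the paper.)"""
    return lambda t: cap * (1 - (1 - rate) ** t)

def run_explore_then_commit(arms, horizon, probe_pulls=3):
    """Toy baseline (NOT the paper's algorithm): probe each arm a few times,
    then commit to the arm with the highest last observed reward. Since
    rewards are increasing, a few probes can badly underestimate a
    slow-starting arm, which is the core difficulty of this problem."""
    total = 0.0
    pulls = [0] * len(arms)
    last_reward = [0.0] * len(arms)
    for i, f in enumerate(arms):
        for _ in range(probe_pulls):
            pulls[i] += 1
            last_reward[i] = f(pulls[i])
            total += last_reward[i]
    best = max(range(len(arms)), key=lambda i: last_reward[i])
    for _ in range(horizon - probe_pulls * len(arms)):
        pulls[best] += 1
        total += arms[best](pulls[best])
    return total

def optimal_single_arm(arms, horizon):
    """OPT: cumulative reward of the best arm pulled for the whole horizon.
    Committing to one arm is offline-optimal here because each arm's
    cumulative reward is convex in its number of pulls."""
    return max(sum(f(t) for t in range(1, horizon + 1)) for f in arms)

if __name__ == "__main__":
    random.seed(0)
    k, horizon = 16, 200
    arms = [make_concave_arm(cap=random.uniform(0.1, 1.0),
                             rate=random.uniform(0.01, 0.3))
            for _ in range(k)]
    alg = run_explore_then_commit(arms, horizon)
    opt = optimal_single_arm(arms, horizon)
    print(f"ALG = {alg:.1f}, OPT = {opt:.1f}, ratio = {opt / alg:.2f}")
```

On instances where the best arm starts slow and saturates late, the empirical ratio OPT/ALG of this baseline can grow with $k$; the paper's contribution is an online strategy whose ratio is bounded by $O(\sqrt{k})$ (or $O(\sqrt{k}\log k)$ without knowing the optimal arm's maximum reward), matched by the $\Omega(\sqrt{k})$ lower bound.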
