
Fixed Confidence Best Arm Identification in the Bayesian Setting (2402.10429v2)

Published 16 Feb 2024 in stat.ML and cs.LG

Abstract: We consider the fixed-confidence best arm identification (FC-BAI) problem in the Bayesian setting, where the goal is to find the arm with the largest mean at a fixed confidence level when the bandit model has been sampled from a known prior. Most studies of FC-BAI have been conducted in the frequentist setting, where the bandit model is fixed before the game starts. We show that traditional FC-BAI algorithms from the frequentist setting, such as track-and-stop and top-two algorithms, can perform arbitrarily poorly in the Bayesian setting. We also derive a lower bound on the expected number of samples in the Bayesian setting and introduce a variant of successive elimination that matches the lower bound up to a logarithmic factor. Simulations verify the theoretical results.
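The paper's algorithm is a Bayesian variant of successive elimination. As background, the classical frequentist successive-elimination procedure it builds on can be sketched as follows; this is a minimal illustration with a standard Hoeffding-based confidence radius, not the paper's Bayesian variant, and the function name and parameters are illustrative.

```python
import math
import random

def successive_elimination(arms, delta, max_rounds=10_000):
    """Classical (frequentist) successive elimination for fixed-confidence
    best-arm identification.

    arms:  list of zero-argument callables, each returning a reward in [0, 1]
    delta: allowed probability of returning a suboptimal arm
    """
    k = len(arms)
    active = list(range(k))          # arms still in contention
    sums = [0.0] * k
    counts = [0] * k
    for t in range(1, max_rounds + 1):
        # Pull every surviving arm once per round.
        for i in active:
            sums[i] += arms[i]()
            counts[i] += 1
        # Anytime-valid confidence radius (Hoeffding bound + union bound over
        # arms and rounds); the exact constants are one common choice.
        radius = math.sqrt(math.log(4 * k * t * t / delta) / (2 * t))
        means = {i: sums[i] / counts[i] for i in active}
        best = max(means.values())
        # Eliminate any arm whose upper confidence bound falls below the
        # empirical leader's lower confidence bound.
        active = [i for i in active if means[i] + radius >= best - radius]
        if len(active) == 1:
            return active[0]
    # Fallback if the confidence level was not reached within max_rounds.
    return max(active, key=lambda i: sums[i] / counts[i])
```

For example, with three Bernoulli arms of means 0.2, 0.5, and 0.9, the procedure quickly eliminates the two inferior arms and returns the index of the best one. The paper's point is that running such frequentist rules unchanged in the Bayesian setting, where the means are themselves drawn from a prior, can waste arbitrarily many samples.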

