Fixed-Budget Best-Arm Identification in Structured Bandits (2106.04763v8)

Published 9 Jun 2021 in cs.LG

Abstract: Best-arm identification (BAI) in a fixed-budget setting is a bandit problem where the learning agent maximizes the probability of identifying the optimal (best) arm after a fixed number of observations. Most works on this topic study unstructured problems with a small number of arms, which limits their applicability. We propose a general tractable algorithm that incorporates the structure by successively eliminating suboptimal arms based on their mean reward estimates from a joint generalization model. We analyze our algorithm in linear and generalized linear models (GLMs), and propose a practical implementation based on a G-optimal design. In linear models, our algorithm has error guarantees competitive with prior work and performs at least as well empirically. In GLMs, this is the first practical algorithm with analysis for fixed-budget BAI.
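The abstract's recipe (successive elimination driven by mean reward estimates from a joint linear model) can be sketched in a few lines. The code below is a simplified illustration, not the paper's exact algorithm: it uses halving-style elimination with a uniform per-arm allocation standing in for the G-optimal design, and a regularized least-squares fit for the shared parameter. All function and variable names here are hypothetical.

```python
import numpy as np

def fixed_budget_bai_linear(X, pull, budget):
    """Sketch of fixed-budget best-arm identification in a linear model.

    X: (K, d) matrix of arm feature vectors; pull(i) returns a noisy reward
    for arm i. Each stage spends an equal slice of the budget on the
    surviving arms (uniform allocation is a crude stand-in for a G-optimal
    design), refits least squares on all data so far, and eliminates the
    worse half of the arms by estimated mean reward.
    """
    K, d = X.shape
    active = list(range(K))
    n_stages = max(1, int(np.ceil(np.log2(K))))
    feats, rewards = [], []
    for _ in range(n_stages):
        per_arm = max(1, budget // (n_stages * len(active)))
        for i in active:
            for _ in range(per_arm):
                feats.append(X[i])
                rewards.append(pull(i))
        A = np.asarray(feats)
        y = np.asarray(rewards)
        # Regularized least-squares estimate of the shared parameter theta.
        theta_hat = np.linalg.solve(A.T @ A + 1e-6 * np.eye(d), A.T @ y)
        est = X[active] @ theta_hat
        # Keep the better half of the surviving arms.
        keep = max(1, len(active) // 2)
        order = np.argsort(-est)
        active = [active[j] for j in order[:keep]]
    return active[0]  # the single recommended arm

# Toy instance: 4 arms in 2 dimensions, true parameter theta, small noise.
rng = np.random.default_rng(0)
theta = np.array([1.0, -0.5])
X = np.array([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [0.6, 0.8]])
pull = lambda i: X[i] @ theta + 0.01 * rng.standard_normal()
best = fixed_budget_bai_linear(X, pull, budget=200)
```

Because the arms share one parameter vector, every observation informs the estimate for every arm; this is what lets structured BAI scale beyond the small-arm regime the abstract contrasts with.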
