SPRT-based Efficient Best Arm Identification in Stochastic Bandits (2207.11158v3)

Published 22 Jul 2022 in stat.ML and cs.LG

Abstract: This paper investigates the best arm identification (BAI) problem in stochastic multi-armed bandits in the fixed-confidence setting. The general exponential family of bandits is considered. Existing algorithms for this family face computational challenges. To mitigate them, the BAI problem is viewed and analyzed as a sequential composite hypothesis testing task, and a framework is proposed that adopts likelihood ratio-based tests, which are known to be effective for sequential testing. Based on this test statistic, a BAI algorithm is designed that leverages the canonical sequential probability ratio tests (SPRTs) for arm selection and is amenable to tractable analysis for the exponential family of bandits. The algorithm has two key features: (1) its sample complexity is asymptotically optimal, and (2) it is guaranteed to be $\delta$-PAC. Existing efficient approaches focus on the Gaussian setting and rely on Thompson sampling to select the best-candidate arm and the challenger arm. Additionally, the paper analytically quantifies the computational expense of identifying the challenger in an existing approach. Finally, numerical experiments are provided to support the analysis.
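
To make the stopping logic behind such SPRT/GLR-style procedures concrete, the sketch below shows a minimal fixed-confidence BAI loop for Gaussian arms with known unit variance. It tracks pairwise generalized likelihood-ratio statistics between the empirical leader and each challenger and stops once the smallest of them clears a confidence threshold. This is only an illustration under those assumptions: the sampling rule, the threshold `beta(t, delta)`, and the names `glr_stat`, `sprt_bai`, and `sample` are placeholders chosen here, not the paper's tuned algorithm.

```python
import numpy as np

def glr_stat(n_i, mu_i, n_j, mu_j, sigma2=1.0):
    """Pairwise generalized likelihood-ratio statistic for known-variance
    Gaussian arms; large values are evidence that arm i beats arm j."""
    if mu_i <= mu_j:
        return 0.0
    return (n_i * n_j) / (n_i + n_j) * (mu_i - mu_j) ** 2 / (2.0 * sigma2)

def sprt_bai(sample, n_arms, delta, max_pulls=100_000, rng=None):
    """Illustrative fixed-confidence BAI loop with a GLR stopping rule.

    `sample(arm, rng)` draws one reward from the chosen arm (n_arms >= 2);
    `delta` is the target error probability.  The leader/challenger sampling
    rule and the threshold below are heuristic placeholders.
    """
    rng = rng or np.random.default_rng()
    counts = np.zeros(n_arms)
    sums = np.zeros(n_arms)

    # Initialise with one pull per arm so every empirical mean is defined.
    for a in range(n_arms):
        sums[a] += sample(a, rng)
        counts[a] += 1

    for t in range(n_arms, max_pulls):
        means = sums / counts
        leader = int(np.argmax(means))

        # Stop when the leader's GLR against every challenger clears a
        # (heuristic) threshold beta(t, delta).
        beta = np.log((1.0 + np.log(t)) / delta)
        glr_min = min(
            glr_stat(counts[leader], means[leader], counts[j], means[j])
            for j in range(n_arms) if j != leader
        )
        if glr_min > beta:
            return leader, t

        # Toy sampling rule: pull the less-sampled of the leader and its
        # closest challenger (the arm with the smallest GLR against it).
        challengers = [j for j in range(n_arms) if j != leader]
        challenger = min(
            challengers,
            key=lambda j: glr_stat(counts[leader], means[leader],
                                   counts[j], means[j]),
        )
        arm = leader if counts[leader] <= counts[challenger] else challenger
        sums[arm] += sample(arm, rng)
        counts[arm] += 1

    return int(np.argmax(sums / counts)), max_pulls
```

For example, `sprt_bai(lambda a, rng: rng.normal([0.5, 0.3, 0.1][a]), n_arms=3, delta=0.05)` returns a guessed best arm together with the number of pulls used; shrinking `delta` raises the threshold and lengthens the run, mirroring the fixed-confidence trade-off discussed in the abstract.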
