
Optimal Thresholding Linear Bandit

Published 11 Feb 2024 in stat.ML and cs.LG (arXiv:2402.09467v1)

Abstract: We study a novel pure exploration problem: the $\epsilon$-Thresholding Bandit Problem (TBP) with fixed confidence in stochastic linear bandits. We prove a lower bound on the sample complexity and extend an algorithm originally designed for Best Arm Identification in linear bandits to TBP, showing that the resulting algorithm is asymptotically optimal.
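To make the problem concrete: in a thresholding bandit, the learner must classify each arm $a$ according to whether its mean reward $\langle\theta^\star, a\rangle$ lies above or below a threshold $\tau$, with an $\epsilon$ tolerance band around $\tau$. The sketch below illustrates this setup with a naive round-robin sampler and a least-squares estimate of $\theta^\star$; it is only an illustrative baseline, not the asymptotically optimal allocation or the stopping rule analysed in the paper, and all function and parameter names are hypothetical.

```python
import numpy as np

def threshold_bandit_uniform(arms, theta_star, tau, eps,
                             noise_sd=1.0, max_rounds=20000, seed=0):
    """Toy thresholding in a linear bandit (illustration only).

    Pulls arms round-robin, builds a regularised least-squares
    estimate theta_hat, then labels arm a as "above" when
    <theta_hat, a> >= tau - eps. A genuine fixed-confidence
    algorithm would replace the fixed budget with a stopping rule
    and sample adaptively.
    """
    rng = np.random.default_rng(seed)
    K, d = arms.shape
    A = np.eye(d) * 1e-6          # regularised design matrix sum a a^T
    b = np.zeros(d)               # accumulated reward-weighted arms
    for t in range(max_rounds):
        a = arms[t % K]           # uniform round-robin sampling
        reward = a @ theta_star + rng.normal(0.0, noise_sd)
        A += np.outer(a, a)
        b += reward * a
    theta_hat = np.linalg.solve(A, b)
    return {tuple(a): bool(a @ theta_hat >= tau - eps) for a in arms}
```

With canonical-basis arms in two dimensions and $\theta^\star = (1, 0)$, the sampler separates the arm with mean $1$ from the arm with mean $0$ around a threshold $\tau = 0.5$, since the per-arm standard error after thousands of pulls is far smaller than the gap to the threshold.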

