Differential Good Arm Identification
Abstract: This paper addresses a variant of the stochastic multi-armed bandit problem called good arm identification (GAI). GAI is a pure-exploration bandit problem whose goal is to output as many good arms as possible using as few samples as possible, where a good arm is defined as an arm whose expected reward is greater than a given threshold. In this work, we propose DGAI, a differentiable good arm identification algorithm that improves the sample complexity of the state-of-the-art HDoC algorithm in a data-driven fashion. We also show that DGAI can further boost performance on a general multi-armed bandit (MAB) problem when a threshold is given as prior knowledge about the arm set. Extensive experiments confirm that our algorithm significantly outperforms the baseline algorithms on both synthetic and real-world datasets for both GAI and MAB tasks.
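To make the GAI problem statement concrete, the following is a minimal sketch of a threshold-based identification loop. It is neither DGAI nor HDoC: it samples all undecided arms uniformly and uses a plain Hoeffding confidence radius with a union bound. The function name `identify_good_arms`, the Bernoulli reward model, the failure probability `delta`, and the `max_pulls` cap are all assumptions introduced for illustration.

```python
import math
import random


def identify_good_arms(means, threshold, delta=0.05, max_pulls=200_000, seed=0):
    """Label simulated Bernoulli arms as good (mean > threshold) or bad.

    Illustrative baseline only (an assumption, not the paper's method):
    every undecided arm is pulled once per round, and an arm is decided
    as soon as its Hoeffding confidence interval clears the threshold.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # number of pulls per arm
    sums = [0.0] * k      # cumulative reward per arm
    active = set(range(k))
    good, bad = [], []
    total = 0             # total samples used so far

    while active and total < max_pulls:
        for arm in sorted(active):  # sorted() copies, so removal below is safe
            reward = 1.0 if rng.random() < means[arm] else 0.0
            counts[arm] += 1
            sums[arm] += reward
            total += 1

            n = counts[arm]
            mean_hat = sums[arm] / n
            # Anytime Hoeffding radius with a union bound over arms and rounds.
            radius = math.sqrt(math.log(4 * k * n * n / delta) / (2 * n))

            if mean_hat - radius > threshold:
                good.append(arm)        # lower bound above threshold: good arm
                active.discard(arm)
            elif mean_hat + radius < threshold:
                bad.append(arm)         # upper bound below threshold: bad arm
                active.discard(arm)

    return good, bad, total


if __name__ == "__main__":
    good, bad, total = identify_good_arms([0.9, 0.8, 0.2, 0.1], threshold=0.5)
    print("good arms:", sorted(good), "bad arms:", sorted(bad), "samples:", total)
```

The returned sample count is the quantity that GAI algorithms aim to reduce; arms whose means lie close to the threshold dominate it, because their confidence intervals take longest to clear the threshold.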