
Differential Good Arm Identification

Published 13 Mar 2023 in cs.LG and stat.ML | (2303.07154v3)

Abstract: This paper targets a variant of the stochastic multi-armed bandit problem called good arm identification (GAI). GAI is a pure-exploration bandit problem whose goal is to output as many good arms as possible using as few samples as possible, where a good arm is defined as an arm whose expected reward is greater than a given threshold. In this work, we propose DGAI, a differentiable good arm identification algorithm, to improve the sample complexity of the state-of-the-art HDoC algorithm in a data-driven fashion. We also show that DGAI can further boost performance on a general multi-armed bandit (MAB) problem when a threshold is given as prior knowledge about the arm set. Extensive experiments confirm that our algorithm significantly outperforms the baseline algorithms on both synthetic and real-world datasets for both GAI and MAB tasks.
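To make the GAI setting concrete, the following is a minimal sketch of threshold-based arm identification. It is not the DGAI or HDoC algorithm from the paper: it assumes Bernoulli rewards, samples arms round-robin, and uses a fixed Hoeffding-style confidence radius with a union bound over arms and pull counts. The function name `identify_good_arms` and all of its parameters are hypothetical, chosen only for illustration.

```python
import math
import random


def identify_good_arms(means, threshold, delta=0.05, max_pulls=200_000, seed=0):
    """Return (good_arms, total_pulls) for simulated Bernoulli arms.

    Assumption: `means` are the true success probabilities, used only to
    simulate rewards; the procedure itself sees sampled rewards alone.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    active = set(range(k))  # arms not yet classified
    good = []
    pulls = 0

    while active and pulls < max_pulls:
        # sorted() copies the set, so removing arms inside the loop is safe.
        for arm in sorted(active):
            reward = 1.0 if rng.random() < means[arm] else 0.0
            counts[arm] += 1
            sums[arm] += reward
            pulls += 1

            n = counts[arm]
            mean_hat = sums[arm] / n
            # Hoeffding-style radius; the n^2 term is a union bound over
            # all pull counts (an illustrative choice, not the paper's).
            radius = math.sqrt(math.log(4 * k * n * n / delta) / (2 * n))

            if mean_hat - radius >= threshold:
                good.append(arm)        # confidently above the threshold
                active.discard(arm)
            elif mean_hat + radius < threshold:
                active.discard(arm)     # confidently below the threshold

    return good, pulls


if __name__ == "__main__":
    good, pulls = identify_good_arms([0.9, 0.1, 0.8, 0.2], threshold=0.5)
    print("good arms:", sorted(good), "after", pulls, "pulls")
```

The fixed radius above is the kind of hand-designed confidence bound that the abstract proposes to improve in a data-driven way; how DGAI parameterizes and learns that bound is not described in the abstract and is not modeled here.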
