A Fast Algorithm for the Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit (2306.09202v3)

Published 15 Jun 2023 in cs.LG

Abstract: We study the real-valued combinatorial pure exploration problem in the stochastic multi-armed bandit (R-CPE-MAB), focusing on the case where the size of the action set is polynomial in the number of arms. In this regime, the R-CPE-MAB can be seen as a special case of the so-called transductive linear bandits. We introduce the combinatorial gap-based exploration (CombGapE) algorithm, whose sample complexity upper bound matches the lower bound up to a problem-dependent constant factor. We numerically show that CombGapE significantly outperforms existing methods on both synthetic and real-world datasets.
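
To make the setting concrete, below is a minimal sketch of a generic gap-based pure-exploration loop for combinatorial bandits with real-valued action vectors, written under the assumption of 1-sub-Gaussian arm rewards and an explicitly enumerated action set. It is not the paper's CombGapE algorithm: the confidence widths, stopping rule, and arm-selection heuristic are illustrative choices, and the names (`gap_based_cpe`, `sample_arm`) are hypothetical.

```python
import numpy as np

def gap_based_cpe(sample_arm, actions, delta=0.05, max_rounds=100_000):
    """Illustrative gap-based pure-exploration loop (not CombGapE itself).

    sample_arm(i) -> one noisy observation of arm i's reward.
    actions       -> (K, d) array; each row is the real-valued weight
                     vector of one candidate action over the d arms.
    Returns the index of the empirically best action at stopping time.
    """
    actions = np.asarray(actions, dtype=float)
    K, d = actions.shape
    counts = np.zeros(d)
    sums = np.zeros(d)

    # Initialisation: pull every arm once.
    for i in range(d):
        sums[i] += sample_arm(i)
        counts[i] += 1.0

    for t in range(d, max_rounds):
        mu_hat = sums / counts
        values = actions @ mu_hat              # empirical value of each action
        best = int(np.argmax(values))

        # Confidence width of each action's value, assuming 1-sub-Gaussian noise.
        beta = np.sqrt(2.0 * np.log(4.0 * K * t * t / delta))
        widths = beta * np.sqrt((actions ** 2 / counts).sum(axis=1))

        # Challenger whose gap to the empirical leader is least resolved.
        gaps = values[best] - values
        gaps[best] = np.inf
        challenger = int(np.argmin(gaps - widths))

        # Stop once the leader beats every challenger with confidence.
        if gaps[challenger] > widths[best] + widths[challenger]:
            return best

        # Pull the arm contributing the most uncertainty to the critical gap.
        diff = np.abs(actions[best] - actions[challenger])
        arm = int(np.argmax(diff / np.sqrt(counts)))
        sums[arm] += sample_arm(arm)
        counts[arm] += 1.0

    # Budget exhausted: return the current empirical best.
    return int(np.argmax(actions @ (sums / counts)))
```

For example, with `actions` encoding all size-2 subsets of three arms as 0/1 rows, the loop keeps sampling individual arms until one subset's estimated total reward exceeds every other subset's by more than the combined confidence widths, and then returns that subset's index.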
