Best Arm Identification in Batched Multi-armed Bandit Problems (2312.13875v1)
Abstract: Recently the multi-armed bandit problem has arisen in many real-life scenarios where arms must be sampled in batches, because the agent can wait only a limited time for feedback. Such applications include biological experimentation and online marketing. The problem is further complicated when the number of arms is large and the number of batches is small. We consider pure exploration in a batched multi-armed bandit problem. We introduce a general linear programming framework that can incorporate objectives from different theoretical settings in best arm identification. The linear program leads to a two-stage algorithm with good theoretical properties. We demonstrate through numerical studies that the algorithm also performs well compared to certain UCB-type and Thompson sampling methods.
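The two-stage structure the abstract alludes to can be illustrated with a much-simplified sketch: explore all arms uniformly in a first batch, then commit the remaining budget to the empirically best arms in a second batch. This is not the paper's LP-based allocation; the parameters `stage1_frac` and `keep` are hypothetical choices for illustration only.

```python
# Illustrative two-stage batched best-arm identification sketch.
# NOT the paper's LP-based algorithm; it only shows the generic
# "uniform exploration batch, then commit to survivors" pattern.
import random

def two_stage_best_arm(means, budget, stage1_frac=0.5, keep=2, seed=0):
    """Return the index of the arm estimated best after two batches.

    means       -- true Bernoulli success probabilities (unknown to the agent)
    budget      -- total number of arm pulls across both batches
    stage1_frac -- fraction of the budget spent uniformly in batch 1 (assumed)
    keep        -- number of top arms carried into batch 2 (assumed)
    """
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k
    wins = [0] * k

    def pull(i, n):
        for _ in range(n):
            wins[i] += rng.random() < means[i]
            pulls[i] += 1

    # Batch 1: spread the first-stage budget uniformly over all arms.
    per_arm = int(budget * stage1_frac) // k
    for i in range(k):
        pull(i, per_arm)

    # Keep the `keep` empirically best arms.
    est = [wins[i] / max(pulls[i], 1) for i in range(k)]
    survivors = sorted(range(k), key=lambda i: est[i], reverse=True)[:keep]

    # Batch 2: split the remaining budget over the surviving arms.
    remaining = budget - per_arm * k
    for i in survivors:
        pull(i, remaining // keep)

    est = [wins[i] / max(pulls[i], 1) for i in range(k)]
    return max(survivors, key=lambda i: est[i])
```

With a large budget and well-separated means, e.g. `two_stage_best_arm([0.2, 0.4, 0.8], budget=3000)`, the best arm is identified with high probability; the paper's LP framework instead optimizes how the budget is allocated across batches and arms.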
- Learning with limited rounds of adaptivity: Coin tossing, multi-armed bandits, and ranking from pairwise comparisons. In Conference on Learning Theory, pages 39–75, 2017.
- Best arm identification in multi-armed bandits. In COLT, pages 41–53, 2010.
- Pure exploration in multi-armed bandits problems. In Algorithmic Learning Theory (ALT), pages 23–37. Springer, 2009.
- A. Carpentier and A. Locatelli. Tight (lower) bounds for the fixed budget best arm identification bandit problem. In Conference on Learning Theory, pages 590–604. PMLR, 2016.
- Regret bounds for batched bandits. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 7340–7348, 2021.
- PAC bounds for multi-armed bandit and Markov decision processes. In Computational Learning Theory, pages 255–270, 2002.
- Batched multi-armed bandits problem. Advances in Neural Information Processing Systems, 32, 2019.
- J. C. Gittins. Bandit processes and dynamic allocation indices. Journal of the Royal Statistical Society, Series B, 41(2):148–177, 1979.
- K. Jamieson and R. Nowak. Best-arm identification algorithms for multi-armed bandits in the fixed confidence setting. In 2014 48th Annual Conference on Information Sciences and Systems (CISS), pages 1–6, 2014.
- lil’ UCB: An optimal exploration algorithm for multi-armed bandits. In Conference on Learning Theory, pages 423–439, 2014.
- Almost optimal anytime algorithm for batched multi-armed bandits. In Conference on Learning Theory, pages 5065–5073, 2021a.
- Double explore-then-commit: Asymptotic optimality and beyond. In Conference on Learning Theory, pages 2584–2633, 2021b.
- Top arm identification in multi-armed bandits with batch arm pulls. In Artificial Intelligence and Statistics, pages 139–148. PMLR, 2016.
- C. Kalkanli and A. Özgür. Batched Thompson sampling. In Advances in Neural Information Processing Systems, volume 34, pages 29984–29994, 2021.
- C. Kalkanli and A. Özgür. Asymptotic performance of Thompson sampling for batched multi-armed bandits. IEEE Transactions on Information Theory, 69(9):5956–5970, 2023.
- PAC subset selection in stochastic multi-armed bandits. In International Conference on Machine Learning, pages 655–662, 2012.
- Parallelizing Thompson sampling. In Advances in Neural Information Processing Systems, volume 34, pages 10535–10548, 2021.
- Almost optimal exploration in multi-armed bandits. In International Conference on Machine Learning, volume 28, pages 1238–1246, 2013.
- On Bayesian upper confidence bounds for bandit problems. In Artificial Intelligence and Statistics, pages 592–600. PMLR, 2012.
- On the complexity of best-arm identification in multi-armed bandit models. Journal of Machine Learning Research, 17:1–42, 2016.
- Rate-optimal Bayesian simple regret in best arm identification. arXiv preprint arXiv:2111.09885, 2023.
- T. L. Lai. Adaptive treatment allocation and the multi-armed bandit problem. The Annals of Statistics, 15(3):1091–1114, 1987.
- T. L. Lai and H. Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6(1):4–22, 1985.
- Causal bandits: Learning good interventions via causal inference. In Advances in Neural Information Processing Systems, volume 29, 2016.
- T. Lattimore and C. Szepesvári. Bandit algorithms. Cambridge University Press, 2020.
- S. Mannor and J. Tsitsiklis. The sample complexity of exploration in the multi-armed bandit problem. Journal of Machine Learning Research, 5:623–648, 2004.
- Batched bandit problems. The Annals of Statistics, pages 660–681, 2016.
- D. Russo. Simple Bayesian algorithms for best arm identification. Operations Research, 68(6):1625–1647, 2020.
- D. Russo and B. Van Roy. Learning to optimize via information-directed sampling. Operations Research, 66(1):230–252, 2018.
- On sequential elimination algorithms for best-arm identification in multi-armed bandits. IEEE Transactions on Signal Processing, 65(16):4281–4292, 2017.
- Fixed-confidence guarantees for Bayesian best-arm identification. In International Conference on Artificial Intelligence and Statistics, pages 1823–1832. PMLR, 2020.
- Regret bounds for Gaussian-process optimization in large domains. In Advances in Neural Information Processing Systems, volume 34, pages 7385–7396, 2021.
- Revisiting simple regret: Fast rates for returning a good arm, 2023.