Covariance Adaptive Best Arm Identification (2306.02630v2)
Abstract: We consider the problem of best arm identification in the multi-armed bandit model, under fixed confidence. Given a confidence input $\delta$, the goal is to identify the arm with the highest mean reward with probability at least $1 - \delta$, while minimizing the number of arm pulls. While the literature provides solutions to this problem under the assumption of independent arm distributions, we propose a more flexible scenario where arms can be dependent and rewards can be sampled simultaneously. This framework allows the learner to estimate the covariance among the arm distributions, enabling a more efficient identification of the best arm. The relaxed setting we propose is relevant in various applications, such as clinical trials, where similarities between patients or drugs suggest underlying correlations in the outcomes. We introduce new algorithms that adapt to the unknown covariance of the arms and demonstrate through theoretical guarantees that substantial improvement can be achieved over the standard setting. Additionally, we provide new lower bounds for the relaxed setting and present numerical simulations that support our theoretical findings.
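To see why simultaneous sampling of dependent arms can help, recall the identity $\mathrm{Var}(X_1 - X_2) = \mathrm{Var}(X_1) + \mathrm{Var}(X_2) - 2\,\mathrm{Cov}(X_1, X_2)$: when two arms are positively correlated and observed jointly, the difference of their rewards fluctuates less, so the better arm can be separated with fewer pulls. The sketch below is only an illustration of that identity and is not the paper's algorithm. The Gaussian rewards, the means 0.5 and 0.4, the unit variances, and the helper names `sample_pair` and `diff_variance` are all assumptions made for the example.

```python
import math
import random
import statistics


def sample_pair(rho, rng):
    """Draw one joint reward (x1, x2) for two hypothetical arms.

    Assumption for illustration: unit-variance Gaussian rewards with
    correlation rho and means 0.5 and 0.4.
    """
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    x1 = 0.5 + z1
    # Mixing z1 and z2 gives x2 unit variance and correlation rho with x1.
    x2 = 0.4 + rho * z1 + math.sqrt(1.0 - rho ** 2) * z2
    return x1, x2


def diff_variance(rho, n=20000, seed=0):
    """Empirical variance of x1 - x2 over n simultaneous pulls of both arms."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n):
        x1, x2 = sample_pair(rho, rng)
        diffs.append(x1 - x2)
    return statistics.variance(diffs)


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9):
        # The identity above predicts a variance of 2 - 2 * rho here.
        print(f"rho={rho}: empirical Var(X1 - X2) = {diff_variance(rho):.3f}, "
              f"predicted = {2 - 2 * rho:.3f}")
```

With independent arms the difference has variance close to 2, while at correlation 0.9 it drops to roughly 0.2. A learner that can only pull one arm at a time cannot observe the difference directly and so cannot exploit this reduction.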
Authors:
- El Mehdi Saad
- Gilles Blanchard
- Nicolas Verzelen