Adaptive Generalized Neyman Allocation: Local Asymptotic Minimax Optimal Best Arm Identification (2405.19317v1)
Abstract: This study investigates a local asymptotic minimax optimal strategy for fixed-budget best arm identification (BAI). We propose the Adaptive Generalized Neyman Allocation (AGNA) strategy and show that its worst-case upper bound on the probability of misidentifying the best arm matches the worst-case lower bound in the small-gap regime, where the gap between the expected outcomes of the best and suboptimal arms is small. Our strategy generalizes the Neyman allocation for two-armed bandits (Neyman, 1934; Kaufmann et al., 2016) and refines existing strategies such as those proposed by Glynn & Juneja (2004) and Shin et al. (2018). Compared with Komiyama et al. (2022), who propose a minimax rate-optimal strategy, our strategy has a tighter upper bound that exactly matches the lower bound, including the constant terms, which we obtain by restricting the class of distributions to small-gap distributions. Our result contributes to the longstanding open problem of whether asymptotically optimal strategies exist in fixed-budget BAI by presenting a local asymptotic minimax optimal strategy.
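As a point of reference for the abstract, the classical Neyman allocation for two arms can be sketched in a few lines. This is a minimal illustration assuming the standard deviations are known; it implements only the two-armed rule (sample each arm in proportion to its standard deviation), not the AGNA strategy itself. The function names and the numeric values are hypothetical and chosen for illustration.

```python
from typing import Tuple


def neyman_allocation(sigma1: float, sigma2: float) -> Tuple[float, float]:
    """Return the budget fractions for arms 1 and 2 under the classical
    two-armed Neyman allocation: proportional to the standard deviations."""
    if sigma1 <= 0 or sigma2 <= 0:
        raise ValueError("standard deviations must be positive")
    total = sigma1 + sigma2
    return sigma1 / total, sigma2 / total


def difference_variance(sigma1: float, sigma2: float, w1: float, budget: int) -> float:
    """Variance of (sample mean of arm 1 - sample mean of arm 2) when arm 1
    receives a fraction w1 of the budget and arm 2 receives the rest."""
    if not 0 < w1 < 1:
        raise ValueError("w1 must lie strictly between 0 and 1")
    return sigma1 ** 2 / (w1 * budget) + sigma2 ** 2 / ((1 - w1) * budget)


# Illustrative values (assumptions, not taken from the paper).
sigma1, sigma2, budget = 1.0, 3.0, 100
w1, w2 = neyman_allocation(sigma1, sigma2)
var_neyman = difference_variance(sigma1, sigma2, w1, budget)
var_uniform = difference_variance(sigma1, sigma2, 0.5, budget)
print(f"fractions: {w1:.2f}, {w2:.2f}")
print(f"variance under Neyman allocation: {var_neyman:.4f}")
print(f"variance under uniform allocation: {var_uniform:.4f}")
```

With unequal variances, the noisier arm receives more samples, and the variance of the estimated gap is smaller than under a uniform split. In practice the variances are unknown and must be estimated during the experiment, which is the role of the adaptive step in strategies of this kind.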
- Policy choice and best arm identification: Asymptotic analysis of exploration sampling, 2021. arXiv:2109.08229.
- Bayesian fixed-budget best-arm identification, 2023. arXiv:2211.08572.
- Best arm identification in multi-armed bandits. In Conference on Learning Theory, pp. 41–53, 2010.
- R. R. Bahadur. Stochastic Comparison of Tests. The Annals of Mathematical Statistics, 31(2):276–295, 1960.
- Sequential nonparametric testing with the law of the iterated logarithm. In Alexander T. Ihler and Dominik Janzing (eds.), Conference on Uncertainty in Artificial Intelligence, 2016.
- On best-arm identification with a fixed budget in non-parametric multi-armed bandits. In International Conference on Algorithmic Learning Theory (ALT), 2023.
- Pure exploration in multi-armed bandits problems. In Algorithmic Learning Theory, pp. 23–37. Springer Berlin Heidelberg, 2009.
- Pure exploration in finitely-armed and continuous-armed bandits. Theoretical Computer Science, 2011.
- Geometric Modeling in Probability and Statistics. Mathematics and Statistics. Springer International Publishing, 2014.
- Tight (lower) bounds for the fixed budget best arm identification bandit problem. In COLT, 2016.
- Simulation budget allocation for further enhancing the efficiency of ordinal optimization. Discrete Event Dynamic Systems, 10(3):251–270, 2000.
- Semiparametric efficient inference in adaptive experiments. In NeurIPS 2023 Workshop on Adaptive Experimental Design and Active Learning in the Real World, 2023. arXiv:2311.18274.
- Rémy Degenne. On the existence of a complexity in fixed budget bandit identification. In Conference on Learning Theory, volume 195, pp. 1131–1154. PMLR, 2023.
- The method of cumulants for the normal approximation. Probability Surveys, 19:185–270, 2022.
- John Duchi. Lecture notes on statistics and information theory, 2023. URL https://web.stanford.edu/class/stats311/lecture-notes.pdf.
- Richard S. Ellis. Large Deviations for a General Class of Random Vectors. The Annals of Probability, 12(1):1–12, 1984.
- Optimal best arm identification with fixed confidence. In Conference on Learning Theory, 2016.
- Jürgen Gärtner. On large deviations from the invariant measure. Theory of Probability & Its Applications, 22(1):24–39, 1977.
- A large deviations perspective on ordinal optimization. In Proceedings of the 2004 Winter Simulation Conference, volume 1. IEEE, 2004.
- Adaptive experimental design using the propensity score. Journal of Business and Economic Statistics, 2011.
- Efficient estimation of average treatment effects using the estimated propensity score. Econometrica, 2003.
- Time-uniform, nonparametric, nonasymptotic confidence sequences. Annals of Statistics, 2021.
- Almost optimal exploration in multi-armed bandits. In International Conference on Machine Learning, 2013.
- Adaptive treatment assignment in experiments for policy choice. Econometrica, 89(1):113–132, 2021.
- Masahiro Kato. Locally optimal fixed-budget best arm identification in two-armed gaussian bandits with unknown variances, 2024a. arXiv:2312.12741.
- Masahiro Kato. Worst-case optimal multi-armed gaussian best arm identification with a fixed budget, 2024b. arXiv:2310.19788.
- Efficient adaptive experimental design for average treatment effect estimation, 2020. arXiv:2002.05308.
- Emilie Kaufmann. Contributions to the Optimal Solution of Several Bandits Problems. Habilitation à Diriger des Recherches, Université de Lille, 2020. URL https://emiliekaufmann.github.io/HDR_EmilieKaufmann.pdf.
- On the complexity of best-arm identification in multi-armed bandit models. Journal of Machine Learning Research, 17(1):1–42, 2016.
- Minimax optimal algorithms for fixed-budget best arm identification. In Advances in Neural Information Processing Systems, 2022.
- Rate-optimal bayesian simple regret in best arm identification. Mathematics of Operations Research, 2023.
- Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 1985.
- Fixed-budget best-arm identification with heterogeneous reward variances. In Conference on Uncertainty in Artificial Intelligence, 2023.
- L Le Cam. Limits of experiments. In Theory of Statistics, pp. 245–282. University of California Press, 1972.
- Lucien Le Cam. Asymptotic Methods in Statistical Decision Theory (Springer Series in Statistics). Springer, 1986.
- Theory of Point Estimation. Springer-Verlag, 1998.
- Jerzy Neyman. Sur les applications de la theorie des probabilites aux experiences agricoles: Essai des principes. Statistical Science, 5:463–472, 1923.
- Jerzy Neyman. On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection. Journal of the Royal Statistical Society, 97:123–150, 1934.
- Prior-dependent allocations for bayesian fixed-budget best-arm identification in structured bandits, 2024. arXiv:2402.05878.
- Chao Qin. Open problem: Optimal best arm identification with fixed-budget. In Conference on Learning Theory, 2022.
- Donald B. Rubin. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 1974.
- Tractable sampling strategies for ordinal optimization. Operations Research, 66(6):1693–1712, 2018.
- Mark J. van der Laan. The construction and analysis of adaptive group sequential designs, 2008. URL https://biostats.bepress.com/ucbbiostat/paper232.
- A.W. van der Vaart. An asymptotic representation theorem. International Statistical Review / Revue Internationale de Statistique, 59(1):97–121, 1991.
- A.W. van der Vaart. Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 1998.
- A. Wald. Sequential tests of statistical hypotheses. The Annals of Mathematical Statistics, 16(2):117–186, 1945.
- On uniformly optimal algorithms for best arm identification in two-armed bandits with fixed budget. In International Conference on Machine Learning (ICML), 2024.
- Minimax optimal fixed-budget best arm identification in linear bandits. In Advances in Neural Information Processing Systems, 2022.
Author: Masahiro Kato