
Locally Optimal Fixed-Budget Best Arm Identification in Two-Armed Gaussian Bandits with Unknown Variances (2312.12741v2)

Published 20 Dec 2023 in cs.LG, econ.EM, math.ST, stat.ME, stat.ML, and stat.TH

Abstract: We address the problem of best arm identification (BAI) with a fixed budget for two-armed Gaussian bandits. In BAI, given multiple arms, we aim to find the best arm, an arm with the highest expected reward, through an adaptive experiment. Kaufmann et al. (2016) develop a lower bound for the probability of misidentifying the best arm. They also propose a strategy, assuming that the variances of rewards are known, and show that it is asymptotically optimal in the sense that its probability of misidentification matches the lower bound as the budget approaches infinity. However, an asymptotically optimal strategy is unknown when the variances are unknown. To address this open issue, we propose a strategy that estimates variances during an adaptive experiment and draws arms with a ratio of the estimated standard deviations. We refer to this strategy as the Neyman Allocation (NA)-Augmented Inverse Probability Weighting (AIPW) strategy. We then demonstrate that this strategy is asymptotically optimal by showing that its probability of misidentification matches the lower bound when the budget approaches infinity and the gap between the expected rewards of the two arms approaches zero (small-gap regime). Our results suggest that under the worst-case scenario characterized by the small-gap regime, our strategy, which employs estimated variances, is asymptotically optimal even when the variances are unknown.
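
The strategy described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation: it assumes that each arm is drawn with probability proportional to its estimated standard deviation, that the final recommendation is the arm with the larger AIPW mean estimate, and that allocation is uniform until both arms have at least two observations. The function name `na_aipw`, the `clip` parameter, and the warm-up rule are hypothetical choices made for this example.

```python
import numpy as np


def na_aipw(means, stds, budget, seed=0, clip=0.05):
    """Simulate one run on a two-armed Gaussian bandit.

    Returns (recommended arm, AIPW mean estimates, pull counts).
    `clip` (an assumption of this sketch) keeps draw probabilities
    away from 0 and 1 so the inverse weights stay bounded.
    """
    rng = np.random.default_rng(seed)
    counts = np.zeros(2)    # number of pulls per arm
    sums = np.zeros(2)      # running sum of rewards per arm
    sq_sums = np.zeros(2)   # running sum of squared rewards per arm
    aipw_sum = np.zeros(2)  # running sum of AIPW scores per arm

    for _ in range(budget):
        # Plug-in estimates built from past observations only.
        mu_hat = sums / np.maximum(counts, 1)
        if counts.min() >= 2:
            var_hat = np.maximum(sq_sums / counts - mu_hat**2, 1e-12)
            sd_hat = np.sqrt(var_hat)
            # Neyman allocation: probability proportional to estimated std.
            w0 = float(np.clip(sd_hat[0] / sd_hat.sum(), clip, 1 - clip))
        else:
            w0 = 0.5  # warm-up: uniform allocation (assumption)
        w = np.array([w0, 1.0 - w0])

        arm = 0 if rng.random() < w0 else 1
        reward = rng.normal(means[arm], stds[arm])

        # AIPW score: plug-in mean plus an inverse-probability-weighted
        # residual for the arm that was actually drawn.
        score = mu_hat.copy()
        score[arm] += (reward - mu_hat[arm]) / w[arm]
        aipw_sum += score

        counts[arm] += 1
        sums[arm] += reward
        sq_sums[arm] += reward**2

    aipw_means = aipw_sum / budget
    return int(np.argmax(aipw_means)), aipw_means, counts


if __name__ == "__main__":
    best, estimates, pulls = na_aipw((1.0, 0.0), (1.0, 2.0), budget=2000)
    print(best, estimates, pulls)
```

In this sketch the noisier arm receives more draws, so with standard deviations 1 and 2 the first arm should end up with roughly one third of the budget.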

References (46)
  1. Karun Adusumilli. Neyman allocation is minimax optimal for best arm identification with two arms, 2022. arXiv:2204.05527.
  2. Online ordinal optimization under model misspecification, 2021. URL https://api.semanticscholar.org/CorpusID:235389954. SSRN.
  3. Local bahadur efficiency of score tests. Journal of Statistical Planning and Inference, 19(2):187–199, 1988.
  4. Policy choice and best arm identification: Asymptotic analysis of exploration sampling, 2021. arXiv:2109.08229.
  5. Timothy B. Armstrong. Asymptotic efficiency bounds for a class of experimental designs, 2022. arXiv:2205.02726.
  6. Bayesian fixed-budget best-arm identification, 2023. arXiv:2211.08572.
  7. Best arm identification in multi-armed bandits. In Conference on Learning Theory, pp.  41–53, 2010.
  8. R. R. Bahadur. Stochastic Comparison of Tests. The Annals of Mathematical Statistics, 31(2):276 – 295, 1960.
  9. Doubly robust estimation in missing data and causal inference models. Biometrics, 61(4):962–973, 2005.
  10. Pure exploration in multi-armed bandits problems. In Algorithmic Learning Theory, pp.  23–37. Springer Berlin Heidelberg, 2009.
  11. Pure exploration in finitely-armed and continuous-armed bandits. Theoretical Computer Science, 2011.
  12. Tight (lower) bounds for the fixed budget best arm identification bandit problem. In COLT, 2016.
  13. Simulation budget allocation for further enhancing the efficiency of ordinal optimization. Discrete Event Dynamic Systems, 10(3):251–270, 2000.
  14. Rémy Degenne. On the existence of a complexity in fixed budget bandit identification. In Conference on Learning Theory, volume 195, pp. 1131–1154. PMLR, 2023.
  15. Optimal best arm identification with fixed confidence. In Conference on Learning Theory, 2016.
  16. A large deviations perspective on ordinal optimization. In Proceedings of the 2004 Winter Simulation Conference, volume 1. IEEE, 2004.
  17. Confidence intervals for policy evaluation in adaptive experiments. Proceedings of the National Academy of Sciences, 118(15), 2021.
  18. Jinyong Hahn. On the role of the propensity score in efficient semiparametric estimation of average treatment effects. Econometrica, 66(2):315–331, 1998.
  19. Adaptive experimental design using the propensity score. Journal of Business and Economic Statistics, 2011.
  20. Bahadur efficiency and robustness of studentized score tests. Annals of the Institute of Statistical Mathematics, 48(2):295–314, Jun 1996.
  21. Efficient estimation of average treatment effects using the estimated propensity score. Econometrica, 2003.
  22. lil’ UCB: An optimal exploration algorithm for multi-armed bandits. In Conference on Learning Theory, 2014.
  23. Dealing with unknown variances in best-arm identification. In Proceedings of The 34th International Conference on Algorithmic Learning Theory, volume 201, pp.  776–849, 2023.
  24. Adaptive treatment assignment in experiments for policy choice. Econometrica, 89(1):113–132, 2021.
  25. Masahiro Kato. Worst-case optimal multi-armed gaussian best arm identification with a fixed budget, 2023. arXiv:2310.19788.
  26. Efficient adaptive experimental design for average treatment effect estimation, 2020. arXiv:2002.05308.
  27. Asymptotically minimax optimal fixed-budget best arm identification for expected simple regret minimization, 2023a. arXiv:2302.02988.
  28. Fixed-budget hypothesis best arm identification: On the information loss in experimental design. In ICML Workshop on New Frontiers in Learning, Control, and Dynamical Systems, 2023b.
  29. Emilie Kaufmann. Contributions to the Optimal Solution of Several Bandits Problems. Habilitation à Diriger des Recherches, Université de Lille, 2020. URL https://emiliekaufmann.github.io/HDR_EmilieKaufmann.pdf.
  30. On the complexity of best-arm identification in multi-armed bandit models. Journal of Machine Learning Research, 17(1):1–42, 2016.
  31. Optimal simple regret in bayesian best arm identification, 2021.
  32. Minimax optimal algorithms for fixed-budget best arm identification. In Advances in Neural Information Processing Systems, 2022.
  33. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 1985.
  34. Jerzy Neyman. Sur les applications de la theorie des probabilites aux experiences agricoles: Essai des principes. Statistical Science, 5:463–472, 1923.
  35. Jerzy Neyman. On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection. Journal of the Royal Statistical Society, 97:123–150, 1934.
  36. Chao Qin. Open problem: Optimal best arm identification with fixed-budget. In Conference on Learning Theory, 2022.
  37. Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association, 89(427):846–866, 1994.
  38. Daniel Russo. Simple bayesian algorithms for best-arm identification. Operations Research, 68(6):1625–1647, 2020.
  39. Max Tabord-Meehan. Stratification trees for adaptive randomization in randomized controlled trials, 2018.
  40. Anastasios Tsiatis. Semiparametric Theory and Missing Data. Springer Series in Statistics. Springer New York, 2007.
  41. Mark J. van der Laan. The construction and analysis of adaptive group sequential designs, 2008. URL https://biostats.bepress.com/ucbbiostat/paper232.
  42. A.W. van der Vaart. Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 1998.
  43. On uniformly optimal algorithms for best arm identification in two-armed bandits with fixed budget, 2023a. arXiv:2308.12000.
  44. Best arm identification with fixed budget: A large deviation perspective. In Thirty-seventh Conference on Neural Information Processing Systems, 2023b. URL https://openreview.net/forum?id=gYetLsNO8x.
  45. Harry S. Wieand. A Condition Under Which the Pitman and Bahadur Approaches to Efficiency Coincide. The Annals of Statistics, 4(5):1003 – 1011, 1976.
  46. Jinglong Zhao. Adaptive neyman allocation, 2023.
