
Adaptive Experimental Design for Policy Learning (2401.03756v3)

Published 8 Jan 2024 in cs.LG, cs.AI, econ.EM, stat.ME, and stat.ML

Abstract: Evidence-based targeting has been a topic of growing interest among practitioners in policy and business. Formulating a decision-maker's policy learning as a fixed-budget best arm identification (BAI) problem with contextual information, we study optimal adaptive experimental design for policy learning with multiple treatment arms. In the sampling stage, the planner assigns treatment arms adaptively to sequentially arriving experimental units upon observing their contextual information (covariates). After the experiment, the planner recommends an individualized assignment rule to the population. Taking the worst-case expected regret as the performance criterion for the adaptive sampling and recommended policies, we derive its asymptotic lower bound and propose a strategy, the Adaptive Sampling-Policy Learning strategy (PLAS), whose leading factor in the regret upper bound matches the lower bound as the number of experimental units increases.
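The two-stage workflow described in the abstract — adaptively assigning arms to sequentially arriving units with observed covariates, then recommending an individualized assignment rule — can be sketched as follows. This is a minimal illustrative simulation, not the paper's PLAS strategy: the mean-outcome matrix, the binary covariate, the Gaussian noise, and the uniform-exploration assignment rule are all assumptions chosen for simplicity (PLAS instead targets an optimal allocation derived from the regret lower bound).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: K = 3 treatment arms, binary covariate x in {0, 1}.
# True conditional mean outcomes mu[x, arm] are unknown to the planner.
K, n_contexts, T = 3, 2, 50_000
mu = np.array([[0.2, 0.5, 0.4],    # context x = 0
               [0.6, 0.3, 0.5]])   # context x = 1

counts = np.zeros((n_contexts, K))
sums = np.zeros((n_contexts, K))

# Sampling stage: units arrive sequentially; after observing each unit's
# covariate, assign an arm. Uniform exploration is used here as a stand-in
# for an adaptive target-allocation rule.
for t in range(T):
    x = rng.integers(n_contexts)           # observe covariate
    a = rng.integers(K)                    # assign a treatment arm
    y = mu[x, a] + rng.normal(scale=1.0)   # observe noisy outcome
    counts[x, a] += 1
    sums[x, a] += y

# Recommendation stage: estimate conditional means and recommend the
# empirically best arm per context, i.e. an individualized assignment rule.
est = sums / np.maximum(counts, 1)
policy = est.argmax(axis=1)
print(policy)  # should recover mu.argmax(axis=1) given the large budget T
```

With a fixed budget T, the quality of the recommended rule is measured by its expected regret against the oracle policy that knows mu; the paper's contribution is an allocation whose worst-case regret attains the asymptotic lower bound in its leading factor.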

