The Unreasonable Effectiveness of Greedy Algorithms in Multi-Armed Bandit with Many Arms (2002.10121v4)

Published 24 Feb 2020 in cs.LG and stat.ML

Abstract: We investigate a Bayesian $k$-armed bandit problem in the \emph{many-armed} regime, where $k \geq \sqrt{T}$ and $T$ represents the time horizon. Initially, and aligned with recent literature on many-armed bandit problems, we observe that subsampling plays a key role in designing optimal algorithms; the conventional UCB algorithm is sub-optimal, whereas a subsampled UCB (SS-UCB), which selects $\Theta(\sqrt{T})$ arms for execution under the UCB framework, achieves rate-optimality. However, despite SS-UCB's theoretical promise of optimal regret, it empirically underperforms compared to a greedy algorithm that consistently chooses the empirically best arm. This observation extends to contextual settings through simulations with real-world data. Our findings suggest a new form of \emph{free exploration} beneficial to greedy algorithms in the many-armed context, fundamentally linked to a tail event concerning the prior distribution of arm rewards. This finding diverges from the notion of free exploration, which relates to covariate variation, as recently discussed in contextual bandit literature. Expanding upon these insights, we establish that the subsampled greedy approach not only achieves rate-optimality for Bernoulli bandits within the many-armed regime but also attains sublinear regret across broader distributions. Collectively, our research indicates that in the many-armed regime, practitioners might find greater value in adopting greedy algorithms.
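To make the comparison described in the abstract concrete, below is a minimal, self-contained simulation sketch (not the authors' code): it draws k ≥ √T Bernoulli arms with means from a uniform prior, subsamples Θ(√T) of them, and runs both a UCB1-style policy on the subsample (SS-UCB) and a purely greedy policy that always plays the empirically best subsampled arm (SS-Greedy). The uniform prior, horizon, arm counts, and UCB1 index are illustrative assumptions, not the paper's exact experimental setup.

```python
# Sketch: subsampled UCB vs. subsampled greedy on a Bernoulli many-armed bandit.
# Assumptions (not from the paper): Uniform(0,1) prior on arm means, UCB1 index,
# T = 10,000, k = 4*sqrt(T) arms, sqrt(T) subsampled arms.

import numpy as np


def run_policy(means, best_mean, horizon, policy, rng):
    """Run one bandit simulation on the subsampled arms; return cumulative regret
    measured against the best mean among ALL k arms."""
    k = len(means)
    counts = np.zeros(k)
    sums = np.zeros(k)
    regret = 0.0

    for t in range(horizon):
        if t < k:
            arm = t  # play each subsampled arm once
        elif policy == "ucb":
            # UCB1-style index on the subsample (SS-UCB)
            index = sums / counts + np.sqrt(2.0 * np.log(t + 1) / counts)
            arm = int(np.argmax(index))
        else:
            # Greedy: always play the empirically best subsampled arm (SS-Greedy)
            arm = int(np.argmax(sums / counts))

        reward = float(rng.random() < means[arm])  # Bernoulli reward
        counts[arm] += 1
        sums[arm] += reward
        regret += best_mean - means[arm]

    return regret


def main():
    rng = np.random.default_rng(0)
    horizon = 10_000
    k = 4 * int(np.ceil(np.sqrt(horizon)))      # many-armed regime: k >= sqrt(T)
    subsample = int(np.ceil(np.sqrt(horizon)))  # Theta(sqrt(T)) subsampled arms

    for trial in range(5):
        means = rng.uniform(0.0, 1.0, size=k)   # assumed Uniform(0,1) prior
        chosen = rng.choice(k, size=subsample, replace=False)
        sub_means = means[chosen]
        best_mean = means.max()
        r_ucb = run_policy(sub_means, best_mean, horizon, "ucb", rng)
        r_greedy = run_policy(sub_means, best_mean, horizon, "greedy", rng)
        print(f"trial {trial}: SS-UCB regret ~ {r_ucb:.1f}, "
              f"SS-Greedy regret ~ {r_greedy:.1f}")


if __name__ == "__main__":
    main()
```

This is only a toy experiment; the paper's claim is that, in such many-armed settings, the greedy policy benefits from "free exploration" because the subsample is likely to contain a near-optimal arm, so its empirical regret is often competitive with, or better than, SS-UCB's.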

Authors (4)
  1. Mohsen Bayati (31 papers)
  2. Nima Hamidi (6 papers)
  3. Ramesh Johari (41 papers)
  4. Khashayar Khosravi (9 papers)
Citations (25)
