
Forced Exploration in Bandit Problems (2312.07285v2)

Published 12 Dec 2023 in cs.LG and stat.ML

Abstract: The multi-armed bandit (MAB) is a classical sequential decision problem. Most work requires assumptions about the reward distribution (e.g., bounded rewards), while practitioners may have difficulty obtaining information about these distributions to design models for their problems, especially in non-stationary MAB problems. This paper aims to design a multi-armed bandit algorithm that can be implemented without using information about the reward distribution while still achieving substantial regret upper bounds. To this end, we propose a novel algorithm that alternates between a greedy rule and forced exploration. Our method can be applied to Gaussian, Bernoulli, and other sub-Gaussian distributions, and its implementation does not require additional information. We employ a unified analysis method for different forced exploration strategies and provide problem-dependent regret upper bounds for stationary and piecewise-stationary settings. Furthermore, we compare our algorithm with popular bandit algorithms on different reward distributions.
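The abstract describes the algorithm only at a high level (alternating between a greedy rule and forced exploration), so the following Python sketch is an illustration rather than the paper's construction: it forces a pull of any arm whose count falls behind a slowly growing schedule and otherwise plays the empirically best arm, using no information about the reward distribution. The sqrt(log t) schedule, the function names, and the Gaussian toy arms are assumptions made here for illustration only.

```python
import math
import random

def forced_exploration_bandit(arm_samplers, horizon,
                              schedule=lambda t: math.sqrt(math.log(t + 2))):
    """Illustrative sketch of a greedy / forced-exploration alternation.

    arm_samplers: list of zero-argument callables, each returning a reward sample.
    schedule(t):  minimum pull count required of every arm at round t; the
                  sqrt(log t) default is an assumption for illustration, not
                  the schedule analyzed in the paper.
    """
    k = len(arm_samplers)
    counts = [0] * k
    means = [0.0] * k
    history = []

    for t in range(horizon):
        # Forced exploration: pull an arm whose count lags behind schedule(t).
        lagging = [i for i in range(k) if counts[i] < schedule(t)]
        if lagging:
            arm = random.choice(lagging)
        else:
            # Greedy rule: pull the arm with the highest empirical mean.
            arm = max(range(k), key=lambda i: means[i])

        reward = arm_samplers[arm]()
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean update
        history.append((arm, reward))

    return means, counts, history


if __name__ == "__main__":
    # Toy example: three Gaussian arms with means 0.1, 0.5, 0.9 and unit variance.
    arms = [lambda m=m: random.gauss(m, 1.0) for m in (0.1, 0.5, 0.9)]
    means, counts, _ = forced_exploration_bandit(arms, horizon=10_000)
    print("empirical means:", [round(m, 3) for m in means])
    print("pull counts:   ", counts)
```

Note that nothing in this sketch depends on knowing bounds or variance proxies for the rewards; only the empirical means and pull counts are used, which mirrors the distribution-free implementation claim in the abstract.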

