Incentivized Exploration of Non-Stationary Stochastic Bandits (2403.10819v1)

Published 16 Mar 2024 in cs.LG, cs.AI, and stat.ML

Abstract: We study incentivized exploration for the multi-armed bandit (MAB) problem with non-stationary reward distributions, where players receive compensation for exploring arms other than the greedy choice and may provide biased feedback on the reward. We consider two different non-stationary environments: abruptly-changing and continuously-changing, and propose respective incentivized exploration algorithms. We show that the proposed algorithms achieve sublinear regret and compensation over time, thus effectively incentivizing exploration despite the non-stationarity and the biased or drifted feedback.
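
The abstract describes the interaction loop in words only. As a rough illustration of the setting, the sketch below implements a sliding-window UCB recommender that pays compensation whenever the recommended arm differs from the players' greedy choice, with additive noise standing in for biased or drifted feedback. This is an assumption-laden toy, not the paper's algorithm: the window size, confidence bonus, compensation rule, and drift model are all illustrative choices.

```python
import numpy as np

# Hedged sketch of incentivized exploration under non-stationary rewards.
# NOT the paper's algorithm: window, bonus, compensation rule, and the
# drifted-feedback model are illustrative assumptions.

def sliding_window_incentivized_ucb(reward_fns, T, window=200, sigma_drift=0.05, seed=0):
    rng = np.random.default_rng(seed)
    K = len(reward_fns)
    history = []            # (round, arm, observed reward), filtered to the window
    total_compensation = 0.0

    for t in range(T):
        # Sliding-window statistics cope with abruptly/continuously changing means.
        recent = [(a, r) for (s, a, r) in history if s > t - window]
        counts = np.array([sum(1 for a, _ in recent if a == k) for k in range(K)])
        means = np.array([
            np.mean([r for a, r in recent if a == k]) if counts[k] > 0 else 0.0
            for k in range(K)
        ])

        if np.any(counts == 0):
            arm = int(np.argmin(counts))            # play under-sampled arms first
        else:
            bonus = np.sqrt(2 * np.log(min(t + 1, window)) / counts)
            arm = int(np.argmax(means + bonus))     # principal's UCB recommendation

        greedy = int(np.argmax(means))
        if arm != greedy:
            # Myopic player would pick the greedy arm; pay the estimated gap.
            total_compensation += max(means[greedy] - means[arm], 0.0)

        true_mean = reward_fns[arm](t)              # non-stationary mean reward
        reward = rng.binomial(1, float(np.clip(true_mean, 0.0, 1.0)))
        drift = rng.normal(0.0, sigma_drift)        # biased / drifted feedback
        history.append((t, arm, reward + drift))

    return total_compensation


if __name__ == "__main__":
    # Abruptly-changing environment: the two arms swap means halfway through.
    arms = [lambda t: 0.8 if t < 500 else 0.3,
            lambda t: 0.3 if t < 500 else 0.8]
    print(f"total compensation paid: {sliding_window_incentivized_ucb(arms, T=1000):.2f}")
```

Compensation is charged only on rounds where the recommendation departs from the empirically best arm, mirroring the incentive structure described in the abstract; the sliding window is one standard device for tracking abruptly- or continuously-changing reward means.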

