
HELLINGER-UCB: A novel algorithm for stochastic multi-armed bandit problem and cold start problem in recommender system (2404.10207v1)

Published 16 Apr 2024 in stat.ML and cs.LG

Abstract: In this paper, we study the stochastic multi-armed bandit problem, where the reward is driven by an unknown random variable. We propose a new variant of the Upper Confidence Bound (UCB) algorithm, called Hellinger-UCB, which leverages the squared Hellinger distance to build the upper confidence bound. We prove that Hellinger-UCB reaches the theoretical lower bound on regret and show that it has a solid statistical interpretation. Numerical experiments comparing Hellinger-UCB with other UCB variants demonstrate its effectiveness over finite time horizons. As a real-world example, we apply Hellinger-UCB to the cold-start problem in the content recommender system of a financial app. Under reasonable assumptions, the Hellinger-UCB algorithm offers a convenient but important low-latency property. An online experiment further shows that Hellinger-UCB outperforms both KL-UCB and UCB1 in terms of click-through rate (CTR).
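
The page does not reproduce the paper's pseudocode, so the sketch below is only a minimal illustration of the idea in the abstract: a KL-UCB-style optimistic index for Bernoulli rewards in which the squared Hellinger distance replaces the KL divergence. The exploration rate c·log(t) and the bisection solver are assumptions made for this sketch, not details taken from the paper.

```python
import math
import random

def hellinger_sq(p, q):
    """Squared Hellinger distance between Bernoulli(p) and Bernoulli(q)."""
    return 1.0 - (math.sqrt(p * q) + math.sqrt((1.0 - p) * (1.0 - q)))

def hellinger_ucb_index(p_hat, n_pulls, t, c=1.0):
    """Upper confidence bound for one arm: the largest q in [p_hat, 1] with
    H^2(p_hat, q) <= c * log(t) / n_pulls. Since H^2 is nondecreasing in q
    on [p_hat, 1], a simple bisection finds it. The exploration rate
    c * log(t) is an assumed placeholder, not the paper's exact choice."""
    radius = c * math.log(max(t, 2)) / n_pulls
    lo, hi = p_hat, 1.0
    for _ in range(50):  # bisection to high precision
        mid = 0.5 * (lo + hi)
        if hellinger_sq(p_hat, mid) <= radius:
            lo = mid
        else:
            hi = mid
    return lo

def run_bandit(true_means, horizon, seed=0):
    """Pull each arm once, then always pull the arm with the largest index."""
    rng = random.Random(seed)
    k = len(true_means)
    counts, sums = [0] * k, [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization: one pull per arm
        else:
            arm = max(range(k),
                      key=lambda a: hellinger_ucb_index(sums[a] / counts[a],
                                                        counts[a], t))
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts, sums

if __name__ == "__main__":
    counts, _ = run_bandit([0.3, 0.5, 0.7], horizon=5000)
    print("pull counts per arm:", counts)  # the 0.7 arm should dominate
```

One plausible reading of the low-latency claim: for Bernoulli distributions the squared Hellinger distance is built from square roots only, so the confidence-bound equation can be solved in closed form (it reduces to a quadratic after substitution), whereas the KL-UCB index generally requires an iterative search. The bisection above is used only to keep the sketch short.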

