Modeling Attrition in Recommender Systems with Departing Bandits (2203.13423v2)

Published 25 Mar 2022 in cs.LG, cs.IR, and stat.ML

Abstract: Traditionally, when recommender systems are formalized as multi-armed bandits, the policy of the recommender system influences the rewards accrued, but not the length of interaction. However, in real-world systems, dissatisfied users may depart (and never come back). In this work, we propose a novel multi-armed bandit setup that captures such policy-dependent horizons. Our setup consists of a finite set of user types, and multiple arms with Bernoulli payoffs. Each (user type, arm) tuple corresponds to an (unknown) reward probability. Each user's type is initially unknown and can only be inferred through their responses to recommendations. Moreover, if a user is dissatisfied with a recommendation, they might depart the system. We first address the case where all users share the same type, demonstrating that a recent UCB-based algorithm is optimal. We then turn to the more challenging case, where users are divided between two types. While naive approaches cannot handle this setting, we provide an efficient learning algorithm that achieves $\tilde{O}(\sqrt{T})$ regret, where $T$ is the number of users.
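
The setup described in the abstract can be illustrated with a short simulation. The sketch below is not the paper's algorithm; the reward probabilities, the type prior, the specific departure rule (a dissatisfied user leaves with a fixed probability after a zero-reward recommendation), and names such as `serve_user` and `switch_on_failure` are all illustrative assumptions.

```python
import numpy as np

# Illustrative sketch of a departing-bandits environment: two user types,
# two arms with Bernoulli payoffs, and a policy-dependent horizon because
# dissatisfied users may leave. All numbers below are invented for illustration.

rng = np.random.default_rng(0)

REWARD_PROB = np.array([
    [0.9, 0.2],   # type 0 mostly enjoys arm 0
    [0.3, 0.8],   # type 1 mostly enjoys arm 1
])
TYPE_PRIOR = np.array([0.5, 0.5])  # distribution over the two user types
DEPART_PROB = 0.5                  # assumed chance of leaving after a 0-reward round

def serve_user(policy, max_rounds=50):
    """Interact with a single user until they depart; return the reward collected."""
    user_type = rng.choice(len(TYPE_PRIOR), p=TYPE_PRIOR)  # hidden from the policy
    history, total = [], 0
    for _ in range(max_rounds):
        arm = policy(history)
        reward = int(rng.random() < REWARD_PROB[user_type, arm])
        history.append((arm, reward))
        total += reward
        if reward == 0 and rng.random() < DEPART_PROB:
            break  # the dissatisfied user departs and never comes back
    return total

def switch_on_failure(history):
    """Toy baseline policy: keep the current arm, switch after a zero reward."""
    if not history:
        return 0
    last_arm, last_reward = history[-1]
    return last_arm if last_reward else 1 - last_arm

# T users arrive sequentially; the paper's regret bound is stated in terms of
# the number of users T, since each user's horizon depends on the policy.
T = 10_000
total = sum(serve_user(switch_on_failure) for _ in range(T))
print(f"average reward per user over {T} users: {total / T:.2f}")
```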
