Ranking In Generalized Linear Bandits (2207.00109v2)
Abstract: We study the ranking problem in generalized linear bandits. At each time step, the learning agent selects an ordered list of items and observes stochastic outcomes. In recommendation systems, displaying the most attractive items in order is not always optimal, because both position and item dependencies yield a complex reward function. A simple example is the lack of diversity that arises when all of the most attractive items belong to the same category. We model the position and item dependencies in the ordered list and design UCB- and Thompson-Sampling-style algorithms for this problem. Our work generalizes existing studies in several directions, including position dependencies, of which position discount is a special case, and connects the ranking problem to graph theory.
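To make the setting concrete, the following is a minimal generic sketch of Thompson Sampling for selecting an ordered list in a (generalized) linear bandit. It is not the paper's algorithm: the item features, the Bernoulli feedback model, and the linearized Gaussian posterior update are all illustrative simplifications (a proper generalized linear treatment would use, e.g., a Laplace approximation of the logistic likelihood), and no position or item dependencies are modeled.

```python
import numpy as np

rng = np.random.default_rng(0)

d, n_items, K = 5, 20, 3           # feature dim, catalogue size, list length
X = rng.normal(size=(n_items, d))  # synthetic item feature vectors
theta_star = rng.normal(size=d)    # unknown true parameter

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Gaussian posterior approximation N(mu, B^{-1}), kept in (B, f) form
B = np.eye(d)       # precision matrix (ridge prior)
f = np.zeros(d)     # running sum of reward-weighted features

for t in range(200):
    mu = np.linalg.solve(B, f)
    # Thompson sampling: draw a parameter from the (approximate) posterior
    theta = rng.multivariate_normal(mu, np.linalg.inv(B))
    # rank items by sampled score and display the top K as the ordered list
    ranked = np.argsort(-X @ theta)[:K]
    for i in ranked:
        # Bernoulli click feedback through a logistic link (illustrative)
        r = rng.binomial(1, sigmoid(X[i] @ theta_star))
        B += np.outer(X[i], X[i])   # linearized update, a simplification
        f += r * X[i]

mu = np.linalg.solve(B, f)
```

The greedy top-K selection here ignores the cross-item and cross-position effects that motivate the paper; it only illustrates the interaction loop of sampling a parameter, ranking, and updating from list feedback.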