A General Offline Reinforcement Learning Framework for Interactive Recommendation (2310.00678v1)
Abstract: This paper studies the problem of learning interactive recommender systems from logged feedback, without any exploration in online environments. We address the problem by proposing a general offline reinforcement learning framework for recommendation that maximizes cumulative user rewards without online exploration. Specifically, we first introduce a probabilistic generative model for interactive recommendation, and then propose an effective inference algorithm for discrete and stochastic policy learning based on logged feedback. To perform offline learning more effectively, we propose five approaches to minimize the distribution mismatch between the logging policy and the recommendation policy: support constraints, supervised regularization, policy constraints, dual constraints, and reward extrapolation. We conduct extensive experiments on two public real-world datasets, demonstrating that the proposed methods achieve superior performance over existing supervised learning and reinforcement learning methods for recommendation.
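
As an illustration of what a policy constraint on distribution mismatch can look like for a discrete, stochastic recommendation policy, the sketch below penalizes the expected value of the policy by its KL divergence from an estimated logging policy. This is a generic formulation and not necessarily the one used in the paper; the function names, the weight `alpha`, and the toy numbers are all assumptions made for the example.

```python
import math


def softmax(logits):
    """Turn unnormalized scores over items into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same item set."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def constrained_objective(policy_logits, q_values, logging_probs, alpha=1.0):
    """Expected value under the policy minus a KL penalty toward the logging policy.

    Hypothetical objective for illustration only: a larger alpha keeps the
    learned policy closer to the behavior observed in the logs.
    """
    pi = softmax(policy_logits)
    expected_q = sum(p * q for p, q in zip(pi, q_values))
    return expected_q - alpha * kl_divergence(pi, logging_probs)


# Toy example with three candidate items (all numbers are made up).
q_values = [1.0, 2.0, 3.0]
logging_probs = [0.01, 0.495, 0.495]  # item 0 was almost never logged

near_logging = constrained_objective([-4.0, 0.0, 0.0], q_values, logging_probs)
far_from_logging = constrained_objective([10.0, 0.0, 0.0], q_values, logging_probs)
```

In this toy setting the policy that concentrates on the rarely logged item pays a large KL penalty, so its penalized objective is lower than that of the policy that stays close to the logged behavior.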