A Survey on Reinforcement Learning for Recommender Systems (2109.10665v4)
Abstract: Recommender systems have been widely applied in real-life scenarios to help people find useful information. In particular, reinforcement learning (RL) based recommender systems have become an emerging research topic in recent years, owing to their interactive nature and autonomous learning ability. Empirical results show that RL-based recommendation methods often surpass supervised learning methods. Nevertheless, applying RL in recommender systems raises various challenges, and researchers and practitioners in this area need a reference that covers these challenges and their solutions. To this end, we first provide a thorough overview, comparison, and summarization of RL approaches in four typical recommendation scenarios: interactive recommendation, conversational recommendation, sequential recommendation, and explainable recommendation. We then systematically analyze the challenges and relevant solutions on the basis of the existing literature. Finally, by discussing the open issues and limitations of RL in recommender systems, we highlight potential research directions in this field.
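To make the RL framing in the abstract concrete: recommendation is commonly cast as a Markov decision process in which the agent observes a user state, recommends an item (the action), and receives feedback such as a click (the reward). The following is a minimal illustrative sketch of this loop using tabular Q-learning with epsilon-greedy exploration; the `simulate_feedback` user model, the state encoding, and all hyperparameters are assumptions chosen for the example, not a method from the surveyed papers.

```python
import random
from collections import defaultdict

N_ITEMS = 5
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2  # learning rate, discount factor, exploration rate

# Q-table mapping a state (the last recommended item, or -1 at session
# start) to a value estimate for each candidate item (action).
Q = defaultdict(lambda: [0.0] * N_ITEMS)

def simulate_feedback(state: int, action: int) -> float:
    """Hypothetical user model (an assumption for this sketch): reward 1.0
    (a click) if the item is not a repeat of the last recommendation and
    lies in a fixed preference set, else 0.0."""
    preferred = {0, 2, 4}
    return 1.0 if action != state and action in preferred else 0.0

for episode in range(1000):
    state = -1  # session start
    for _ in range(10):  # one session = 10 recommendations
        # Epsilon-greedy selection over candidate items.
        if random.random() < EPSILON:
            action = random.randrange(N_ITEMS)
        else:
            action = max(range(N_ITEMS), key=lambda a: Q[state][a])
        reward = simulate_feedback(state, action)
        next_state = action  # the recommended item becomes the new state
        # Q-learning temporal-difference update.
        Q[state][action] += ALPHA * (
            reward + GAMMA * max(Q[next_state]) - Q[state][action]
        )
        state = next_state

# Learned item values per state; preferred items should score highest.
print({s: [round(v, 2) for v in vals] for s, vals in sorted(Q.items())})
```

Deep RL variants covered in the survey (e.g., DQN-based or policy-gradient recommenders) replace this Q-table with a neural network over learned user-state representations, which is what makes the approach scale to realistic item catalogs.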
- S. Deng, L. Huang, G. Xu, X. Wu, and Z. Wu, “On deep learning for trust-aware recommendations in social networks,” IEEE Trans. Neural Netw. Learn. Syst., vol. 28, no. 5, pp. 1164–1177, May 2017.
- J. Bobadilla, F. Ortega, A. Hernando, and A. Gutiérrez, “Recommender systems survey,” Knowl.-Based Syst., vol. 46, pp. 109–132, July 2013.
- Z. Huang, X. Xu, H. Zhu, and M. Zhou, “An efficient group recommendation model with multiattention-based neural networks,” IEEE Trans. Neural Netw. Learn. Syst., vol. 31, no. 11, pp. 4461–4474, November 2020.
- G. Adomavicius and A. Tuzhilin, “Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions,” IEEE Trans. Knowl. Data Eng., vol. 17, no. 6, pp. 734–749, June 2005.
- Y. Shi, M. Larson, and A. Hanjalic, “Collaborative filtering beyond the user-item matrix: A survey of the state of the art and future challenges,” ACM Comput. Surv., vol. 47, no. 1, p. 3, May 2014.
- S. Zhang, L. Yao, A. Sun, and Y. Tay, “Deep learning based recommender system: A survey and new perspectives,” ACM Comput. Surv., vol. 52, no. 1, p. 5, February 2019.
- W. Zhao, B. Wang, M. Yang, J. Ye, Z. Zhao, X. Chen, and Y. Shen, “Leveraging long and short-term information in content-aware movie recommendation via adversarial training,” IEEE Trans. Syst. Man Cybern., vol. 50, no. 11, pp. 4680–4693, November 2020.
- F. Pan, Q. Cai, P. Tang, F. Zhuang, and Q. He, “Policy gradients for contextual recommendations,” in Proc. WWW, 2019, pp. 1421–1431.
- L. Huang, M. Fu, F. Li, H. Qu, Y. Liu, and W. Chen, “A deep reinforcement learning based long-term recommender system,” Knowl.-Based Syst., vol. 213, p. 106706, February 2021.
- L. Ji, Q. Qin, B. Han, and H. Yang, “Reinforcement learning to optimize lifetime value in cold-start recommendation,” in Proc. CIKM, 2021, pp. 782–791.
- D. Zhao, L. Zhang, B. Zhang, L. Zheng, Y. Bao, and W. Yan, “Mahrl: Multi-goals abstraction based deep hierarchical reinforcement learning for recommendations,” in Proc. SIGIR, 2020, pp. 871–880.
- X. Wang, Y. Chen, J. Yang, L. Wu, Z. Wu, and X. Xie, “A reinforcement learning framework for explainable recommendation,” in Proc. IEEE Int. Conf. Data Mining (ICDM), 2018, pp. 587–596.
- Y. Cao, X. Chen, L. Yao, X. Wang, and W. E. Zhang, “Adversarial attacks and detection on reinforcement learning-based interactive recommender systems,” in Proc. SIGIR, 2020, pp. 1669–1672.
- E. O. Neftci and B. B. Averbeck, “Reinforcement learning in artificial and biological systems,” Nat. Mach. Intell., vol. 1, pp. 133–143, March 2019.
- H. Li, D. Liu, and D. Wang, “Manifold regularized reinforcement learning,” IEEE Trans. Neural Netw. Learn. Syst., vol. 29, no. 4, pp. 932–943, April 2018.
- K. Arulkumaran, M. P. Deisenroth, M. Brundage, and A. A. Bharath, “Deep reinforcement learning: A brief survey,” IEEE Signal Proc. Mag., vol. 34, no. 6, pp. 26–38, November 2017.
- G. Zheng, F. Zhang, Z. Zheng, Y. Xiang, N. J. Yuan, X. Xie, and Z. Li, “Drn: A deep reinforcement learning framework for news recommendation,” in Proc. WWW, 2018, pp. 167–176.
- D. Zha, L. Feng, B. Bhushanam, D. Choudhary, J. Nie, Y. Tian, J. Chae, Y. Ma, A. Kejariwal, and X. Hu, “Autoshard: Automated embedding table sharding for recommender systems,” in Proc. 28th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2022, pp. 4461–4471.
- M. Jaderberg, W. M. Czarnecki, I. Dunning, L. Marris, G. Lever, A. G. Castaneda, C. Beattie, N. C. Rabinowitz, A. S. Morcos, A. Ruderman, N. Sonnerat, T. Green, L. Deason, J. Z. Leibo, D. Silver, D. Hassabis, K. Kavukcuoglu, and T. Graepel, “Human-level performance in 3d multiplayer games with population-based reinforcement learning,” Science, vol. 364, no. 6443, pp. 859–865, May 2019.
- J. Kober, J. A. Bagnell, and J. Peters, “Reinforcement learning in robotics: A survey,” Int. J. Robot. Res., vol. 32, no. 11, pp. 1238–1274, September 2013.
- L. Zou, L. Xia, Y. Gu, X. Zhao, W. Liu, J. X. Huang, and D. Yin, “Neural interactive collaborative filtering,” in Proc. SIGIR, 2020, pp. 749–758.
- Q. Liu, S. Tong, C. Liu, H. Zhao, E. Chen, H. Ma, and S. Wang, “Exploiting cognitive structure for adaptive learning,” in Proc. 25th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2019, pp. 627–635.
- S. Ji, Z. Wang, T. Li, and Y. Zheng, “Spatio-temporal feature fusion for dynamic taxi route recommendation via deep reinforcement learning,” Knowl.-Based Syst., vol. 205, p. 106302, October 2020.
- H. Lee, D. Hwang, K. Min, and J. Choo, “Towards validating long-term user feedbacks in interactive recommendation systems,” in Proc. SIGIR, 2022, pp. 2607–2611.
- K. Wang, Z. Zou, Q. Deng, R. Wu, J. Tao, C. Fan, L. Chen, and P. Cui, “Reinforcement learning with a disentangled universal value function for item recommendation,” in Proc. AAAI, 2021, pp. 4427–4435.
- Y. Xian, Z. Fu, S. Muthukrishnan, G. de Melo, and Y. Zhang, “Reinforcement knowledge graph reasoning for explainable recommendation,” in Proc. SIGIR, 2019, pp. 285–294.
- T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, “Continuous control with deep reinforcement learning,” in Proc. ICLR, 2016, pp. 1–14.
- K. Zhao, X. Wang, Y. Zhang, L. Zhao, Z. Liu, C. Xing, and X. Xie, “Leveraging demonstrations for reinforcement recommendation reasoning over knowledge graphs,” in Proc. SIGIR, 2020, pp. 239–248.
- H. Wang, F. Zhang, X. Xie, and M. Guo, “Dkn: Deep knowledge-aware network for news recommendation,” in Proc. WWW, 2018, pp. 1835–1844.
- S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme, “Bpr: Bayesian personalized ranking from implicit feedback,” in Proc. UAI, 2009, pp. 452–461.
- H. Wang, F. Zhang, J. Wang, M. Zhao, W. Li, X. Xie, and M. Guo, “Ripplenet: Propagating user preferences on the knowledge graph for recommender systems,” in Proc. CIKM, 2018, pp. 417–426.
- L. Zhang, Z. Sun, J. Zhang, Y. Wu, and Y. Xia, “Conversation-based adaptive relational translation method for next poi recommendation with uncertain check-ins,” IEEE Trans. Neural Netw. Learn. Syst., pp. 1–14, February 2022.
- F. Zhou, R. Yin, K. Zhang, G. Trajcevski, T. Zhong, and J. Wu, “Adversarial point-of-interest recommendation,” in Proc. WWW, 2019, pp. 3462–3468.
- Z. Fu, L. Yu, and X. Niu, “Trace: Travel reinforcement recommendation based on location-aware context extraction,” ACM Trans. Knowl. Discovery Data, vol. 16, no. 4, pp. 1–22, August 2022.
- Y. Sun, F. Zhuang, H. Zhu, Q. He, and H. Xiong, “Cost-effective and interpretable job skill recommendation with deep reinforcement learning,” in Proc. WWW, 2021, pp. 3827–3838.
- Y. Wang, “A hybrid recommendation for music based on reinforcement learning,” in Proc. PAKDD, 2020, pp. 91–103.
- P. Wei, S. Xia, R. Chen, J. Qian, C. Li, and X. Jiang, “A deep-reinforcement-learning-based recommender system for occupant-driven energy optimization in commercial buildings,” IEEE Internet Things J., vol. 7, no. 7, pp. 6402–6413, July 2020.
- X. He, B. An, Y. Li, H. Chen, R. Wang, X. Wang, R. Yu, X. Li, and Z. Wang, “Learning to collaborate in multi-module recommendation via multi-agent reinforcement learning without communication,” in Proc. ACM Conf. Rec. Syst., 2020, pp. 210–219.
- G. Ke, H.-L. Du, and Y.-C. Chen, “Cross-platform dynamic goods recommendation system based on reinforcement learning and social networks,” Appl. Soft Comput., vol. 104, p. 107213, June 2021.
- J. O, J. Lee, J. W. Lee, and B.-T. Zhang, “Adaptive stock trading with dynamic asset allocation using reinforcement learning,” Inf. Sci., vol. 176, no. 15, pp. 2121–2147, August 2006.
- J. Zhang, B. Hao, B. Chen, C. Li, H. Chen, and J. Sun, “Hierarchical reinforcement learning for course recommendation in moocs,” in Proc. AAAI, 2019, pp. 435–442.
- Y. Lin, F. Lin, W. Zeng, J. Xiahou, L. Li, P. Wu, Y. Liu, and C. Miao, “Hierarchical reinforcement learning with dynamic recurrent mechanism for course recommendation,” Knowl.-Based Syst., vol. 244, p. 108546, May 2022.
- L. Wang, W. Zhang, X. He, and H. Zha, “Supervised reinforcement learning with recurrent neural network for dynamic treatment recommendation,” in Proc. 24th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2018, pp. 2447–2456.
- Z. Zheng, C. Wang, T. Xu, D. Shen, P. Qin, X. Zhao, B. Huai, X. Wu, and E. Chen, “Interaction-aware drug package recommendation via policy gradient,” ACM Trans. Inf. Sys., pp. 1–32, February 2022.
- S. M. Shortreed, E. Laber, D. J. Lizotte, T. S. Stroup, J. Pineau, and S. A. Murphy, “Informing sequential clinical decision-making through reinforcement learning: an empirical study,” Mach. learn., vol. 84, no. 1, pp. 109–136, July 2011.
- M. M. Afsar, T. Crump, and B. H. Far, “Reinforcement learning based recommender systems: A survey,” ACM Comput. Surv., pp. 1–37, June 2022.
- X. Chen, L. Yao, J. Mcauley, G. Zhou, and X. Wang, “A survey of deep reinforcement learning in recommender systems: A systematic review and future directions,” ArXiv Preprint ArXiv:2109.03540v1, 2021.
- B. Kiumarsi, K. G. Vamvoudakis, H. Modares, and F. L. Lewis, “Optimal and autonomous control using reinforcement learning: A survey,” IEEE Trans. Neural Netw. Learn. Syst., vol. 29, no. 6, pp. 2042–2062, June 2018.
- C. J. Watkins and P. Dayan, “Q-learning,” Mach. Learn., vol. 8, no. 3, pp. 279–292, May 1992.
- V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. A. Riedmiller, “Playing atari with deep reinforcement learning,” ArXiv Preprint ArXiv:1312.5602, 2013.
- R. J. Williams, “Simple statistical gradient-following algorithms for connectionist reinforcement learning,” Mach. Learn., vol. 8, no. 3, pp. 229–256, May 1992.
- R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour, “Policy gradient methods for reinforcement learning with function approximation,” in Proc. NIPS, 2000, pp. 1057–1063.
- J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” ArXiv Preprint ArXiv:1707.06347, 2017.
- J. Schulman, S. Levine, P. Moritz, M. I. Jordan, and P. Abbeel, “Trust region policy optimization,” in Proc. ICML, 2015, pp. 1889–1897.
- S. Levine and V. Koltun, “Guided policy search,” in Proc. ICML, 2013, pp. 1–9.
- V. R. Konda and J. N. Tsitsiklis, “On actor-critic algorithms,” SIAM J. Control Optim., vol. 42, no. 4, pp. 1143–1166, 2003.
- V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. Harley, T. P. Lillicrap, D. Silver, and K. Kavukcuoglu, “Asynchronous methods for deep reinforcement learning,” in Proc. ICML, 2016, pp. 1928–1937.
- T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, “Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor,” in Proc. ICML, 2018, pp. 1856–1865.
- D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, and M. Riedmiller, “Deterministic policy gradient algorithms,” in Proc. ICML, 2014, pp. 387–395.
- S. Adam, L. Busoniu, and R. Babuska, “Experience replay for real-time reinforcement learning control,” IEEE Trans. Syst. Man Cybern., vol. 42, no. 2, pp. 201–212, March 2012.
- X. Xin, A. Karatzoglou, I. Arapakis, and J. M. Jose, “Self-supervised reinforcement learning for recommender systems,” in Proc. SIGIR, 2020, pp. 931–940.
- F. Liu, H. Guo, X. Li, R. Tang, Y. Ye, and X. He, “End-to-end deep reinforcement learning based recommendation with supervised embedding,” in Proc. WSDM, 2020, pp. 384–392.
- N. Taghipour, A. Kardan, and S. S. Ghidary, “Usage-based web recommendations: A reinforcement learning approach,” in Proc. ACM Conf. Rec. Syst., 2007, pp. 113–120.
- Y. Zhang, C. Zhang, and X. Liu, “Dynamic scholarly collaborator recommendation via competitive multi-agent reinforcement learning,” in Proc. ACM Conf. Rec. Syst., 2017, pp. 331–335.
- T. Mahmood and F. Ricci, “Learning and adaptivity in interactive recommender systems,” in Proc. ICEC, 2007, pp. 75–84.
- R. Gao, H. Xia, J. Li, D. Liu, S. Chen, and G. Chun, “Drcgr: Deep reinforcement learning framework incorporating cnn and gan-based for interactive recommendation,” in Proc. IEEE Int. Conf. Data Mining (ICDM), 2019, pp. 1048–1053.
- Y. Lei and W. Li, “Interactive recommendation with user-specific deep reinforcement learning,” ACM Trans. Knowl. Discovery Data, vol. 13, no. 6, p. 61, October 2019.
- L. Zou, L. Xia, Z. Ding, J. Song, W. Liu, and D. Yin, “Reinforcement learning to optimize long-term user engagement in recommender systems,” in Proc. 25th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2019, pp. 2810–2818.
- L. Zou, L. Xia, P. Du, Z. Zhang, T. Bai, W. Liu, J.-Y. Nie, and D. Yin, “Pseudo dyna-q: A reinforcement learning framework for interactive recommendation,” in Proc. WSDM, 2020, pp. 816–824.
- S. Zhou, X. Dai, H. Chen, W. Zhang, K. Ren, R. Tang, X. He, and Y. Yu, “Interactive recommender system via knowledge graph-enhanced reinforcement learning,” in Proc. SIGIR, 2020, pp. 179–188.
- R. Zhang, T. Yu, Y. Shen, H. Jin, and C. Chen, “Text-based interactive recommendation via constraint-augmented reinforcement learning,” in Proc. NIPS, 2019, pp. 15214–15224.
- H. Chen, X. Dai, H. Cai, W. Zhang, X. Wang, R. Tang, Y. Zhang, and Y. Yu, “Large-scale interactive recommendation with tree-structured policy gradient,” in Proc. AAAI, 2019, pp. 3312–3320.
- W. Liu, F. Liu, R. Tang, B. Liao, G. Chen, and P. A. Heng, “Balancing between accuracy and fairness for interactive recommendation with reinforcement learning,” in Proc. Pacific-Asia Conf. Knowl. Discovery Data Mining, 2020, pp. 155–167.
- T. Xiao and D. Wang, “A general offline reinforcement learning framework for interactive recommendation,” in Proc. AAAI, 2021.
- T. Yu, Y. Shen, R. Zhang, X. Zeng, and H. Jin, “Vision-language recommendation via attribute augmented multimodal reinforcement learning,” in Proc. 27th ACM Int. Conf. Multimedia, 2019, pp. 39–47.
- F. Liu, R. Tang, X. Li, W. Zhang, Y. Ye, H. Chen, H. Guo, Y. Zhang, and X. He, “State representation modeling for deep reinforcement learning based recommendation,” Knowl.-Based Syst., vol. 205, p. 106170, October 2020.
- M. S. Llorente and S. E. Guerrero, “Increasing retrieval quality in conversational recommenders,” IEEE Trans. Knowl. Data Eng., vol. 24, no. 10, pp. 1876–1888, October 2012.
- T. Mahmood and F. Ricci, “Improving recommender systems with adaptive conversational strategies,” in Proc. HT, 2009, pp. 73–82.
- Y. Wu, C. Macdonald, and I. Ounis, “Partially observable reinforcement learning for dialog-based interactive recommendation,” in Proc. ACM Conf. Rec. Syst., 2021, pp. 241–251.
- D. Tsumita and T. Takagi, “Dialogue based recommender system that flexibly mixes utterances and recommendations,” in Proc. IEEE/WIC/ACM Int. Conf. Web Intelligence, 2019, pp. 51–58.
- W. Lei, G. Zhang, X. He, Y. Miao, X. Wang, L. Chen, and T.-S. Chua, “Interactive path reasoning on graph for conversational recommendation,” in Proc. 26th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2020, pp. 2073–2083.
- Y. Deng, Y. Li, F. Sun, B. Ding, and W. Lam, “Unified conversational recommendation policy learning via graph-based reinforcement learning,” in Proc. SIGIR, 2021, pp. 1431–1441.
- Y. Sun and Y. Zhang, “Conversational recommender system,” in Proc. SIGIR, 2018, pp. 235–244.
- W. Lei, X. He, Y. Miao, Q. Wu, R. Hong, M.-Y. Kan, and T.-S. Chua, “Estimation-action-reflection: Towards deep interaction between conversational and recommender systems,” in Proc. WSDM, 2020, pp. 304–312.
- X. Ren, H. Yin, T. Chen, H. Wang, N. Q. V. Hung, Z. Huang, and X. Zhang, “Crsal: Conversational recommender systems with adversarial learning,” ACM Trans. Inf. Sys., vol. 38, no. 4, pp. 1–40, October 2020.
- A. Montazeralghaem and J. Allan, “Extracting relevant information from user’s utterances in conversational search and recommendation,” in Proc. 28th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2022, pp. 1275–1283.
- X. Zhao, L. Zhang, Z. Ding, L. Xia, J. Tang, and D. Yin, “Recommendations with negative feedback via pairwise deep reinforcement learning,” in Proc. 24th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2018, pp. 1040–1048.
- D. Hong, Y. Li, and Q. Dong, “Nonintrusive-sensing and reinforcement-learning based adaptive personalized music recommendation,” in Proc. SIGIR, 2020, pp. 1721–1724.
- O. Moling, L. Baltrunas, and F. Ricci, “Optimal radio channel recommendations with explicit and implicit feedback,” in Proc. ACM Conf. Rec. Syst., 2012, pp. 75–82.
- P. Wang, Y. Fan, L. Xia, W. X. Zhao, S. Niu, and J. Huang, “Kerl: A knowledge-guided reinforcement learning model for sequential recommendation,” in Proc. SIGIR, 2020, pp. 209–218.
- Y. Lin, S. Feng, F. Lin, W. Zeng, Y. Liu, and P. Wu, “Adaptive course recommendation in moocs,” Knowl.-Based Syst., vol. 224, p. 107085, July 2021.
- X. Zhao, L. Xia, L. Zou, H. Liu, D. Yin, and J. Tang, “Whole-chain recommendations,” in Proc. CIKM, 2020, pp. 1883–1891.
- S. Antaris and D. Rafailidis, “Sequence adaptation via reinforcement learning in recommender systems,” in Proc. ACM Conf. Rec. Syst., 2021, pp. 714–718.
- Y. Lu, R. Dong, and B. Smyth, “Why i like it: Multi-task learning for recommendation and explanation,” in Proc. ACM Conf. Rec. Syst., 2018, pp. 4–12.
- S. Tao, R. Qiu, Y. Ping, and H. Ma, “Multi-modal knowledge-aware reinforcement learning network for explainable recommendation,” Knowl.-Based Syst., vol. 227, p. 107217, September 2021.
- S.-J. Park, D.-K. Chae, H.-K. Bae, S. Park, and S.-W. Kim, “Reinforcement learning over sentiment-augmented knowledge graphs towards accurate and explainable recommendation,” in Proc. WSDM, 2022, pp. 784–793.
- D. Liu, J. Lian, Z. Liu, X. Wang, G. Sun, and X. Xie, “Reinforced anchor knowledge graph generation for news recommendation reasoning,” in Proc. 27th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2021, pp. 1055–1065.
- H. van Hasselt, A. Guez, and D. Silver, “Deep reinforcement learning with double q-learning,” in Proc. AAAI, 2016, pp. 2094–2100.
- Z. Wang, T. Schaul, M. Hessel, H. van Hasselt, M. Lanctot, and N. de Freitas, “Dueling network architectures for deep reinforcement learning,” in Proc. ICML, 2016, pp. 1995–2003.
- I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Proc. NIPS, 2014, pp. 2672–2680.
- C. Gao, W. Lei, X. He, M. de Rijke, and T.-S. Chua, “Advances and challenges in conversational recommender systems: A survey,” ArXiv Preprint ArXiv:2101.09459, 2021.
- C. Hu, S. Huang, Y. Zhang, and Y. Liu, “Learning to infer user implicit preference in conversational recommendation,” in Proc. SIGIR, 2022, pp. 256–266.
- S. Rendle, “Factorization machines,” in Proc. IEEE Int. Conf. Data Mining (ICDM), 2010, pp. 995–1000.
- A. Schwartz, “A reinforcement learning method for maximizing undiscounted rewards,” in Proc. ICML, 1993, pp. 298–305.
- X. Wang, K. Liu, D. Wang, L. Wu, Y. Fu, and X. Xie, “Multi-level recommendation reasoning over knowledge graphs with reinforcement learning,” in Proc. WWW, 2022, pp. 2098–2108.
- Y. Zhang and X. Chen, “Explainable recommendation: A survey and new perspectives,” Found. Trends Inf. Retr., vol. 14, no. 1, pp. 1–101, March 2020.
- W. Shang, Y. Yu, Q. Li, Z. Qin, Y. Meng, and J. Ye, “Environment reconstruction with hidden confounders for reinforcement learning based recommendation,” in Proc. 25th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2019, pp. 566–576.
- B. Shi, M. G. Ozsoy, N. Hurley, B. Smyth, E. Z. Tragos, J. Geraci, and A. Lawlor, “Pyrecgym: A reinforcement learning gym for recommender systems,” in Proc. ACM Conf. Rec. Syst., 2019, pp. 491–495.
- X. Wang, Y. Xu, X. He, Y. Cao, M. Wang, and T.-S. Chua, “Reinforced negative sampling over knowledge graph for recommendation,” in Proc. WWW, 2020, pp. 99–109.
- X. Xin, A. Karatzoglou, I. Arapakis, and J. M. Jose, “Supervised advantage actor-critic for recommender systems,” in Proc. WSDM, 2022, pp. 1186–1196.
- J. Ding, Y. Quan, X. He, Y. Li, and D. Jin, “Reinforced negative sampling for recommendation with exposure data,” in Proc. IJCAI, 2019, pp. 2230–2236.
- J. Zhao, H. Li, L. Qu, Q. Zhang, Q. Sun, H. Huo, and M. Gong, “Dcfgan: An adversarial deep reinforcement learning framework with improved negative sampling for session-based recommender systems,” Inf. Sci., vol. 596, pp. 222–235, June 2022.
- Y. Lei, Z. Wang, W. Li, H. Pei, and Q. Dai, “Social attentive deep q-networks for recommender systems,” IEEE Trans. Knowl. Data Eng., p. 99, July 2020.
- Z. Lu, M. Gao, X. Wang, J. Zhang, H. Ali, and Q. Xiong, “Srrl: Select reliable friends for social recommendation with reinforcement learning,” in Proc. Int. Conf. Neural Inf. Process., 2019, pp. 631–642.
- P. Abbeel and A. Y. Ng, “Apprenticeship learning via inverse reinforcement learning,” in Proc. ICML, 2004, pp. 1–8.
- A. Y. Ng and S. J. Russell, “Algorithms for inverse reinforcement learning,” in Proc. ICML, 2000, pp. 663–670.
- D. Massimo and F. Ricci, “Harnessing a generalised user behaviour model for next-poi recommendation,” in Proc. ACM Conf. Rec. Syst., 2018, pp. 402–406.
- M. Babes, V. Marivate, K. Subramanian, and M. L. Littman, “Apprenticeship learning about multiple intentions,” in Proc. ICML, 2011, pp. 897–904.
- H. Liang, “Drprofiling: Deep reinforcement user profiling for recommendations in heterogenous information networks,” IEEE Trans. Knowl. Data Eng., p. 99, May 2020.
- Y. Gong, Y. Zhu, L. Duan, Q. Liu, Z. Guan, F. Sun, W. Ou, and K. Q. Zhu, “Exact-k recommendation via maximal clique optimization,” in Proc. 25th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2019, pp. 617–626.
- F. Torabi, G. Warnell, and P. Stone, “Behavioral cloning from observation,” in Proc. IJCAI, 2018, pp. 4950–4957.
- C. Gao, K. Xu, K. Zhou, L. Li, X. Wang, B. Yuan, and P. Zhao, “Value penalized q-learning for recommender systems,” in Proc. SIGIR, 2022, pp. 2008–2012.
- M. Preda and D. Popescu, “Personalized web recommendations: supporting epistemic information about end-users,” in Proc. WI, 2005, pp. 692–695.
- X. Chen, S. Li, H. Li, S. Jiang, Y. Qi, and L. Song, “Generative adversarial user model for reinforcement learning based recommendation system,” in Proc. ICML, 2019, pp. 1052–1061.
- X. Chen, L. Yao, A. Sun, X. Wang, X. Xu, and L. Zhu, “Generative inverse deep reinforcement learning for online recommendation,” in Proc. CIKM, 2021, pp. 201–210.
- C. Pei, X. Yang, Q. Cui, X. Lin, F. Sun, P. Jiang, W. Ou, and Y. Zhang, “Value-aware recommendation based on reinforcement profit maximization,” in Proc. WWW, 2019, pp. 3123–3129.
- X. Zhao, C. Gu, H. Zhang, X. Yang, X. Liu, H. Liu, and J. Tang, “Dear: Deep reinforcement learning for online advertising impression in recommender systems,” in Proc. AAAI, 2021.
- L. Zou, L. Xia, Z. Ding, D. Yin, J. Song, and W. Liu, “Reinforcement learning to diversify top-n recommendation,” in Proc. DASFAA, 2019, pp. 104–120.
- D. Precup, R. S. Sutton, and S. Dasgupta, “Off-policy temporal difference learning with function approximation,” in Proc. ICML, 2001, pp. 417–424.
- R. Munos, T. Stepleton, A. Harutyunyan, and M. G. Bellemare, “Safe and efficient off-policy reinforcement learning,” in Proc. NIPS, 2016, pp. 1054–1062.
- M. Chen, A. Beutel, P. Covington, S. Jain, F. Belletti, and E. H. Chi, “Top-k off-policy correction for a reinforce recommender system,” in Proc. WSDM, 2019, pp. 456–464.
- J. Ma, Z. Zhao, X. Yi, J. Yang, M. Chen, J. Tang, L. Hong, and E. H. Chi, “Off-policy learning in two-stage recommender systems,” in Proc. WWW, 2020, pp. 463–473.
- D. G. Horvitz and D. J. Thompson, “A generalization of sampling without replacement from a finite universe,” J. Am. Stat. Assoc., vol. 47, no. 260, pp. 663–685, April 1952.
- X. Bai, J. Guan, and H. Wang, “A model-based reinforcement learning with adversarial training for online recommendation,” in Proc. NIPS, 2019, pp. 10735–10746.
- R. Xie, S. Zhang, R. Wang, F. Xia, and L. Lin, “A peep into the future: Adversarial future encoding in recommendation,” in Proc. WSDM, 2022, pp. 1177–1185.
- L. Busoniu, R. Babuska, and B. D. Schutter, “A comprehensive survey of multiagent reinforcement learning,” IEEE Trans. Syst. Man Cybern., vol. 38, no. 2, pp. 156–172, March 2008.
- Z. Du, N. Yang, Z. Yu, and P. Yu, “Learning from atypical behavior: Temporary interest aware recommendation based on reinforcement learning,” IEEE Trans. Knowl. Data Eng., pp. 1–13, January 2022.
- Y. Z. Wei, L. Moreau, and N. R. Jennings, “Learning users’ interests by quality classification in market-based recommender systems,” IEEE Trans. Knowl. Data Eng., vol. 17, no. 12, pp. 1678–1688, December 2005.
- W. Zhang, H. Liu, H. Xiong, T. Xu, F. Wang, H. Xin, and H. Wu, “Rlcharge: Imitative multi-agent spatiotemporal reinforcement learning for electric vehicle charging station recommendation,” IEEE Trans. Knowl. Data Eng., pp. 1–14, May 2022.
- W. Zhang, H. Liu, F. Wang, T. Xu, H. Xin, D. Dou, and H. Xiong, “Intelligent electric vehicle charging recommendation based on multi-agent reinforcement learning,” in Proc. WWW, 2021, pp. 1856–1867.
- R. Parr and S. J. Russell, “Reinforcement learning with hierarchies of machines,” in Proc. NIPS, 1997, pp. 1043–1049.
- T. G. Dietterich, “Hierarchical reinforcement learning with the maxq value function decomposition,” J. Artif. Intell. Res., vol. 13, no. 1, pp. 227–303, November 2000.
- R. Xie, S. Zhang, R. Wang, F. Xia, and L. Lin, “Hierarchical reinforcement learning for integrated recommendation,” in Proc. AAAI, 2021.
- R. S. Sutton, D. Precup, and S. Singh, “Between mdps and semi-mdps: A framework for temporal abstraction in reinforcement learning,” Artif. Intell., vol. 112, no. 1, pp. 181–211, August 1999.
- L. Wang, R. Tang, X. He, and X. He, “Hierarchical imitation learning via subgoal representation learning for dynamic treatment recommendation,” in Proc. WSDM, 2022, pp. 1081–1089.
- G. Theocharous, P. S. Thomas, and M. Ghavamzadeh, “Personalized ad recommendation systems for life-time value optimization with guarantees,” in Proc. IJCAI, 2015, pp. 1806–1812.
- G. Theocharous, P. Thomas, and M. Ghavamzadeh, “Ad recommendation systems for life-time value optimization,” in Proc. WWW, 2015, pp. 1305–1310.
- F. Liu, R. Tang, H. Guo, X. Li, Y. Ye, and X. He, “Top-aware reinforcement learning based recommendation,” Neurocomputing, vol. 417, pp. 255–269, December 2020.
- H. Liu, Z. Sun, X. Qu, and F. Yuan, “Top-aware recommender distillation with deep reinforcement learning,” Inf. Sci., vol. 576, pp. 642–657, October 2021.
- M. Chen, B. Chang, C. Xu, and E. H. Chi, “User response models to improve a reinforce recommender system,” in Proc. WSDM, 2021, pp. 121–129.
- Z. Xu and U. Topcu, “Transfer of temporal logic formulas in reinforcement learning,” in Proc. IJCAI, 2019, pp. 4010–4018.
- A. Tirinzoni, A. Sessa, M. Pirotta, and M. Restelli, “Importance weighted transfer of samples in reinforcement learning,” in Proc. ICML, 2018, pp. 4936–4945.
- P. Henderson, R. Islam, P. Bachman, J. Pineau, D. Precup, and D. Meger, “Deep reinforcement learning that matters,” in Proc. AAAI, 2017, pp. 3207–3214.
- J. Welborn, M. Schaarschmidt, and E. Yoneki, “Learning index selection with structured action spaces,” ArXiv Preprint ArXiv:1909.07440, 2019.
- Y. Xu, L. Qin, X. Liu, J. Xie, and S.-C. Zhu, “A causal and-or graph model for visibility fluent reasoning in tracking interacting objects,” in Proc. CVPR, 2018, pp. 2178–2187.
- R. Powers and Y. Shoham, “New criteria and a new algorithm for learning in multi-agent systems,” in Proc. NIPS, 2004, pp. 1089–1096.
- R. Fakoor, P. Chaudhari, S. Soatto, and A. J. Smola, “Meta-q-learning,” in Proc. ICLR, 2020.
- Q. Zhang, J. Liu, Y. Dai, Y. Qi, Y. Yuan, K. Zheng, F. Huang, and X. Tan, “Multi-task fusion via reinforcement learning for long-term user satisfaction in recommender systems,” in Proc. 28th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2022, pp. 4510–4520.
- R. Xie, S. Zhang, R. Wang, F. Xia, and L. Lin, “Explore, filter and distill: Distilled reinforcement learning in recommendation,” in Proc. CIKM, 2021, pp. 4243–4252.
- M. Fu, A. Agrawal, A. A. Irissappane, J. Zhang, L. Huang, and H. Qu, “Deep reinforcement learning framework for category-based item recommendation,” IEEE Trans. Cybern., pp. 1–14, August 2021.
- M. Fortunato, M. G. Azar, B. Piot, J. Menick, I. Osband, A. Graves, V. Mnih, R. Munos, D. Hassabis, O. Pietquin, C. Blundell, and S. Legg, “Noisy networks for exploration,” in Proc. ICLR, 2018.
- M. Kunaver and T. Požrl, “Diversity in recommender systems – a survey,” Knowl.-Based Syst., vol. 123, pp. 154–162, May 2017.
- Y. Ge, X. Zhao, L. Yu, S. Paul, D. Hu, C.-C. Hsieh, and Y. Zhang, “Toward pareto efficient fairness-utility trade-off in recommendation through reinforcement learning,” in Proc. WSDM, 2022, pp. 316–324.
- C. von Lücken, B. Barán, and C. Brizuela, “A survey on multi-objective evolutionary algorithms for many-objective problems,” Comput. Optim. Appl., vol. 58, no. 3, pp. 707–756, February 2014.
- X. Chen, Y. Du, L. Xia, and J. Wang, “Reinforcement recommendation with user multi-aspect preference,” in Proc. WWW, 2021, pp. 425–435.
- D. Stamenkovic, A. Karatzoglou, I. Arapakis, X. Xin, and K. Katevas, “Choosing the best of both worlds: Diverse and novel recommendations through multi-objective reinforcement learning,” in Proc. WSDM, 2022, pp. 957–965.
- R. Xie, Y. Liu, S. Zhang, R. Wang, F. Xia, and L. Lin, “Personalized approximate pareto-efficient recommendation,” in Proc. WWW, 2021, pp. 3839–3849.
- Y. Ge, S. Liu, R. Gao, Y. Xian, Y. Li, X. Zhao, C. Pei, F. Sun, J. Ge, W. Ou, and Y. Zhang, “Towards long-term fairness in recommendation,” in Proc. WSDM, 2021, pp. 445–453.
- D. Li, X. Li, J. Wang, and P. Li, “Video recommendation with multi-gate mixture of experts soft actor critic,” in Proc. SIGIR, 2020, pp. 1553–1556.
- E. Puiutta and E. M. S. P. Veith, “Explainable reinforcement learning: A survey,” in Proc. CD-MAKE, 2020, pp. 77–95.
- P. Wu, H. Li, Y. Deng, W. Hu, Q. Dai, Z. Dong, J. Sun, R. Zhang, and X.-H. Zhou, “On the opportunity of causal learning in recommendation systems: Foundation, estimation, prediction and challenges,” in Proc. IJCAI, 2022, pp. 1–8.
- C.-Y. Tai, L.-Y. Huang, C.-K. Huang, and L.-W. Ku, “User-centric path reasoning towards explainable recommendation,” in Proc. SIGIR, 2021, pp. 879–889.
- Y. Xiao, L. Xiao, X. Lu, H. Zhang, S. Yu, and H. V. Poor, “Deep-reinforcement-learning-based user profile perturbation for privacy-aware recommendation,” IEEE Internet Things J., vol. 8, no. 6, pp. 4560–4568, March 2021.
- W. Fan, T. Derr, X. Zhao, Y. Ma, H. Liu, J. Wang, J. Tang, and Q. Li, “Attacking black-box recommendations via copying cross-domain user profiles,” in Proc. IEEE 37th Int. Conf. Data Engineering, 2021, pp. 1583–1594.
- J. Garcia and F. Fernandez, “A comprehensive survey on safe reinforcement learning,” J. Mach. Learn. Res., vol. 16, no. 1, pp. 1437–1480, August 2015.
- T. Li, A. K. Sahu, A. Talwalkar, and V. Smith, “Federated learning: Challenges, methods, and future directions,” IEEE Signal Proc. Mag., vol. 37, no. 3, pp. 50–60, May 2020.
- W. Huang, J. Liu, T. Li, T. Huang, S. Ji, and J. Wan, “Feddsr: Daily schedule recommendation in a federated deep reinforcement learning framework,” IEEE Trans. Knowl. Data Eng., pp. 1–1, November 2021.
Authors: Yuanguo Lin, Yong Liu, Fan Lin, Lixin Zou, Pengcheng Wu, Wenhua Zeng, Huanhuan Chen, and Chunyan Miao.