
UOEP: User-Oriented Exploration Policy for Enhancing Long-Term User Experiences in Recommender Systems (2401.09034v2)

Published 17 Jan 2024 in cs.IR and cs.AI

Abstract: Reinforcement learning (RL) has gained traction for enhancing long-term user experiences in recommender systems by effectively exploring users' interests. However, modern recommender systems exhibit distinct user behavioral patterns among tens of millions of items, which increases the difficulty of exploration. For example, users with different activity levels require different intensities of exploration, yet previous studies often overlook this aspect and apply a uniform exploration strategy to all users, which ultimately hurts user experiences in the long run. To address these challenges, we propose User-Oriented Exploration Policy (UOEP), a novel approach facilitating fine-grained exploration among user groups. We first construct a distributional critic which allows policy optimization under varying quantile levels of cumulative reward feedback from users, representing user groups with varying activity levels. Guided by this critic, we devise a population of distinct actors, each aimed at effective and fine-grained exploration within its respective user group. To simultaneously enhance diversity and stability during the exploration process, we further introduce a population-level diversity regularization term and a supervision module. Experimental results on public recommendation datasets demonstrate that our approach outperforms all other baselines in terms of long-term performance, validating its user-oriented exploration effectiveness. Meanwhile, further analyses reveal that our approach improves performance for low-activity users and increases fairness among users.
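The abstract names two core components: a distributional critic optimized at different quantile levels of the cumulative-reward distribution, and a population of actors kept distinct through a population-level diversity regularizer. The snippet below is a minimal, hypothetical sketch of those ideas, not the authors' implementation: the quantile Huber loss follows the standard QR-DQN/IQN formulation, and the cosine-similarity penalty is just one plausible surrogate for the diversity term. All module names, layer sizes, and hyperparameters are illustrative assumptions.

```python
# Illustrative sketch only; not the UOEP authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class QuantileCritic(nn.Module):
    """Maps (state, action) to N estimated quantiles of the return distribution."""

    def __init__(self, state_dim: int, action_dim: int, n_quantiles: int = 32):
        super().__init__()
        self.n_quantiles = n_quantiles
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, n_quantiles),
        )
        # Fixed quantile fractions tau_i = (2i - 1) / (2N), i = 1..N.
        self.register_buffer(
            "taus",
            (torch.arange(n_quantiles, dtype=torch.float32) + 0.5) / n_quantiles,
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))  # [B, N]


def quantile_huber_loss(pred: torch.Tensor, target: torch.Tensor,
                        taus: torch.Tensor, kappa: float = 1.0) -> torch.Tensor:
    """Quantile-regression Huber loss (standard QR-DQN/IQN form).

    pred:   [B, N] predicted quantiles
    target: [B, M] target quantile samples (e.g. from a target critic)
    taus:   [N]    quantile fractions of the predictions
    """
    # Pairwise TD errors between every target sample and every predicted quantile.
    td = target.unsqueeze(1) - pred.unsqueeze(2)            # [B, N, M]
    huber = torch.where(td.abs() <= kappa,
                        0.5 * td.pow(2),
                        kappa * (td.abs() - 0.5 * kappa))
    # Asymmetric weighting |tau - 1{td < 0}| is what makes this quantile regression.
    weight = (taus.view(1, -1, 1) - (td.detach() < 0).float()).abs()
    return (weight * huber / kappa).sum(dim=1).mean()


def population_diversity_penalty(actions: list[torch.Tensor]) -> torch.Tensor:
    """Encourage distinct actors by penalizing similarity of their actions on the
    same batch of states (one possible surrogate for the paper's diversity term)."""
    penalty = actions[0].new_zeros(())
    n = len(actions)
    for i in range(n):
        for j in range(i + 1, n):
            penalty = penalty + F.cosine_similarity(actions[i], actions[j], dim=-1).mean()
    return penalty / max(n * (n - 1) / 2, 1)
```

In the setting the abstract describes, each actor in the population would presumably be optimized against the quantile range matching its target user-activity group, with the supervision module stabilizing training; the sketch above only illustrates the shapes of the losses involved.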

Authors (6)
  1. Changshuo Zhang (8 papers)
  2. Sirui Chen (34 papers)
  3. Xiao Zhang (435 papers)
  4. Sunhao Dai (22 papers)
  5. Weijie Yu (18 papers)
  6. Jun Xu (398 papers)

