
Examining Policy Entropy of Reinforcement Learning Agents for Personalization Tasks (2211.11869v4)

Published 21 Nov 2022 in cs.LG, cs.AI, cs.NA, math.NA, and math.OC

Abstract: This effort is focused on examining the behavior of reinforcement learning systems in personalization environments and detailing the differences in policy entropy associated with the type of learning algorithm utilized. We demonstrate that Policy Optimization agents often possess low-entropy policies during training, which in practice results in agents prioritizing certain actions and avoiding others. Conversely, we also show that Q-Learning agents are far less susceptible to such behavior and generally maintain high-entropy policies throughout training, which is often preferable in real-world applications. We provide a wide range of numerical experiments as well as theoretical justification to show that these differences in entropy are due to the type of learning being employed.
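
To make the quantity in the abstract concrete, here is a minimal sketch of policy entropy for a discrete action set. It assumes the standard Shannon entropy, H(π) = -Σ_a π(a) log π(a); the paper's exact definition does not appear on this page. The helper names `policy_entropy` and `softmax` and the example logits are hypothetical, chosen only for illustration.

```python
import math

def policy_entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    # Zero-probability actions contribute nothing (the limit p*log(p) -> 0).
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def softmax(logits):
    """Numerically stable softmax over a list of action logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# A high-entropy policy: every one of 4 actions is equally likely.
uniform = [0.25] * 4

# A low-entropy policy: one logit dominates, so one action is
# chosen almost always and the others are effectively avoided.
peaked = softmax([8.0, 0.0, 0.0, 0.0])

h_uniform = policy_entropy(uniform)  # equals log(4), the maximum for 4 actions
h_peaked = policy_entropy(peaked)    # close to 0

print(f"uniform policy entropy: {h_uniform:.4f} nats")
print(f"peaked policy entropy:  {h_peaked:.4f} nats")
```

A policy with entropy near zero almost always selects the same action, which is the behavior the abstract describes as prioritizing certain actions and avoiding others. A policy with entropy near log(n) for n actions spreads probability across the whole action set.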
