Increasing Entropy to Boost Policy Gradient Performance on Personalization Tasks (2310.05324v1)

Published 9 Oct 2023 in cs.LG

Abstract: In this effort, we consider the impact of regularization on the diversity of actions taken by policies generated from reinforcement learning agents trained using a policy gradient. Policy gradient agents are prone to entropy collapse, which means certain actions are seldom, if ever, selected. We augment the optimization objective function for the policy with terms constructed from various $\varphi$-divergences and Maximum Mean Discrepancy, which encourage the current policy to follow state-visitation and/or action-choice distributions different from those of previously computed policies. We provide numerical experiments using MNIST, CIFAR10, and Spotify datasets. The results demonstrate the advantage of diversity-promoting policy regularization and that its use in gradient-based approaches significantly improves performance on a variety of personalization tasks. Furthermore, numerical evidence is given to show that policy regularization increases performance without losing accuracy.
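The abstract describes augmenting a policy gradient objective with a divergence term that rewards the current policy for differing from previously computed policies. The sketch below shows one way such a regularizer could be wired into a REINFORCE-style update in PyTorch; the network architecture, the use of KL divergence as a representative $\varphi$-divergence, and the coefficient `beta` are illustrative assumptions, not the paper's actual formulation (the paper also considers Maximum Mean Discrepancy terms, which would replace the divergence function here).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Policy(nn.Module):
    """Small categorical policy network (architecture is a placeholder)."""
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions)
        )

    def forward(self, states):
        return F.softmax(self.net(states), dim=-1)

def kl_divergence(p, q, eps=1e-8):
    # KL(p || q) between action distributions; one representative phi-divergence.
    return (p * (torch.log(p + eps) - torch.log(q + eps))).sum(dim=-1).mean()

def regularized_pg_loss(policy, old_probs, states, actions, returns, beta=0.1):
    """REINFORCE loss minus a diversity bonus that pushes the current
    action distribution away from a previously computed policy."""
    probs = policy(states)                                            # (batch, n_actions)
    log_probs = torch.log(probs.gather(1, actions.unsqueeze(1)).squeeze(1) + 1e-8)
    pg_loss = -(log_probs * returns).mean()                           # standard policy gradient term
    diversity = kl_divergence(probs, old_probs)                       # divergence from the old policy
    return pg_loss - beta * diversity                                 # subtracting rewards divergence

# Minimal usage with random data (4-dimensional states, 3 actions).
policy = Policy(state_dim=4, n_actions=3)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
states = torch.randn(32, 4)
actions = torch.randint(0, 3, (32,))
returns = torch.randn(32)
old_probs = policy(states).detach()    # stand-in for a previously trained policy's action probabilities
loss = regularized_pg_loss(policy, old_probs, states, actions, returns)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Subtracting the divergence from the loss rewards the optimizer for moving the action distribution away from the earlier policy, which counteracts entropy collapse; in practice `beta` would be tuned to balance reward maximization against diversity.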

