Personalized Reinforcement Learning with a Budget of Policies (2401.06514v1)
Abstract: Personalization in machine learning (ML) tailors models' decisions to the individual characteristics of users. While this approach has seen success in areas like recommender systems, its expansion into high-stakes fields such as healthcare and autonomous driving is hindered by the extensive regulatory approval processes involved. To address this challenge, we propose a novel framework termed represented Markov Decision Processes (r-MDPs) that is designed to balance the need for personalization with the regulatory constraints. In an r-MDP, we cater to a diverse population of users, each with unique preferences, through interaction with a small set of representative policies. Our objective is twofold: efficiently match each user to an appropriate representative policy and simultaneously optimize these policies to maximize overall social welfare. We develop two deep reinforcement learning algorithms that efficiently solve r-MDPs. These algorithms draw inspiration from the principles of classic K-means clustering and are underpinned by robust theoretical foundations. Our empirical investigations, conducted across a variety of simulated environments, showcase the algorithms' ability to facilitate meaningful personalization even under constrained policy budgets. Furthermore, they demonstrate scalability, efficiently adapting to larger policy budgets.
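The K-means-style alternation described in the abstract (match users to representative policies, then improve each policy for the users matched to it) can be illustrated with a minimal sketch. This is not the paper's deep reinforcement learning algorithm: it assumes a toy one-step setting in which a "policy" is just a choice of one arm, each arm yields a fixed feature vector, and a user's utility is linear in those features. All names (`utility`, `assign`, `improve`, `welfare`, `alternate`) and the toy data are hypothetical.

```python
import random

def utility(w, phi):
    """Assumed linear utility: user weights w applied to outcome features phi."""
    return sum(wi * pi for wi, pi in zip(w, phi))

def assign(users, policies, arms):
    """Assignment step: match each user to the policy that user values most."""
    return [max(range(len(policies)), key=lambda k: utility(w, arms[policies[k]]))
            for w in users]

def improve(users, assignment, policies, arms):
    """Policy step: each policy picks the arm that maximizes the total
    utility of the users currently assigned to it."""
    new_policies = list(policies)
    for k in range(len(policies)):
        members = [w for w, a in zip(users, assignment) if a == k]
        if members:  # a policy with no users is left unchanged
            new_policies[k] = max(
                range(len(arms)),
                key=lambda a: sum(utility(w, arms[a]) for w in members))
    return new_policies

def welfare(users, assignment, policies, arms):
    """Social welfare: sum of every user's utility under the assigned policy."""
    return sum(utility(w, arms[policies[k]]) for w, k in zip(users, assignment))

def alternate(users, arms, budget, iters=20, init=None, seed=0):
    """Alternate the two steps; `budget` is the number of representative policies."""
    rng = random.Random(seed)
    policies = list(init) if init is not None else [
        rng.randrange(len(arms)) for _ in range(budget)]
    history = []
    for _ in range(iters):
        assignment = assign(users, policies, arms)
        policies = improve(users, assignment, policies, arms)
        history.append(welfare(users, assignment, policies, arms))
    return policies, assign(users, policies, arms), history

# Toy data (hypothetical): four users with two-dimensional preferences, three arms.
users = [(3, 0), (2, 1), (0, 3), (1, 2)]
arms = [(2, 0), (0, 2), (1, 1)]
policies, assignment, history = alternate(users, arms, budget=2, init=[0, 2])
print(policies, assignment, history[-1])
```

In this toy setting neither step can lower welfare, so the recorded welfare is non-decreasing across iterations, mirroring the behavior of K-means. As with K-means, the result depends on initialization: starting both policies on the same arm can leave the procedure stuck at a welfare no better than a single policy would achieve.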