
Performance Improvement Bounds for Lipschitz Configurable Markov Decision Processes

Published 21 Feb 2024 in cs.LG | (2402.13821v1)

Abstract: Configurable Markov Decision Processes (Conf-MDPs) have recently been introduced as an extension of traditional Markov Decision Processes (MDPs) to model real-world scenarios in which it is possible to intervene in the environment to configure some of its parameters. In this paper, we focus on a particular subclass of Conf-MDPs that satisfies a regularity condition, namely Lipschitz continuity. We start by providing a bound on the Wasserstein distance between the $\gamma$-discounted stationary distributions induced by changing the policy and the configuration. This result generalizes existing bounds for both Conf-MDPs and traditional MDPs. Then, we derive a novel performance improvement lower bound.
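The first result concerns the distance between $\gamma$-discounted stationary distributions, $d_\gamma(s) = (1-\gamma)\sum_{t \ge 0} \gamma^t \Pr(s_t = s)$, which satisfy the fixed point $d_\gamma = (1-\gamma)\mu + \gamma\, d_\gamma P$ for an initial distribution $\mu$ and a state-to-state kernel $P$ (policy and configuration combined). As a hedged illustration of the quantities involved, and not the paper's bound or method, the sketch below computes $d_\gamma$ for a toy finite chain under two configurations and measures their 1-Wasserstein distance. The toy chain, the helper names `discounted_state_distribution`, `wasserstein_1d` and `chain_kernel`, the configuration parameter `omega`, and the placement of states at positions $0, 1, \dots, n-1$ on the real line are all hypothetical choices made for this example.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def discounted_state_distribution(mu: Vector, P: Matrix, gamma: float,
                                  tol: float = 1e-12, max_iter: int = 100_000) -> Vector:
    """Solve d = (1 - gamma) * mu + gamma * d P by fixed-point iteration.

    P[i][j] is the probability of moving from state i to state j under a fixed
    policy and configuration (actions already marginalised out). The map is a
    gamma-contraction, so the iteration converges for 0 <= gamma < 1.
    """
    n = len(mu)
    d = list(mu)
    for _ in range(max_iter):
        new = [(1 - gamma) * mu[j] + gamma * sum(d[i] * P[i][j] for i in range(n))
               for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, d)) < tol:
            return new
        d = new
    return d


def wasserstein_1d(p: Vector, q: Vector) -> float:
    """W1 between two distributions on states located at 0, 1, ..., n-1.

    In one dimension W1 equals the integral of |F_p - F_q|; with unit spacing
    this is the sum of absolute CDF gaps.
    """
    cdf_gap, total = 0.0, 0.0
    for k in range(len(p) - 1):
        cdf_gap += p[k] - q[k]
        total += abs(cdf_gap)
    return total


def chain_kernel(omega: float, n: int) -> Matrix:
    """Hypothetical configurable kernel: from state s move right with
    probability omega and left with probability 1 - omega, clamped at the ends."""
    P = [[0.0] * n for _ in range(n)]
    for s in range(n):
        P[s][min(s + 1, n - 1)] += omega
        P[s][max(s - 1, 0)] += 1 - omega
    return P


# Usage: compare two configurations of the same toy chain.
mu = [1.0, 0.0, 0.0, 0.0, 0.0]
d_a = discounted_state_distribution(mu, chain_kernel(0.6, 5), gamma=0.9)
d_b = discounted_state_distribution(mu, chain_kernel(0.7, 5), gamma=0.9)
print(wasserstein_1d(d_a, d_b))
```

In a Lipschitz Conf-MDP, bounds of the kind described above relate such a distance to how far the policy and the configuration move; the toy computation only shows what is being measured, not how the paper bounds it.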

Authors (1)
