Solving Long-run Average Reward Robust MDPs via Stochastic Games (2312.13912v2)

Published 21 Dec 2023 in cs.AI

Abstract: Markov decision processes (MDPs) provide a standard framework for sequential decision making under uncertainty. However, MDPs do not take uncertainty in transition probabilities into account. Robust Markov decision processes (RMDPs) address this shortcoming of MDPs by assigning to each transition an uncertainty set rather than a single probability value. In this work, we consider polytopic RMDPs in which all uncertainty sets are polytopes and study the problem of solving long-run average reward polytopic RMDPs. We present a novel perspective on this problem and show that it can be reduced to solving long-run average reward turn-based stochastic games with finite state and action spaces. This reduction allows us to derive several important consequences that were hitherto not known to hold for polytopic RMDPs. First, we derive new computational complexity bounds for solving long-run average reward polytopic RMDPs, showing for the first time that the threshold decision problem for them is in $NP \cap coNP$ and that they admit a randomized algorithm with sub-exponential expected runtime. Second, we present Robust Polytopic Policy Iteration (RPPI), a novel policy iteration algorithm for solving long-run average reward polytopic RMDPs. Our experimental evaluation shows that RPPI is much more efficient in solving long-run average reward polytopic RMDPs compared to state-of-the-art methods based on value iteration.
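
The long-run average (mean-payoff) objective considered here is $\liminf_{T \to \infty} \frac{1}{T} \mathbb{E}[\sum_{t=0}^{T-1} r_t]$, which the agent maximizes while an adversary resolves the transition uncertainty within each polytope. A standard fact makes the reduction to finite turn-based stochastic games plausible at a glance: a linear function over a polytope attains its minimum at a vertex, so the adversary's continuum of choices collapses to finitely many. The Python sketch below illustrates this vertex reduction on the simpler discounted robust Bellman operator; it is a minimal illustration under an assumed vertex-list representation of the polytopes, not the paper's RPPI algorithm.

import numpy as np

def robust_bellman_update(V, rewards, vertex_lists, gamma=0.95):
    """One robust Bellman update for a polytopic RMDP (discounted case).

    Since each uncertainty set is a polytope, the inner minimization of
    the linear function p -> p . V is attained at a vertex, so the
    adversary only ever needs finitely many choices -- the observation
    that makes a reduction to finite turn-based stochastic games possible.

    V            : np.ndarray of shape (n_states,), current value estimate
    rewards      : rewards[s][a] is the immediate reward of action a in state s
    vertex_lists : vertex_lists[s][a] is a list of probability vectors,
                   the vertices of the uncertainty polytope of (s, a)
    """
    V_new = np.empty(len(V))
    for s in range(len(V)):
        action_values = []
        for a, vertices in enumerate(vertex_lists[s]):
            # Adversary (minimizing player) picks the worst vertex.
            worst = min(float(np.dot(p, V)) for p in vertices)
            action_values.append(rewards[s][a] + gamma * worst)
        # Agent (maximizing player) picks the best action.
        V_new[s] = max(action_values)
    return V_new

Iterating this update converges for the discounted objective; for the long-run average objective the analysis is more delicate, which is where the paper's game-based perspective and its RPPI policy iteration algorithm come in.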

Authors (5)
  1. Krishnendu Chatterjee (214 papers)
  2. Ehsan Kafshdar Goharshady (12 papers)
  3. Mehrdad Karrabi (7 papers)
  4. Petr Novotný (41 papers)
  5. Đorđe Žikelić (31 papers)
Citations (2)
