Learning Robust Policies for Uncertain Parametric Markov Decision Processes (2312.06344v2)

Published 11 Dec 2023 in eess.SY, cs.LO, and cs.SY

Abstract: Synthesising verifiably correct controllers for dynamical systems is crucial for safety-critical problems. To achieve this, it is important to account for uncertainty in a robust manner, while at the same time it is often of interest to avoid being overly conservative with the view of achieving a better cost. We propose a method for verifiably safe policy synthesis for a class of finite state models, under the presence of structural uncertainty. In particular, we consider uncertain parametric Markov decision processes (upMDPs), a special class of Markov decision processes, with parameterised transition functions, where such parameters are drawn from a (potentially) unknown distribution. Our framework leverages recent advancements in the so-called scenario approach theory, where we represent the uncertainty by means of scenarios, and provide guarantees on synthesised policies satisfying probabilistic computation tree logic (PCTL) formulae. We consider several common benchmarks/problems and compare our work to recent developments for verifying upMDPs.
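The scenario idea in the abstract can be illustrated on a toy model. The sketch below is not the authors' algorithm: it is a minimal brute-force illustration, assuming a hypothetical three-state upMDP with a single parameter p, two hypothetical actions ("safe" and "risky"), and a reachability specification of the form P>=λ [F goal]. It samples parameter values (the scenarios), evaluates each memoryless policy on every scenario, and keeps the policy with the best worst-case reachability probability. The probabilistic guarantee that scenario theory attaches to unseen parameter values is not computed here.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical toy upMDP: one decision state, a goal state, and a fail state.
INIT, GOAL, FAIL = 0, 1, 2
ACTIONS = ("safe", "risky")


def transitions(p: float) -> Dict[str, List[Tuple[int, float]]]:
    """Transition function from INIT, parameterised by p (an assumption of this sketch)."""
    return {
        # 'safe' does not depend on the uncertain parameter.
        "safe": [(GOAL, 0.7), (FAIL, 0.3)],
        # 'risky' reaches the goal with probability p, otherwise retries or fails.
        "risky": [(GOAL, p), (INIT, (1 - p) / 2), (FAIL, (1 - p) / 2)],
    }


def reach_probability(action: str, p: float, iterations: int = 200) -> float:
    """Value iteration for the probability of eventually reaching GOAL from INIT."""
    value = 0.0
    for _ in range(iterations):
        new_value = 0.0
        for next_state, prob in transitions(p)[action]:
            if next_state == GOAL:
                new_value += prob
            elif next_state == INIT:
                new_value += prob * value
            # FAIL contributes zero.
        value = new_value
    return value


def robust_policy(samples: List[float]) -> Tuple[str, float]:
    """Return the action with the best worst-case reachability over all scenarios."""
    worst_case = {
        action: min(reach_probability(action, p) for p in samples)
        for action in ACTIONS
    }
    best = max(worst_case, key=worst_case.get)
    return best, worst_case[best]


if __name__ == "__main__":
    rng = random.Random(0)
    # Scenarios: parameter samples from a distribution the synthesiser never inspects.
    scenarios = [rng.uniform(0.5, 1.0) for _ in range(50)]
    action, value = robust_policy(scenarios)
    threshold = 0.65  # hypothetical lambda in P>=lambda [F goal]
    print(action, round(value, 4), "satisfied on all scenarios:", value >= threshold)
```

Enumerating policies only works because the toy model has a single decision state; it is meant to show what "robust over sampled scenarios" means, not how to scale it.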
