Certified Policy Verification and Synthesis for MDPs under Distributional Reach-avoidance Properties (2405.04015v1)
Abstract: Markov Decision Processes (MDPs) are a classical model for decision making in the presence of uncertainty. Often they are viewed as state transformers with planning objectives defined with respect to paths over MDP states. An increasingly popular alternative is to view them as distribution transformers, giving rise to a sequence of probability distributions over MDP states. For instance, reachability and safety properties in modeling robot swarms or chemical reaction networks are naturally defined in terms of probability distributions over states. Verifying such distributional properties is known to be hard and often beyond the reach of classical state-based verification techniques. In this work, we consider the problems of certified policy (i.e., controller) verification and synthesis in MDPs under distributional reach-avoidance specifications. By certified we mean that, along with a policy, we also aim to synthesize a (checkable) certificate ensuring that the MDP indeed satisfies the property. Thus, given a target set of distributions and an unsafe set of distributions over MDP states, our goal is to either synthesize a certificate for a given policy or to synthesize a policy along with a certificate, proving that a distribution in the target set can be reached while unsafe distributions are avoided. To solve this problem, we introduce the novel notion of distributional reach-avoid certificates and present automated procedures for (1) synthesizing a certificate for a given policy, and (2) synthesizing a policy together with a certificate, both providing formal guarantees on certificate correctness. Our experimental evaluation demonstrates the ability of our method to solve several non-trivial examples, including a multi-agent robot-swarm model, by synthesizing certified policies and certifying existing ones.
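To make the distribution-transformer view concrete, the following is a minimal sketch, not the paper's certificate-synthesis procedure: a fixed memoryless policy induces a Markov chain whose transition matrix pushes a distribution over states forward at each step, and reach-avoidance can then be phrased as the forward orbit hitting a target set of distributions while never entering an unsafe set. The 3-state MDP, the policy, and the linear target/unsafe constraints below are hypothetical values chosen purely for illustration.

```python
# A minimal illustrative sketch, NOT the paper's certificate-synthesis
# procedure: it only simulates the distribution-transformer semantics.
# The 3-state MDP, the policy, and the linear target/unsafe sets are
# hypothetical values chosen for illustration.
import numpy as np

# Transition kernels P[a][s, s']: probability of moving from state s
# to state s' under action a.
P = np.array([
    [[0.9, 0.1, 0.0],   # action 0
     [0.0, 0.8, 0.2],
     [0.1, 0.0, 0.9]],
    [[0.5, 0.5, 0.0],   # action 1
     [0.3, 0.3, 0.4],
     [0.0, 0.2, 0.8]],
])

# Memoryless policy pi[s, a]: probability of playing action a in state s.
pi = np.array([[0.7, 0.3],
               [0.5, 0.5],
               [0.2, 0.8]])

# Induced Markov chain: P_pi[s, s'] = sum_a pi[s, a] * P[a][s, s'].
P_pi = np.einsum('sa,ast->st', pi, P)

# Linear sets of distributions (hypothetical):
# target  H = { mu : mu[2] >= 0.45 },  unsafe  U = { mu : mu[0] >= 0.5 }.
in_target = lambda mu: mu[2] >= 0.45
in_unsafe = lambda mu: mu[0] >= 0.5

mu = np.array([0.1, 0.9, 0.0])  # initial distribution over states
for t in range(100):
    assert not in_unsafe(mu), f"unsafe distribution entered at step {t}"
    if in_target(mu):
        print(f"target set reached at step {t}: mu = {mu.round(3)}")
        break
    mu = mu @ P_pi  # one distribution-transformer step: mu_{t+1} = mu_t * P_pi
else:
    print("target not reached within the horizon")
```

A bounded simulation like this only witnesses reach-avoidance up to a fixed horizon for one initial distribution; the certificates introduced in the paper are, by contrast, inductive objects whose conditions imply the property over the unbounded horizon and can be checked mechanically, e.g., with an SMT solver.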