Online Model-free Safety Verification for Markov Decision Processes Without Safety Violation (2312.05243v1)
Abstract: In this paper, we consider the problem of safety assessment for Markov decision processes without explicit knowledge of the model. We aim to learn probabilistic safety specifications associated with a given policy without compromising the safety of the process. To accomplish this, we characterize a subset of the state space, called the proxy set, which contains the states that are close, in a probabilistic sense, to the forbidden set of all unsafe states. We compute the safety function using the single-step temporal difference method; to this end, we relate the computation of the safety function to value function estimation with temporal difference learning. Since the given control policy could be unsafe, we use a safe baseline subpolicy to generate data for learning. We then use an off-policy temporal difference learning method with importance sampling to learn the safety function corresponding to the given policy. Finally, we demonstrate our results using a numerical example.
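To make the off-policy idea concrete, the following is a minimal sketch of single-step temporal difference learning with importance sampling, used to estimate the probability of reaching a forbidden state under a target policy while the data come from a different behavior policy. Everything in it is an illustrative assumption rather than the paper's setup: the five-state chain, the two policies, the step size, and the name `estimate_safety` are hypothetical. It also omits the proxy-set construction, so the behavior policy in this toy can still enter the forbidden state, and the sketch does not reproduce the no-violation guarantee.

```python
import random


def estimate_safety(num_episodes=40000, alpha=0.005, seed=0):
    """Off-policy TD(0) with per-step importance sampling on a toy chain.

    States 0..4: state 0 is forbidden and state 4 is a safe target; both end
    an episode. S[x] estimates the probability of hitting state 0 before
    state 4 when actions are drawn from the target policy.
    All numbers here are illustrative assumptions.
    """
    rng = random.Random(seed)
    forbidden, target = 0, 4
    pi_right = 0.5  # target policy: step right with probability 0.5
    b_right = 0.8   # behavior policy: drifts away from the forbidden state
    S = [0.0] * 5   # S[0] and S[4] are never updated and stay at 0

    for _ in range(num_episodes):
        x = rng.choice([1, 2, 3])  # random non-terminal start state
        while x not in (forbidden, target):
            go_right = rng.random() < b_right  # act with the behavior policy
            # Importance ratio pi(a|x) / b(a|x) for the action actually taken.
            if go_right:
                rho = pi_right / b_right
            else:
                rho = (1.0 - pi_right) / (1.0 - b_right)
            x_next = x + 1 if go_right else x - 1
            # Reward 1 only on entering the forbidden state, so the expected
            # return equals the hitting probability.
            reward = 1.0 if x_next == forbidden else 0.0
            S[x] += alpha * rho * (reward + S[x_next] - S[x])
            x = x_next
    return S
```

Calling `estimate_safety()` returns a list indexed by state. For the symmetric target policy on this chain, the exact hitting probabilities are 0.75, 0.5, and 0.25 for states 1, 2, and 3, so the estimates can be checked against them; with a constant step size they fluctuate around these values rather than converging exactly.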