Predictable Interval MDPs through Entropy Regularization (2403.16711v1)
Abstract: Regularization of control policies using entropy can be instrumental in adjusting the predictability of real-world systems. Applications benefiting from such approaches range from cybersecurity, where maximal unpredictability is sought, to human-robot interaction, where predictable behavior is highly desirable. In this paper, we consider entropy regularization for interval Markov decision processes (IMDPs). IMDPs are uncertain MDPs, whose transition probabilities are known only to lie within intervals. Recently, IMDPs have gained significant popularity as abstractions of stochastic systems for control design. In this work, we address robust minimization of a linear combination of entropy and a standard cumulative cost in IMDPs, thereby establishing a trade-off between optimality and predictability. We show that optimal deterministic policies exist, and we devise a value-iteration algorithm to compute them; the algorithm solves a number of convex programs at each step. Finally, through an illustrative example, we show the benefits of penalizing entropy in IMDPs.
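The value iteration described above admits a compact implementation: at every Bellman backup, the adversarial choice of transition probabilities within the intervals reduces to one convex program per state-action pair, since Shannon entropy is concave and the interval constraints define a polytope. Below is a minimal sketch, not the authors' implementation, assuming a discounted infinite-horizon formulation in which the penalized entropy is that of the transition distribution chosen at each step; all names (`worst_case_backup`, `P_low`, `P_up`, `beta`) are illustrative.

```python
# Minimal sketch (not the paper's code): entropy-penalized robust value
# iteration on an IMDP, assuming a discounted infinite-horizon cost.
import numpy as np
import cvxpy as cp

def worst_case_backup(V, lo, hi, beta, gamma):
    """Adversarial transition choice for one state-action pair: maximize
    beta times the local Shannon entropy plus the discounted expected
    cost-to-go over the interval polytope {p : lo <= p <= hi, sum(p) = 1}.
    Entropy is concave and the feasible set is a polytope, so this is a
    convex program, matching the structure referenced in the abstract."""
    p = cp.Variable(len(V))
    objective = cp.Maximize(beta * cp.sum(cp.entr(p)) + gamma * (p @ V))
    constraints = [p >= lo, p <= hi, cp.sum(p) == 1]
    cp.Problem(objective, constraints).solve()
    return objective.value

def robust_value_iteration(cost, P_low, P_up, beta, gamma=0.95, tol=1e-6):
    """cost[s, a]: stage cost; P_low/P_up[s, a, :]: transition-probability
    intervals.  Returns the robust value function and a deterministic
    policy (the abstract states that optimal deterministic policies exist)."""
    n_states, n_actions = cost.shape
    V = np.zeros(n_states)
    while True:
        Q = np.empty((n_states, n_actions))
        for s in range(n_states):
            for a in range(n_actions):
                # One convex program per state-action pair at each step.
                Q[s, a] = cost[s, a] + worst_case_backup(
                    V, P_low[s, a], P_up[s, a], beta, gamma)
        V_new = Q.min(axis=1)  # controller minimizes over actions
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmin(axis=1)
        V = V_new
```

Setting `beta > 0` penalizes entropy (favoring predictable behavior), while a negative `beta` would reward it, as in the cybersecurity use case mentioned in the abstract.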