Optimizing Local Satisfaction of Long-Run Average Objectives in Markov Decision Processes
Published 19 Dec 2023 in cs.MA and math.OC | (2312.12325v1)
Abstract: Long-run average optimization problems for Markov decision processes (MDPs) require constructing policies with optimal steady-state behavior, i.e., optimal limit frequency of visits to the states. However, such policies may suffer from local instability, i.e., the frequency of states visited in a bounded time horizon along a run differs significantly from the limit frequency. In this work, we propose an efficient algorithmic solution to this problem.
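The gap between limit frequency and bounded-horizon frequency can be illustrated with a small simulation. The sketch below is not the paper's algorithm: the two-state chain, the switching probabilities, the window length of 10, and all function names are illustrative assumptions. Both chains visit state 1 about half of the time in the long run, yet only the alternating one keeps that frequency inside every short window.

```python
import random

def simulate(switch_prob, steps, rng):
    """Run a two-state chain that flips its state with probability switch_prob."""
    state = 0
    run = []
    for _ in range(steps):
        run.append(state)
        if rng.random() < switch_prob:
            state = 1 - state
    return run

def window_frequencies(run, window):
    """Frequency of state 1 in each consecutive, non-overlapping window."""
    return [sum(run[i:i + window]) / window
            for i in range(0, len(run) - window + 1, window)]

def mean_abs_deviation(freqs, target):
    """Average distance between the window frequencies and the target frequency."""
    return sum(abs(f - target) for f in freqs) / len(freqs)

rng = random.Random(0)          # fixed seed, chosen arbitrarily
steps, window = 100_000, 10     # hypothetical horizon and window length

sticky = simulate(0.01, steps, rng)       # rarely switches state
alternating = simulate(1.0, steps, rng)   # switches state at every step

# Empirical long-run frequency of state 1 along each run.
sticky_limit = sum(sticky) / steps
alternating_limit = sum(alternating) / steps

# How far the frequency inside a short window strays from the limit value 0.5.
sticky_dev = mean_abs_deviation(window_frequencies(sticky, window), 0.5)
alternating_dev = mean_abs_deviation(window_frequencies(alternating, window), 0.5)

print(f"sticky:      long-run {sticky_limit:.3f}, mean window deviation {sticky_dev:.3f}")
print(f"alternating: long-run {alternating_limit:.3f}, mean window deviation {alternating_dev:.3f}")
```

In this toy setting the sticky chain matches the target frequency only in the limit, while most of its windows contain a single state. This is the kind of local instability the abstract describes.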