Quantum Markov Decision Processes: Dynamic and Semi-Definite Programs for Optimal Solutions (2402.14651v2)
Abstract: In this paper, building on the formulation of quantum Markov decision processes (q-MDPs) presented in our previous work [N. Saldi, S. Sanjari, and S. Yüksel, Quantum Markov Decision Processes: General Theory, Approximations, and Classes of Policies, SIAM Journal on Control and Optimization, 2024], we focus on developing semi-definite programming approaches for optimal policies and value functions of both open-loop and classical-state-preserving closed-loop policies. First, by using the duality between the dynamic programming and the semi-definite programming formulations of any q-MDP with open-loop policies, we establish that the optimal value function is linear and that there exists a stationary optimal policy among open-loop policies. Then, using these results, we establish a method for computing an approximately optimal value function and formulate the computation of an optimal stationary open-loop policy as a bi-linear program. Next, we turn our attention to classical-state-preserving closed-loop policies. We establish dynamic programming and semi-definite programming formulations for these policies, and the duality between the two formulations similarly enables us to prove that the optimal policy is linear and that there exists an optimal stationary classical-state-preserving closed-loop policy. Then, as in the open-loop case, we establish a method for computing the optimal value function and pose the computation of optimal stationary classical-state-preserving closed-loop policies as a bi-linear program.