Lifting the polytopic restriction via near-optimal policies in arbitrary RMDPs
Investigate whether the restriction to polytopic uncertainty sets can be lifted in the implicit anytime algorithm for robust Markov decision processes (RMDPs) without the Constant-Support Assumption. The suggested route is to characterize near-optimal policies in arbitrary uncertainty sets, which would provide the stable strategy recommender that the algorithm's correctness requires.
References
We leave lifting this restriction by analyzing the structure of near-optimal policies in arbitrary RMDPs as future work.
— Solving Robust Markov Decision Processes: Generic, Reliable, Efficient
(2412.10185 - Meggendorfer et al., 2024) in Appendix F (Implicit Anytime Algorithm for RMDPs without Constant-Support), Proof of Theorem app-anytime, discussion of Line ‘Infer environment policy’