Lifting the polytopic restriction via near‑optimal policies in arbitrary RMDPs

Investigate whether the restriction to polytopic uncertainty sets in the implicit anytime algorithm for robust Markov decision processes (RMDPs) without the Constant-Support Assumption can be lifted. The proposed route is to characterize near-optimal policies in arbitrary uncertainty sets, thereby obtaining the stable strategy recommender required for the algorithm's correctness.

Background

To ensure convergence of the SEC-based deflation/inflation method, the algorithm requires a stable strategy recommender, which in turn depends on the existence of memoryless deterministic (MD) optimal policies. MD optimal policies are guaranteed to exist for polytopic uncertainty sets, but may fail to exist for arbitrary uncertainty sets without the Constant-Support Assumption.
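To illustrate why the polytopic case is well-behaved, the following is a minimal sketch (not from the paper; the toy MDP, rewards, and vertex sets are invented for illustration) of robust value iteration over polytopic uncertainty sets. It relies on the standard fact that the adversary's inner minimization of a linear objective over a polytope is attained at a vertex, so enumerating vertices suffices, and a greedy memoryless deterministic policy can be read off the converged robust values.

```python
# Toy robust MDP: 2 states, 2 actions. Each (state, action) pair has a
# polytopic uncertainty set of transition distributions, represented by
# its finite vertex set. Since the inner minimization of a linear
# objective over a polytope is attained at a vertex, the adversary's
# worst case is found by enumerating vertices. All numbers are made up.

GAMMA = 0.9
STATES = [0, 1]
ACTIONS = [0, 1]

# vertices[(s, a)] = vertex distributions of the transition polytope
vertices = {
    (0, 0): [(0.9, 0.1), (0.7, 0.3)],
    (0, 1): [(0.5, 0.5), (0.4, 0.6)],
    (1, 0): [(0.2, 0.8), (0.1, 0.9)],
    (1, 1): [(0.6, 0.4), (0.8, 0.2)],
}
reward = {(0, 0): 1.0, (0, 1): 0.5, (1, 0): 0.0, (1, 1): 2.0}

def robust_q(values, s, a):
    # Worst-case expected value, minimizing over polytope vertices.
    return reward[(s, a)] + GAMMA * min(
        sum(p * values[t] for t, p in enumerate(dist))
        for dist in vertices[(s, a)]
    )

def robust_value_iteration(eps=1e-8):
    # Standard contraction argument: iterate the robust Bellman operator.
    values = [0.0] * len(STATES)
    while True:
        new = [max(robust_q(values, s, a) for a in ACTIONS) for s in STATES]
        if max(abs(n - v) for n, v in zip(new, values)) < eps:
            return new
        values = new

values = robust_value_iteration()
# A greedy memoryless deterministic policy extracted from robust values.
policy = {s: max(ACTIONS, key=lambda a: robust_q(values, s, a)) for s in STATES}
print(values, policy)
```

For general (non-polytopic) uncertainty sets the inner minimization need not be attained at finitely many candidate distributions, which is exactly where the vertex-enumeration argument, and with it the guarantee of MD optimal policies, breaks down.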

The authors therefore currently restrict this part of their theory to polytopic RMDPs and propose lifting the restriction by analyzing the structure of near-optimal policies in arbitrary RMDPs.

References

We leave lifting this restriction by analyzing the structure of near-optimal policies in arbitrary RMDPs as future work.

Solving Robust Markov Decision Processes: Generic, Reliable, Efficient (arXiv:2412.10185, Meggendorfer et al., 2024), Appendix F (Implicit Anytime Algorithm for RMDPs without Constant-Support), Proof of Theorem app-anytime, discussion of Line ‘Infer environment policy’