Tight sample complexities for RMDPs under general uncertainty sets

Establish tight sample complexity characterizations for learning distributionally robust Markov decision processes (RMDPs) across broad families of uncertainty sets beyond the total variation distance, including sets induced by the chi-squared and Kullback-Leibler (KL) divergences and by Lp distances.
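
For concreteness, here is a standard (s,a)-rectangular formulation of the learning problem; the notation (nominal kernel P^0, radius σ, divergence ρ) is an illustrative convention rather than a quotation from the tutorial. Each transition distribution is replaced by a divergence ball around a nominal model, and the robust value function takes the worst case over those balls:

\[
\mathcal{U}^{\sigma}(P^0_{s,a}) = \bigl\{ P_{s,a} \in \Delta(\mathcal{S}) : \rho\bigl(P_{s,a}, P^0_{s,a}\bigr) \le \sigma \bigr\},
\qquad
V^{\pi,\sigma}(s) = \inf_{P \,:\, P_{s,a} \in \mathcal{U}^{\sigma}(P^0_{s,a})\ \forall (s,a)} \mathbb{E}_{\pi, P}\Bigl[\sum_{t=0}^{\infty} \gamma^t r(s_t, a_t) \Bigm| s_0 = s\Bigr].
\]

The question is how many samples from the nominal model suffice to estimate the robust optimal value max_π V^{π,σ} to accuracy ε, as a function of the divergence ρ and the radius σ.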

Background

The tutorial analyzes distributionally robust MDPs (RMDPs) and provides tight, matching upper and lower bounds on their sample complexity under total variation (TV) uncertainty sets, showing that learning RMDPs under TV uncertainty can be no harder, and in certain regimes strictly easier, than learning standard MDPs.
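
As a point of reference, the tight TV rate established in the line of work the tutorial surveys takes, up to logarithmic factors and under a generative model, the form

\[
\widetilde{\Theta}\!\left(\frac{SA}{(1-\gamma)^2 \max\{1-\gamma, \sigma\}\, \varepsilon^2}\right),
\]

which never exceeds the SA/((1-γ)^3 ε^2) complexity of standard MDPs and improves on it once σ ≳ 1-γ; the exact statement and conditions should be taken from the tutorial rather than from this paraphrase.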

However, the behavior under other divergence-based uncertainty sets (e.g., chi-squared, Lp, KL) is more nuanced: there is evidence that RMDPs can be statistically harder than standard MDPs for some ranges of the uncertainty level. The open problem seeks tight, general sample complexity characterizations across these broader families.
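
The families in question instantiate the divergence ρ above with, for example, the following standard choices for a candidate next-state distribution P against the nominal P^0:

\[
\mathrm{TV}(P, P^0) = \tfrac{1}{2}\|P - P^0\|_1, \qquad
\chi^2(P \,\|\, P^0) = \sum_{s'} \frac{\bigl(P(s') - P^0(s')\bigr)^2}{P^0(s')}, \qquad
\mathrm{KL}(P \,\|\, P^0) = \sum_{s'} P(s') \log \frac{P(s')}{P^0(s')},
\]

with the Lp case given by \|P - P^0\|_p. The sensitivity of KL and chi-squared balls to small nominal probabilities P^0(s') is one plausible source of the differing statistical behavior across these choices.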

References

As the tutorial puts it: "It remains an interesting open question to establish tight sample complexities of RMDPs over broad families of uncertainty sets."

Chi et al., "Statistical and Algorithmic Foundations of Reinforcement Learning," arXiv:2507.14444, 19 Jul 2025; Section 7 (Distributionally robust RL), discussion of other uncertainty sets.