Towards Theoretical Understandings of Robust Markov Decision Processes: Sample Complexity and Asymptotics (2105.03863v3)

Published 9 May 2021 in stat.ML and cs.LG

Abstract: In this paper, we study the non-asymptotic and asymptotic performance of the optimal robust policy and value function of robust Markov Decision Processes (MDPs), where the optimal robust policy and value function are computed using only a generative model. While prior work on the non-asymptotic performance of robust MDPs is restricted to the KL uncertainty set under the $(s,a)$-rectangular assumption, we improve those results and also consider other uncertainty sets, including $L_1$ and $\chi^2$ balls. Our results show that, under the $(s,a)$-rectangular assumption on the uncertainty sets, the sample complexity is about $\widetilde{O}\left(\frac{|\mathcal{S}|^2|\mathcal{A}|}{\varepsilon^2\rho^2(1-\gamma)^4}\right)$. In addition, we extend our results from the $(s,a)$-rectangular assumption to the $s$-rectangular assumption. In this scenario, the sample complexity varies with the choice of uncertainty set and is generally larger than under the $(s,a)$-rectangular assumption. Moreover, we show, from both theoretical and empirical perspectives, that the optimal robust value function is asymptotically normal with the typical rate $\sqrt{n}$ under both the $(s,a)$- and $s$-rectangular assumptions.
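To get a feel for how the $(s,a)$-rectangular rate quoted in the abstract scales, the following is a minimal sketch that evaluates $|\mathcal{S}|^2|\mathcal{A}| / (\varepsilon^2\rho^2(1-\gamma)^4)$ directly. The function name `robust_sample_bound` is hypothetical, and the readings of the symbols are assumptions, since the abstract does not define them: $\varepsilon$ is taken as the target accuracy, $\rho$ as the radius of the uncertainty set, and $\gamma$ as the discount factor. The constants and logarithmic factors hidden by $\widetilde{O}$ are dropped, so the result indicates scaling only and is not a guaranteed sample size.

```python
def robust_sample_bound(num_states, num_actions, eps, rho, gamma):
    """Evaluate |S|^2 |A| / (eps^2 * rho^2 * (1 - gamma)^4).

    Assumed meanings (not stated in the abstract): eps is the target
    accuracy, rho the uncertainty-set radius, gamma the discount factor.
    Constants and log factors hidden by the O-tilde are ignored.
    """
    if num_states <= 0 or num_actions <= 0:
        raise ValueError("num_states and num_actions must be positive")
    if eps <= 0 or rho <= 0:
        raise ValueError("eps and rho must be positive")
    if not 0 < gamma < 1:
        raise ValueError("gamma must lie strictly between 0 and 1")

    numerator = num_states ** 2 * num_actions
    denominator = eps ** 2 * rho ** 2 * (1 - gamma) ** 4
    return numerator / denominator


if __name__ == "__main__":
    base = robust_sample_bound(10, 2, eps=0.5, rho=0.5, gamma=0.5)
    tighter = robust_sample_bound(10, 2, eps=0.25, rho=0.5, gamma=0.5)
    # Halving eps multiplies the expression by four (the 1/eps^2 dependence).
    print(base, tighter, tighter / base)
```

Under these assumptions, the expression grows quadratically in the number of states and in $1/\varepsilon$ and $1/\rho$, and as the fourth power of the effective horizon $1/(1-\gamma)$.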

Authors (3)

Wenhao Yang (30 papers)
Liangyu Zhang (9 papers)
Zhihua Zhang (118 papers)

Citations (31)
