
Variance-reduced $Q$-learning is minimax optimal (1906.04697v2)

Published 11 Jun 2019 in cs.LG, math.OC, and stat.ML

Abstract: We introduce and analyze a form of variance-reduced $Q$-learning. For $\gamma$-discounted MDPs with finite state space $\mathcal{X}$ and action space $\mathcal{U}$, we prove that it yields an $\epsilon$-accurate estimate of the optimal $Q$-function in the $\ell_\infty$-norm using $\mathcal{O} \left( \frac{D}{\epsilon^2 (1-\gamma)^3} \; \log \left( \frac{D}{1-\gamma} \right) \right)$ samples, where $D = |\mathcal{X}| \times |\mathcal{U}|$. This guarantee matches known minimax lower bounds up to a logarithmic factor in the discount complexity. In contrast, our past work shows that ordinary $Q$-learning has worst-case quartic scaling in the discount complexity.
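The abstract describes the variance-reduction idea at a high level: rather than bootstrapping noisy Bellman targets directly, the algorithm periodically recenters its updates around a reference $Q$-function whose Bellman image is estimated once with many samples, so the per-step noise shrinks as the iterate approaches the reference. The following is a minimal sketch of that epoch-based recentering scheme in a synchronous generative-model setting; the function `sample_next_state`, the parameter names, and the rescaled-linear stepsize are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

def variance_reduced_q_learning(sample_next_state, R, gamma,
                                num_states, num_actions,
                                num_epochs=5, recenter_samples=100,
                                inner_steps=500):
    """Sketch of epoch-based variance-reduced Q-learning.

    sample_next_state(x, u): draws a next-state index from P(. | x, u)
        (generative-model access, assumed for illustration).
    R[x, u]: deterministic reward for taking action u in state x.
    """
    Q_bar = np.zeros((num_states, num_actions))  # reference Q-function
    for _ in range(num_epochs):
        # Monte Carlo estimate of the Bellman operator at the reference point.
        T_bar = np.zeros_like(Q_bar)
        for _ in range(recenter_samples):
            for x in range(num_states):
                for u in range(num_actions):
                    xp = sample_next_state(x, u)
                    T_bar[x, u] += R[x, u] + gamma * Q_bar[xp].max()
        T_bar /= recenter_samples
        # Inner loop: synchronous stochastic updates recentered at Q_bar,
        # so the per-step noise scales with (Q - Q_bar) rather than Q itself.
        Q = Q_bar.copy()
        for k in range(1, inner_steps + 1):
            lam = 1.0 / (1.0 + (1.0 - gamma) * k)  # illustrative stepsize
            for x in range(num_states):
                for u in range(num_actions):
                    xp = sample_next_state(x, u)
                    # Recentered empirical Bellman target: the R[x, u] terms
                    # in the two single-sample estimates cancel.
                    target = (gamma * Q[xp].max()
                              - gamma * Q_bar[xp].max()
                              + T_bar[x, u])
                    Q[x, u] = (1.0 - lam) * Q[x, u] + lam * target
        Q_bar = Q  # re-center the next epoch at the improved estimate
    return Q_bar
```

Successive epochs reuse the same update rule but with a progressively better reference point, which is what drives the cubic (rather than quartic) dependence on the discount complexity $1/(1-\gamma)$ claimed above.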

Citations (88)
