Papers
Topics
Authors
Recent
2000 character limit reached

Variance-reduced $Q$-learning is minimax optimal

Published 11 Jun 2019 in cs.LG, math.OC, and stat.ML | (1906.04697v2)

Abstract: We introduce and analyze a form of variance-reduced $Q$-learning. For $\gamma$-discounted MDPs with finite state space $\mathcal{X}$ and action space $\mathcal{U}$, we prove that it yields an $\epsilon$-accurate estimate of the optimal $Q$-function in the $\ell_\infty$-norm using $\mathcal{O} \left(\left(\frac{D}{ \epsilon2 (1-\gamma)3} \right) \; \log \left( \frac{D}{(1-\gamma)} \right) \right)$ samples, where $D = |\mathcal{X}| \times |\mathcal{U}|$. This guarantee matches known minimax lower bounds up to a logarithmic factor in the discount complexity. In contrast, our past work shows that ordinary $Q$-learning has worst-case quartic scaling in the discount complexity.

Citations (88)

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Authors (1)

Collections

Sign up for free to add this paper to one or more collections.