Estimation and Inference in Distributional Reinforcement Learning (2309.17262v2)

Published 29 Sep 2023 in stat.ML and cs.LG

Abstract: In this paper, we study distributional reinforcement learning from the perspective of statistical efficiency. We investigate distributional policy evaluation, aiming to estimate the complete return distribution (denoted $\eta^\pi$) attained by a given policy $\pi$. We use the certainty-equivalence method to construct our estimator $\hat\eta^\pi$, given a generative model is available. In this circumstance we need a dataset of size $\widetilde O\left(\frac{|\mathcal{S}||\mathcal{A}|}{\varepsilon^{2p}(1-\gamma)^{2p+2}}\right)$ to guarantee the $p$-Wasserstein metric between $\hat\eta^\pi$ and $\eta^\pi$ less than $\varepsilon$ with high probability. This implies the distributional policy evaluation problem can be solved with sample efficiency. Also, we show that under different mild assumptions a dataset of size $\widetilde O\left(\frac{|\mathcal{S}||\mathcal{A}|}{\varepsilon^{2}(1-\gamma)^{4}}\right)$ suffices to ensure the Kolmogorov metric and total variation metric between $\hat\eta^\pi$ and $\eta^\pi$ is below $\varepsilon$ with high probability. Furthermore, we investigate the asymptotic behavior of $\hat\eta^\pi$. We demonstrate that the "empirical process" $\sqrt{n}(\hat\eta^\pi-\eta^\pi)$ converges weakly to a Gaussian process in the space of bounded functionals on Lipschitz function class $\ell^\infty(\mathcal{F}_{\text{W}})$, also in the space of bounded functionals on indicator function class $\ell^\infty(\mathcal{F}_{\text{KS}})$ and bounded measurable function class $\ell^\infty(\mathcal{F}_{\text{TV}})$ when some mild conditions hold. Our findings give rise to a unified approach to statistical inference of a wide class of statistical functionals of $\eta^\pi$.
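To make the certainty-equivalence idea concrete, here is a minimal Python sketch under stated assumptions; it is not the paper's implementation. The two-state Markov reward process (`P`, `r`), the sample sizes, and all function names are hypothetical. The policy is assumed to be already folded into `P`. The sketch draws `n` next-state samples per state from the generative model, builds the empirical kernel `P_hat`, and then approximates the return distribution under both `P` and `P_hat` by Monte Carlo rollouts truncated at a finite horizon. The Monte Carlo step is a stand-in chosen for simplicity, so the reported 1-Wasserstein distance mixes estimation error with simulation noise.

```python
import numpy as np

def empirical_kernel(P, n, rng):
    """Draw n next-state samples per state from the generative model P
    and return the empirical transition matrix P_hat."""
    S = P.shape[0]
    P_hat = np.zeros((S, S))
    for s in range(S):
        nxt = rng.choice(S, size=n, p=P[s])
        P_hat[s] = np.bincount(nxt, minlength=S) / n
    return P_hat

def sample_returns(P, r, gamma, s0, horizon, m, rng):
    """Draw m Monte Carlo samples of the truncated discounted return
    sum_{t < horizon} gamma^t * r[s_t], starting from state s0."""
    S = P.shape[0]
    cdf = np.cumsum(P, axis=1)
    state = np.full(m, s0)
    G = np.zeros(m)
    for t in range(horizon):
        G += gamma**t * r[state]
        u = rng.random(m)
        # Inverse-CDF sampling of the next state for all m chains at once.
        state = (u[:, None] > cdf[state]).sum(axis=1)
        state = np.minimum(state, S - 1)  # guard against rounding in cumsum
    return G

def wasserstein1(a, b):
    """1-Wasserstein distance between two equal-size empirical samples."""
    return float(np.mean(np.abs(np.sort(a) - np.sort(b))))

# Hypothetical two-state example: reward 1 in state 1, reward 0 in state 0.
rng = np.random.default_rng(0)
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
r = np.array([0.0, 1.0])
gamma = 0.9

P_hat = empirical_kernel(P, n=2000, rng=rng)
G_true = sample_returns(P, r, gamma, s0=0, horizon=200, m=5000, rng=rng)
G_hat = sample_returns(P_hat, r, gamma, s0=0, horizon=200, m=5000, rng=rng)
w1 = wasserstein1(G_hat, G_true)
print("P_hat =", P_hat)
print("approximate W1 distance:", w1)
```

Increasing `n` should shrink the distance between the two return distributions, which is the qualitative behavior that the sample-complexity bounds in the abstract quantify; the sketch does not verify the stated rates.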
