Statistical Efficiency of Distributional Temporal Difference Learning and Freedman's Inequality in Hilbert Spaces (2403.05811v4)
Abstract: Distributional reinforcement learning (DRL) has achieved empirical success in various domains. One core task in DRL is distributional policy evaluation, which involves estimating the return distribution $\eta^\pi$ of a given policy $\pi$. Distributional temporal difference (TD) learning has been proposed for this task, extending classic TD learning in RL. In this paper, we focus on the non-asymptotic statistical rates of distributional TD. To facilitate theoretical analysis, we propose non-parametric distributional TD (NTD). For a $\gamma$-discounted infinite-horizon tabular Markov decision process, we show that NTD with a generative model needs $\tilde{O}(\varepsilon^{-2}\mu_{\min}^{-1}(1-\gamma)^{-3})$ interactions with the environment to achieve an $\varepsilon$-optimal estimator with high probability, when the estimation error is measured by the $1$-Wasserstein distance. This sample complexity bound is minimax optimal up to logarithmic factors. In addition, we revisit categorical distributional TD (CTD), showing that the same non-asymptotic convergence bound holds for CTD under the $1$-Wasserstein distance. We also extend our analysis to the more general setting where the data generating process is Markovian. In the Markovian setting, we propose variance-reduced variants of NTD and CTD and show that both achieve a $\tilde{O}(\varepsilon^{-2}\mu_{\pi,\min}^{-1}(1-\gamma)^{-3}+t_{\mathrm{mix}}\mu_{\pi,\min}^{-1}(1-\gamma)^{-1})$ sample complexity bound under the $1$-Wasserstein distance, which matches the state-of-the-art statistical results for classic policy evaluation. To achieve these sharp statistical rates, we establish a novel Freedman's inequality in Hilbert spaces, which may be of independent interest for the statistical analysis of various infinite-dimensional online learning problems.
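To make the categorical variant concrete, below is a minimal sketch of a tabular CTD update with a generative model. It uses the standard categorical projection onto a fixed grid of atoms; the support `Z`, the range `[V_MIN, V_MAX]`, the step size, the state-indexed table `eta`, and any transition sampler are illustrative assumptions, not necessarily the paper's exact NTD/CTD construction or its variance-reduced variants.

```python
import numpy as np

# Illustrative sketch of tabular categorical distributional TD (CTD) for policy
# evaluation. Atom grid, value range, and step size are assumed placeholders.

K = 51                              # number of categorical atoms
V_MIN, V_MAX = 0.0, 10.0            # assumed range of returns
Z = np.linspace(V_MIN, V_MAX, K)    # common support z_1 < ... < z_K
DZ = (V_MAX - V_MIN) / (K - 1)

def categorical_projection(probs_next, r, gamma):
    """Project the pushforward of eta(s') under x -> r + gamma*x back onto Z."""
    target = np.zeros(K)
    g = np.clip(r + gamma * Z, V_MIN, V_MAX)   # shifted/scaled atom locations
    b = (g - V_MIN) / DZ                       # fractional grid index of each atom
    lo, hi = np.floor(b).astype(int), np.ceil(b).astype(int)
    for j in range(K):                         # split each atom's mass between neighbours
        if lo[j] == hi[j]:
            target[lo[j]] += probs_next[j]
        else:
            target[lo[j]] += probs_next[j] * (hi[j] - b[j])
            target[hi[j]] += probs_next[j] * (b[j] - lo[j])
    return target

def ctd_step(eta, s, r, s_next, gamma, alpha):
    """One CTD update: move eta[s] toward the projected distributional Bellman target."""
    target = categorical_projection(eta[s_next], r, gamma)
    eta[s] = (1.0 - alpha) * eta[s] + alpha * target
    return eta
```

A training loop under a generative model would, for each state $s$, repeatedly draw a reward and next state from the environment and call `ctd_step` with a diminishing step size (for example $\alpha_t = 1/(t+1)$); `eta` can be a `(num_states, K)` array of probability vectors initialized uniformly.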