Papers
Topics
Authors
Recent
2000 character limit reached

Convergence of Q-value in case of Gaussian rewards

Published 7 Mar 2020 in math.OC, cs.LG, and stat.ML | (2003.03526v1)

Abstract: In this paper, as a study of reinforcement learning, we converge the Q function to unbounded rewards such as Gaussian distribution. From the central limit theorem, in some real-world applications it is natural to assume that rewards follow a Gaussian distribution , but existing proofs cannot guarantee convergence of the Q-function. Furthermore, in the distribution-type reinforcement learning and Bayesian reinforcement learning that have become popular in recent years, it is better to allow the reward to have a Gaussian distribution. Therefore, in this paper, we prove the convergence of the Q-function under the condition of $E[r(s,a)2]<\infty$, which is much more relaxed than the existing research. Finally, as a bonus, a proof of the policy gradient theorem for distributed reinforcement learning is also posted.

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.