Qgraph-bounded Q-learning: Stabilizing Model-Free Off-Policy Deep Reinforcement Learning

Published 15 Jul 2020 in cs.LG and stat.ML | (2007.07582v1)

Abstract: In state of the art model-free off-policy deep reinforcement learning, a replay memory is used to store past experience and derive all network updates. Even if both state and action spaces are continuous, the replay memory only holds a finite number of transitions. We represent these transitions in a data graph and link its structure to soft divergence. By selecting a subgraph with a favorable structure, we construct a simplified Markov Decision Process for which exact Q-values can be computed efficiently as more data comes in. The subgraph and its associated Q-values can be represented as a QGraph. We show that the Q-value for each transition in the simplified MDP is a lower bound of the Q-value for the same transition in the original continuous Q-learning problem. By using these lower bounds in temporal difference learning, our method QG-DDPG is less prone to soft divergence and exhibits increased sample efficiency while being more robust to hyperparameters. QGraphs also retain information from transitions that have already been overwritten in the replay memory, which can decrease the algorithm's sensitivity to the replay memory capacity.

Abstract PDF Upgrade to Chat

Citations (7)

View on Semantic Scholar

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Paper Prompts

Top Community Prompts

Explain it Like I'm 14

off on

Knowledge Gaps

off on

Practical Applications

off on

Glossary

off on

Conceptual Simplification

off on

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Generate Now

Continue Learning

We haven't generated follow-up questions for this paper yet.

Generate Now

Qgraph-bounded Q-learning: Stabilizing Model-Free Off-Policy Deep Reinforcement Learning

Summary

Paper to Video (Beta)

Whiteboard

Paper Prompts

Top Community Prompts

Open Problems

Continue Learning

Authors (2)

Collections

Qgraph-bounded Q-learning: Stabilizing Model-Free Off-Policy Deep Reinforcement Learning

Summary

Paper to Video (Beta)

Whiteboard

Paper Prompts

Top Community Prompts

Open Problems

Continue Learning

Related Papers

Authors (2)

Collections