
Reconciling $\lambda$-Returns with Experience Replay (1810.09967v3)

Published 23 Oct 2018 in cs.LG and stat.ML

Abstract: Modern deep reinforcement learning methods have departed from the incremental learning required for eligibility traces, rendering the implementation of the $\lambda$-return difficult in this context. In particular, off-policy methods that utilize experience replay remain problematic because their random sampling of minibatches is not conducive to the efficient calculation of $\lambda$-returns. Yet replay-based methods are often the most sample efficient, and incorporating $\lambda$-returns into them is a viable way to achieve new state-of-the-art performance. Towards this, we propose the first method to enable practical use of $\lambda$-returns in arbitrary replay-based methods without relying on other forms of decorrelation such as asynchronous gradient updates. By promoting short sequences of past transitions into a small cache within the replay memory, adjacent $\lambda$-returns can be efficiently precomputed by sharing Q-values. Computation is not wasted on experiences that are never sampled, and stored $\lambda$-returns behave as stable temporal-difference (TD) targets that replace the target network. Additionally, our method grants the unique ability to observe TD errors prior to sampling; for the first time, transitions can be prioritized by their true significance rather than by a proxy to it. Furthermore, we propose the novel use of the TD error to dynamically select $\lambda$-values that facilitate faster learning. We show that these innovations can enhance the performance of DQN when playing Atari 2600 games, even under partial observability. While our work specifically focuses on $\lambda$-returns, these ideas are applicable to any multi-step return estimator.
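The caching idea in the abstract lends itself to a brief illustration. The sketch below is not the authors' released code; the array names and the Peng's Q($\lambda$)-style recursion with max-Q bootstrapping, $R_t^\lambda = r_t + \gamma\left[(1-\lambda)\max_a Q(s_{t+1},a) + \lambda R_{t+1}^\lambda\right]$, are assumptions chosen to match the abstract's description. It shows how all adjacent $\lambda$-returns over a contiguous cached block can be precomputed in one backward pass while sharing a single batch of Q-values:

```python
import numpy as np

def compute_lambda_returns(rewards, dones, bootstrap_q, lam=0.9, gamma=0.99):
    """Precompute lambda-returns for a contiguous cached block of transitions.

    Hypothetical helper, assuming bootstrap_q[t] = max_a Q(s_{t+1}, a) was
    obtained from one forward pass of the Q-network over the block's
    successor states. Uses the recursion
        R_t = r_t + gamma * ((1 - lam) * bootstrap_q[t] + lam * R_{t+1}),
    truncating at episode boundaries.
    """
    T = len(rewards)
    returns = np.zeros(T, dtype=np.float64)
    # At the end of the block, fall back to a pure one-step bootstrap.
    next_return = bootstrap_q[-1]
    for t in reversed(range(T)):
        if dones[t]:
            # Terminal transition: no bootstrapping past the episode end.
            returns[t] = rewards[t]
        else:
            returns[t] = rewards[t] + gamma * (
                (1.0 - lam) * bootstrap_q[t] + lam * next_return
            )
        next_return = returns[t]
    return returns
```

Because the backward pass is linear in the block length and reuses one batch of Q-values, computation is spent only on transitions actually promoted into the cache. The resulting TD errors (cached returns minus current Q-estimates) are then available before any minibatch is drawn, which is what enables the exact prioritization and the TD-error-driven selection of $\lambda$ described above.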

Authors (2)
  1. Brett Daley (14 papers)
  2. Christopher Amato (57 papers)
Citations (4)

