
AWD3: Dynamic Reduction of the Estimation Bias (2111.06780v1)

Published 12 Nov 2021 in cs.LG and cs.AI

Abstract: Value-based deep Reinforcement Learning (RL) algorithms suffer from estimation bias primarily caused by function approximation and temporal difference (TD) learning. This problem induces faulty state-action value estimates and therefore harms the performance and robustness of the learning algorithms. Although several techniques have been proposed to tackle this bias, learning algorithms still suffer from it. Here, we introduce a technique that eliminates the estimation bias in off-policy continuous control algorithms using the experience replay mechanism. We adaptively learn the weighting hyper-parameter beta in the Weighted Twin Delayed Deep Deterministic Policy Gradient (WD3) algorithm. Our method is named Adaptive-WD3 (AWD3). We show, through continuous control environments of OpenAI Gym, that our algorithm matches or outperforms state-of-the-art off-policy policy gradient learning algorithms.
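To make the idea concrete, here is a minimal Python sketch of a WD3-style weighted critic target with an adaptive beta. It assumes the target blends the pessimistic minimum of the twin critics with their average, and that beta is nudged by a replay-derived bias estimate; the function names, the exact blend, and the update rule are illustrative assumptions, not the paper's equations.

```python
import numpy as np

def weighted_td_target(r, done, q1_next, q2_next, beta, gamma=0.99):
    """WD3-style target (assumed form): blend the minimum of the twin
    critics (prone to underestimation) with their average (prone to
    overestimation), weighted by beta in [0, 1]."""
    q_min = np.minimum(q1_next, q2_next)
    q_avg = (q1_next + q2_next) / 2.0
    q_mix = beta * q_min + (1.0 - beta) * q_avg
    return r + gamma * (1.0 - done) * q_mix

def adapt_beta(beta, bias_estimate, lr=1e-3):
    """Hypothetical adaptive update: a positive bias estimate
    (overestimation) shifts weight toward the min term; a negative one
    (underestimation) shifts weight toward the average."""
    return float(np.clip(beta + lr * bias_estimate, 0.0, 1.0))
```

Per the abstract, the bias signal driving such an update would come from the experience replay mechanism, e.g. by comparing the critic's current value estimates for replayed state-action pairs against observed returns; the precise estimator is specified in the paper itself.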

Authors (6)
  1. Dogan C. Cicek (6 papers)
  2. Enes Duran (4 papers)
  3. Baturay Saglam (12 papers)
  4. Kagan Kaya (1 paper)
  5. Furkan B. Mutlu (7 papers)
  6. Suleyman S. Kozat (50 papers)
Citations (7)
