Regret Minimization Experience Replay in Off-Policy Reinforcement Learning (2105.07253v3)

Published 15 May 2021 in cs.LG and cs.AI

Abstract: In reinforcement learning, experience replay stores past samples for further reuse. Prioritized sampling is a promising technique to better utilize these samples. Previous criteria of prioritization include TD error, recentness and corrective feedback, which are mostly heuristically designed. In this work, we start from the regret minimization objective and obtain an optimal prioritization strategy for the Bellman update that can directly maximize the return of the policy. The theory suggests that data with higher hindsight TD error, better on-policyness and more accurate Q value should be assigned higher weights during sampling. Thus, most previous criteria capture this strategy only partially. We not only provide theoretical justifications for previous criteria, but also propose two new methods to compute the prioritization weight, namely ReMERN and ReMERT. ReMERN learns an error network, while ReMERT exploits the temporal ordering of states. Both methods outperform previous prioritized sampling algorithms in challenging RL benchmarks, including MuJoCo, Atari and Meta-World.
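
The abstract names three criteria that should jointly determine a transition's sampling weight: hindsight TD error, on-policyness, and accuracy of the Q estimate. The sketch below is only an illustration of how such criteria could be combined into sampling probabilities for a replay buffer; the function name, the multiplicative combination, and the temperature parameter are assumptions for clarity, not the paper's exact formula (the paper derives the weights from a regret-minimization objective).

```python
import numpy as np

def prioritization_weights(td_errors, on_policyness, q_errors, temperature=1.0):
    """Hypothetical combination of the three criteria into sampling probabilities.

    td_errors      -- |hindsight TD error| per stored transition
    on_policyness  -- estimated on-policyness (e.g. d_pi(s, a) / mu(s, a)) per transition
    q_errors       -- estimated Q-value estimation error per transition (lower = more accurate)
    """
    # Larger TD error and larger on-policyness raise the weight;
    # a larger Q-estimation error lowers it.
    scores = np.abs(td_errors) * on_policyness * np.exp(-np.abs(q_errors) / temperature)
    return scores / scores.sum()

# Usage: draw a prioritized minibatch of indices from a replay buffer of size n.
rng = np.random.default_rng(0)
n = 1000
probs = prioritization_weights(
    td_errors=rng.standard_normal(n),
    on_policyness=rng.uniform(0.1, 2.0, n),
    q_errors=rng.exponential(1.0, n),
)
batch_idx = rng.choice(n, size=64, p=probs)
```

In this reading, ReMERN would supply the Q-error term via a learned error network, while ReMERT would approximate it from the temporal ordering of states; either way the three factors enter the sampling distribution together rather than in isolation.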

Authors (6)
  1. Xu-Hui Liu (6 papers)
  2. Zhenghai Xue (13 papers)
  3. Jing-Cheng Pang (9 papers)
  4. Shengyi Jiang (24 papers)
  5. Feng Xu (180 papers)
  6. Yang Yu (385 papers)
Citations (30)
