
Swapped goal-conditioned offline reinforcement learning (2302.08865v1)

Published 17 Feb 2023 in cs.LG and cs.AI

Abstract: Offline goal-conditioned reinforcement learning (GCRL) can be challenging due to overfitting to the given dataset. To generalize agents' skills beyond the given dataset, we propose a goal-swapping procedure that generates additional trajectories. To alleviate the problem of noise and extrapolation errors, we present a general offline reinforcement learning method called deterministic Q-advantage policy gradient (DQAPG). In the experiments, DQAPG outperforms state-of-the-art goal-conditioned offline RL methods across a wide range of benchmark tasks, and goal-swapping further improves the test results. Notably, the proposed method obtains good performance on the challenging dexterous in-hand manipulation tasks on which prior methods failed.
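The abstract does not detail the goal-swapping procedure, so the sketch below is only a rough illustration of the general idea: relabeling a trajectory's conditioning goal with a goal drawn from another trajectory in the same dataset to generate additional training data. Function names, data layout, and fields are hypothetical, not the authors' code.

```python
# Minimal sketch of goal-swapping data augmentation for offline GCRL.
# Assumption: each trajectory is a dict of per-step arrays plus the goal
# it was collected under; swapping pairs one trajectory's states/actions
# with another trajectory's goal to create extra (state, action, goal) data.
import random

def goal_swap_augment(trajectories, num_swaps, rng=None):
    """Return additional trajectories whose goals are borrowed from other trajectories."""
    rng = rng or random.Random(0)
    augmented = []
    for _ in range(num_swaps):
        traj_a, traj_b = rng.sample(trajectories, 2)
        augmented.append({
            "observations": traj_a["observations"],
            "actions": traj_a["actions"],
            # Relabel: condition traj_a's transitions on traj_b's goal.
            "goal": traj_b["goal"],
        })
    return augmented
```

The augmented trajectories would simply be appended to the offline dataset before training the goal-conditioned policy; how the paper filters or weights such relabeled data is not stated in the abstract.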

Authors (5)
  1. Wenyan Yang (10 papers)
  2. Huiling Wang (8 papers)
  3. Dingding Cai (7 papers)
  4. Joni Pajarinen (68 papers)
  5. Joni-Kristian Kämäräinen (1 paper)
Citations (1)
