Finetuning from Offline Reinforcement Learning: Challenges, Trade-offs and Practical Solutions (2303.17396v1)

Published 30 Mar 2023 in cs.LG

Abstract: Offline reinforcement learning (RL) allows for the training of competent agents from offline datasets without any interaction with the environment. Online finetuning of such offline models can further improve performance. But how should we ideally finetune agents obtained from offline RL training? While offline RL algorithms can in principle be used for finetuning, in practice, their online performance improves slowly. In contrast, we show that it is possible to use standard online off-policy algorithms for faster improvement. However, we find this approach may suffer from policy collapse, where the policy undergoes severe performance deterioration during initial online learning. We investigate the issue of policy collapse and how it relates to data diversity, algorithm choices and online replay distribution. Based on these insights, we propose a conservative policy optimization procedure that can achieve stable and sample-efficient online learning from offline pretraining.
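
The abstract does not spell out the full procedure, but the core idea of a conservative policy update during online finetuning can be illustrated with a short sketch. The snippet below is a hypothetical, minimal PyTorch example (not the authors' exact algorithm): the actor loss maximizes the critic's value estimate while a KL penalty toward the frozen offline-pretrained policy discourages the abrupt shifts that lead to policy collapse. Names such as `conservative_actor_loss`, `pretrained_policy`, and the coefficient `beta` are illustrative assumptions.

```python
# Hypothetical sketch of a "conservative" actor update for online finetuning
# after offline RL pretraining; NOT the paper's exact procedure.
import torch

def conservative_actor_loss(policy, pretrained_policy, q_net, states, beta=1.0):
    """Actor loss = -E[Q(s, a)] + beta * KL(pi_online || pi_pretrained)."""
    dist = policy(states)             # current online policy (a torch Distribution)
    actions = dist.rsample()          # reparameterized actions so gradients flow
    q_values = q_net(states, actions) # critic's estimate of action value

    with torch.no_grad():
        pretrained_dist = pretrained_policy(states)  # frozen offline-pretrained policy

    # The KL term keeps the finetuned policy near the pretrained one early in
    # online learning, guarding against the severe performance drops
    # ("policy collapse") described in the abstract.
    kl = torch.distributions.kl_divergence(dist, pretrained_dist).mean()
    return -q_values.mean() + beta * kl
```

In such a scheme, `beta` would typically start large and decay toward zero, so the agent leans on the offline-pretrained policy while the replay buffer is still dominated by offline data and gradually shifts toward unconstrained online improvement.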

Authors (4)
  1. Yicheng Luo (12 papers)
  2. Jackie Kay (19 papers)
  3. Edward Grefenstette (66 papers)
  4. Marc Peter Deisenroth (73 papers)
Citations (13)
