
PROTO: Iterative Policy Regularized Offline-to-Online Reinforcement Learning (2305.15669v1)

Published 25 May 2023 in cs.LG, cs.AI, and cs.RO

Abstract: Offline-to-online reinforcement learning (RL), by combining the benefits of offline pretraining and online finetuning, promises enhanced sample efficiency and policy performance. However, existing methods, effective as they are, suffer from suboptimal performance, limited adaptability, and unsatisfactory computational efficiency. We propose a novel framework, PROTO, which overcomes these limitations by augmenting the standard RL objective with an iteratively evolving regularization term. Performing a trust-region-style update, PROTO yields stable initial finetuning and optimal final performance by gradually evolving the regularization term to relax the constraint strength. By adjusting only a few lines of code, PROTO can bridge any offline policy pretraining and standard off-policy RL finetuning to form a powerful offline-to-online RL pathway, providing great adaptability to diverse methods. Simple yet elegant, PROTO imposes minimal additional computation and enables highly efficient online finetuning. Extensive experiments demonstrate that PROTO achieves superior performance over SOTA baselines, offering an adaptable and efficient offline-to-online RL framework.
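
The abstract describes the mechanism only at a high level: a standard off-policy actor objective is augmented with a trust-region-style regularizer whose anchor policy iteratively evolves and whose strength is gradually relaxed during online finetuning. Below is a minimal sketch of that idea, based only on the abstract and not on the authors' code; the KL form of the regularizer, the Polyak-averaged prior policy, the linear decay schedule, and all names, network sizes, and hyperparameters are illustrative assumptions.

```python
# Hedged sketch (not the authors' implementation): one way to add an iteratively
# evolving, trust-region-style regularizer to an off-policy actor update.
import copy
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 2 * act_dim))
        self.act_dim = act_dim

    def dist(self, obs: torch.Tensor) -> torch.distributions.Normal:
        mean, log_std = self.net(obs).split(self.act_dim, dim=-1)
        return torch.distributions.Normal(mean, log_std.clamp(-5, 2).exp())

def regularized_actor_loss(policy, prior, critic, obs, beta):
    """Standard off-policy actor objective plus a KL penalty toward an
    iteratively updated prior policy (assumed form of the regularizer)."""
    pi = policy.dist(obs)
    actions = pi.rsample()
    q = critic(torch.cat([obs, actions], dim=-1)).squeeze(-1)
    kl = torch.distributions.kl_divergence(pi, prior.dist(obs)).sum(-1)
    return (-q + beta * kl).mean()

obs_dim, act_dim = 11, 3
policy = GaussianPolicy(obs_dim, act_dim)   # in practice: loaded from offline pretraining
prior = copy.deepcopy(policy)               # constraint anchor starts at the pretrained policy
for p in prior.parameters():
    p.requires_grad_(False)
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)

for step in range(1000):
    obs = torch.randn(256, obs_dim)                  # stand-in for a replay-buffer batch
    beta = 1.0 * max(0.0, 1.0 - step / 1000)         # assumed linear relaxation of constraint strength
    loss = regularized_actor_loss(policy, prior, critic, obs, beta)
    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():                            # prior iteratively evolves toward the current policy
        for p_prior, p in zip(prior.parameters(), policy.parameters()):
            p_prior.mul_(0.995).add_(0.005 * p)
```

Because the change is confined to the actor loss and a prior-update step, a scheme of this shape is consistent with the abstract's claim that the regularizer can be added to standard off-policy RL finetuning by adjusting only a few lines of code.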

Authors (6)
  1. Jianxiong Li (31 papers)
  2. Xiao Hu (151 papers)
  3. Haoran Xu (77 papers)
  4. Jingjing Liu (139 papers)
  5. Xianyuan Zhan (47 papers)
  6. Ya-Qin Zhang (45 papers)
Citations (14)
