
MOORe: Model-based Offline-to-Online Reinforcement Learning (2201.10070v1)

Published 25 Jan 2022 in cs.LG

Abstract: With the success of offline reinforcement learning (RL), offline-trained RL policies have the potential to be further improved when deployed online. A smooth transfer of the policy matters for safe real-world deployment, and fast adaptation of the policy plays a vital role in practical online performance improvement. To tackle these challenges, we propose a simple yet efficient algorithm, Model-based Offline-to-Online Reinforcement learning (MOORe), which employs a prioritized sampling scheme that can dynamically adjust the offline and online data for smooth and efficient online adaptation of the policy. We provide a theoretical foundation for our algorithm's design. Experiment results on the D4RL benchmark show that our algorithm smoothly transfers from offline to online stages while enabling sample-efficient online adaptation, and it also significantly outperforms existing methods.
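
The core mechanism described in the abstract, a sampling scheme that mixes offline and online transitions and shifts their relative weight during online fine-tuning, can be illustrated with a minimal sketch. The class name `MixedReplayBuffer` and the ratio-based annealing schedule below are assumptions for illustration only; they are not the paper's actual priority function or its model-based components.

```python
import random
from collections import deque

class MixedReplayBuffer:
    """Illustrative sketch of a prioritized offline/online sampling scheme.

    Assumption: the probability of drawing an online transition grows with
    the size of the online buffer, so early updates lean on offline data
    and later updates lean on fresh online data.
    """

    def __init__(self, offline_data, online_capacity=100_000):
        self.offline = list(offline_data)            # fixed offline dataset
        self.online = deque(maxlen=online_capacity)  # growing online buffer

    def add_online(self, transition):
        # Store a transition collected during online deployment.
        self.online.append(transition)

    def online_fraction(self):
        # Simple ratio-based schedule (assumed, not the paper's rule).
        total = len(self.offline) + len(self.online)
        return len(self.online) / max(total, 1)

    def sample(self, batch_size):
        # Draw each batch element from the online or offline pool
        # according to the current mixing probability.
        p_online = self.online_fraction()
        batch = []
        for _ in range(batch_size):
            if self.online and random.random() < p_online:
                batch.append(random.choice(self.online))
            else:
                batch.append(random.choice(self.offline))
        return batch
```

Under this assumed schedule, fine-tuning starts close to the offline data distribution (supporting a smooth transfer) and gradually relies more on newly collected experience (supporting fast adaptation); the exact dynamic weighting used by MOORe should be taken from the paper itself.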

Authors (4)
  1. Yihuan Mao (6 papers)
  2. Chao Wang (555 papers)
  3. Bin Wang (751 papers)
  4. Chongjie Zhang (68 papers)
Citations (13)
