
SAMG: Offline-to-Online Reinforcement Learning via State-Action-Conditional Offline Model Guidance (2410.18626v2)

Published 24 Oct 2024 in cs.LG and cs.AI

Abstract: Offline-to-online (O2O) reinforcement learning (RL) pre-trains models on offline data and refines policies through online fine-tuning. However, existing O2O RL algorithms typically require retaining the offline dataset during fine-tuning to mitigate the effects of out-of-distribution (OOD) data, which significantly limits their efficiency in exploiting online samples. To address this deficiency, we introduce a new paradigm for O2O RL called State-Action-Conditional Offline Model Guidance (SAMG). It freezes the pre-trained offline critic to provide a compact offline understanding of each state-action sample, thus eliminating the need for retraining on offline data. The frozen offline critic is combined with the online target critic, weighted by a state-action-adaptive coefficient. This coefficient aims to capture the offline degree of samples at the state-action level and is updated adaptively during training. In practice, SAMG can be easily integrated with Q-function-based algorithms. Theoretical analysis shows good optimality and lower estimation error. Empirically, SAMG outperforms state-of-the-art O2O RL algorithms on the D4RL benchmark.
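The core mechanism the abstract describes — blending a frozen offline critic with the online target critic via a state-action-adaptive coefficient — can be sketched as follows. This is an illustrative reconstruction from the abstract alone, not the authors' implementation; the function and parameter names (`samg_target`, `alpha`, `q_offline_next`, `q_online_next`) are assumptions.

```python
def samg_target(reward, discount, q_offline_next, q_online_next, alpha):
    """Hypothetical SAMG-style TD target.

    Blends the frozen offline critic's value with the online target
    critic's value. `alpha` in [0, 1] is the state-action-adaptive
    coefficient: large when the sample looks offline (in-distribution
    for the pre-training data), small for novel online samples.
    """
    blended_q = alpha * q_offline_next + (1.0 - alpha) * q_online_next
    return reward + discount * blended_q


# Example: an offline-like sample (alpha = 0.8) leans mostly on the
# frozen offline critic's estimate of the next state-action value.
target = samg_target(reward=1.0, discount=0.99,
                     q_offline_next=10.0, q_online_next=6.0, alpha=0.8)
```

Because the offline critic is frozen, this blended target needs no further gradient updates on (or replay of) the offline dataset, which is what lets SAMG drop the offline buffer during online fine-tuning.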
