Reward-agnostic Fine-tuning: Provable Statistical Benefits of Hybrid Reinforcement Learning (2305.10282v1)

Published 17 May 2023 in cs.LG, cs.IT, math.IT, math.ST, stat.ML, and stat.TH

Abstract: This paper studies tabular reinforcement learning (RL) in the hybrid setting, which assumes access to both an offline dataset and online interactions with the unknown environment. A central question boils down to how to efficiently utilize online data collection to strengthen and complement the offline dataset and enable effective policy fine-tuning. Leveraging recent advances in reward-agnostic exploration and model-based offline RL, we design a three-stage hybrid RL algorithm that beats the best of both worlds -- pure offline RL and pure online RL -- in terms of sample complexities. The proposed algorithm does not require any reward information during data collection. Our theory is developed based on a new notion called single-policy partial concentrability, which captures the trade-off between distribution mismatch and miscoverage and guides the interplay between offline and online data.

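For readers unfamiliar with the coverage notions at play, the following is a minimal LaTeX sketch: the first display is the classical single-policy concentrability coefficient standard in model-based offline RL, and the second is a hedged, illustrative reading of the "partial" variant the abstract introduces. The partial definition and the symbols $d_h^{\pi^\star}$, $d_h^{\rho}$, and $\mathcal{S}_h$ are assumptions for exposition, not the paper's exact statement.

```latex
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
% Classical single-policy concentrability: the worst-case ratio between the
% occupancy measure of the optimal policy pi* and that of the offline
% (behavior) distribution rho, over all steps h and state-action pairs.
\[
  C^{\star}(\rho) \;=\; \max_{1 \le h \le H}\;
  \max_{(s,a) \in \mathcal{S}\times\mathcal{A}}\;
  \frac{d_h^{\pi^{\star}}(s,a)}{d_h^{\rho}(s,a)}
\]
% A plausible reading of the "partial" variant (an illustrative assumption,
% not the paper's exact definition): measure the mismatch only on subsets
% S_h of state-action pairs that the offline data actually covers,
\[
  C^{\star}_{\mathrm{partial}}\bigl(\rho;\{\mathcal{S}_h\}\bigr) \;=\;
  \max_{1 \le h \le H}\; \max_{(s,a) \in \mathcal{S}_h}\;
  \frac{d_h^{\pi^{\star}}(s,a)}{d_h^{\rho}(s,a)},
\]
% while the leftover mass
% \(\sum_{h}\sum_{(s,a)\notin \mathcal{S}_h} d_h^{\pi^{\star}}(s,a)\)
% (the miscoverage) is what reward-agnostic online exploration must make up
% for, which is the distribution-mismatch vs. miscoverage trade-off the
% abstract refers to.
\end{document}
```
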
Authors (5)
  1. Gen Li (143 papers)
  2. Wenhao Zhan (17 papers)
  3. Jason D. Lee (151 papers)
  4. Yuejie Chi (109 papers)
  5. Yuxin Chen (195 papers)
Citations (11)
