CROP: Conservative Reward for Model-based Offline Policy Optimization (2310.17245v1)

Published 26 Oct 2023 in cs.LG and cs.AI

Abstract: Offline reinforcement learning (RL) aims to optimize a policy using collected data without online interactions. Model-based approaches are particularly appealing for addressing offline RL challenges due to their capability to mitigate the limitations of offline data through data generation using models. Prior research has demonstrated that introducing conservatism into the model or Q-function during policy optimization can effectively alleviate the prevalent distribution drift problem in offline RL. However, the impact of conservatism in reward estimation remains underexplored. This paper proposes a novel model-based offline RL algorithm, Conservative Reward for model-based Offline Policy optimization (CROP), which conservatively estimates the reward during model training. To achieve a conservative reward estimation, CROP simultaneously minimizes the estimation error and the reward of random actions. Theoretical analysis shows that this conservative reward mechanism leads to a conservative policy evaluation and helps mitigate distribution drift. Experiments on D4RL benchmarks show that the performance of CROP is comparable to state-of-the-art baselines. Notably, CROP establishes an innovative connection between offline and online RL, highlighting that offline RL problems can be tackled by applying online RL techniques to the empirical Markov decision process trained with a conservative reward. The source code is available at https://github.com/G0K0URURI/CROP.git.
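
As a rough illustration of the conservative reward objective described above (fitting rewards on the offline data while pushing down the predicted reward of random actions), here is a minimal PyTorch sketch. It assumes a continuous action space normalized to [-1, 1]; the network architecture and the names `reward_net`, `beta`, and `n_random` are illustrative placeholders, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class RewardNet(nn.Module):
    """Simple MLP reward model r_hat(s, a) (illustrative, not the paper's architecture)."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)

def conservative_reward_loss(reward_net: RewardNet,
                             state: torch.Tensor,   # (B, state_dim) from offline data
                             action: torch.Tensor,  # (B, action_dim) from offline data
                             reward: torch.Tensor,  # (B,) observed rewards
                             beta: float = 0.5,     # assumed penalty weight
                             n_random: int = 10) -> torch.Tensor:
    # (1) Regression term: match the rewards observed in the offline dataset.
    mse = ((reward_net(state, action) - reward) ** 2).mean()

    # (2) Conservative term: minimize the predicted reward of uniformly sampled
    #     (random) actions, so out-of-distribution actions receive pessimistic rewards.
    batch, act_dim = action.shape
    rand_actions = torch.empty(batch * n_random, act_dim).uniform_(-1.0, 1.0)
    rand_states = state.repeat_interleave(n_random, dim=0)
    penalty = reward_net(rand_states, rand_actions).mean()

    return mse + beta * penalty
```

Minimizing this combined loss yields a reward model that is accurate on in-distribution state-action pairs but pessimistic elsewhere, which is the mechanism the abstract credits with producing conservative policy evaluation.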

Authors (11)
  1. Hao Li (803 papers)
  2. Xiao-Hu Zhou (18 papers)
  3. Xiao-Liang Xie (13 papers)
  4. Shi-Qi Liu (9 papers)
  5. Zhen-Qiu Feng (5 papers)
  6. Xiao-Yin Liu (5 papers)
  7. Mei-Jiang Gui (10 papers)
  8. Tian-Yu Xiang (9 papers)
  9. De-Xing Huang (7 papers)
  10. Bo-Xian Yao (2 papers)
  11. Zeng-Guang Hou (25 papers)
Citations (1)
