
Conservative State Value Estimation for Offline Reinforcement Learning (2302.06884v2)

Published 14 Feb 2023 in cs.LG and cs.AI

Abstract: Offline reinforcement learning faces a significant challenge of value over-estimation due to the distributional drift between the dataset and the current learned policy, leading to learning failure in practice. The common remedy is to incorporate a penalty term into the reward or value estimation in the Bellman iterations. Meanwhile, to avoid extrapolation on out-of-distribution (OOD) states and actions, existing methods focus on conservative Q-function estimation. In this paper, we propose Conservative State Value Estimation (CSVE), a new approach that learns a conservative V-function by directly imposing a penalty on OOD states. Compared to prior work, CSVE enables more effective state-value estimation with conservative guarantees and, in turn, better policy optimization. Building on CSVE, we develop a practical actor-critic algorithm in which the critic performs conservative value estimation by additionally sampling and penalizing states "around" the dataset, and the actor applies advantage-weighted updates extended with state exploration to improve the policy. We evaluate on the classic continuous control tasks of D4RL, showing that our method outperforms conservative Q-function learning methods and is strongly competitive with recent SOTA methods.
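The abstract describes two coupled updates: a critic that regresses V toward Bellman targets while penalizing values on states sampled around the dataset, and an actor trained by advantage-weighted regression. The sketch below illustrates one plausible reading in PyTorch; the Gaussian state sampler (perturb_states), the penalty weight beta, and the temperature lam are illustrative assumptions, not the authors' released implementation.

```python
# A minimal sketch of CSVE-style losses as described in the abstract.
# All names and hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn


def perturb_states(states: torch.Tensor, noise_scale: float = 0.1) -> torch.Tensor:
    # Sample states "around" the dataset with Gaussian noise; the paper may
    # instead use a learned dynamics model or another sampling scheme.
    return states + noise_scale * torch.randn_like(states)


def conservative_v_loss(v_net: nn.Module,
                        states: torch.Tensor,
                        bellman_targets: torch.Tensor,
                        beta: float = 5.0) -> torch.Tensor:
    # Standard Bellman regression on dataset states ...
    v_data = v_net(states)
    bellman_err = ((v_data - bellman_targets) ** 2).mean()
    # ... plus a conservatism penalty: push V down on sampled OOD states and
    # up on dataset states, so the learned V tends to lower-bound true values.
    v_ood = v_net(perturb_states(states))
    penalty = v_ood.mean() - v_data.mean()
    return bellman_err + beta * penalty


def advantage_weighted_actor_loss(policy: nn.Module,
                                  states: torch.Tensor,
                                  actions: torch.Tensor,
                                  advantages: torch.Tensor,
                                  lam: float = 1.0) -> torch.Tensor:
    # Advantage-weighted regression: weight the log-likelihood of dataset
    # actions by exp(A / lam), clipped for numerical stability.
    dist = policy(states)  # assumed to return a torch.distributions object
    log_probs = dist.log_prob(actions).sum(-1)
    weights = torch.clamp((advantages / lam).exp(), max=100.0).detach()
    return -(weights * log_probs).mean()
```

In a full training loop one would alternate these two losses, compute advantages from the learned V-function (e.g. a TD estimate of Q(s, a) - V(s)), and tune beta and lam per task; the abstract's "state exploration" extension to the actor is omitted from this sketch.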

References (35)
  1. Off-policy deep reinforcement learning without exploration. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 2052–2062. PMLR, 09–15 Jun 2019.
  2. Batch reinforcement learning. In Reinforcement learning, pages 45–73. Springer, 2012.
  3. Pessimistic Q-learning for offline reinforcement learning: Towards optimal sample complexity. In Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvari, Gang Niu, and Sivan Sabato, editors, Proceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, pages 19967–20025. PMLR, 17–23 Jul 2022.
  4. Conservative Q-learning for offline reinforcement learning. Advances in Neural Information Processing Systems, 33:1179–1191, 2020.
  5. The importance of pessimism in fixed-dataset policy optimization. In International Conference on Learning Representations, 2020.
  6. COMBO: Conservative offline model-based policy optimization. Advances in Neural Information Processing Systems, 34:28954–28967, 2021.
  7. OpenAI Gym. arXiv preprint arXiv:1606.01540, 2016.
  8. Learning complex dexterous manipulation with deep reinforcement learning and demonstrations. arXiv preprint arXiv:1709.10087, 2017.
  9. D4RL: Datasets for deep data-driven reinforcement learning. arXiv preprint arXiv:2004.07219, 2020.
  10. Pessimistic bootstrapping for uncertainty-driven offline reinforcement learning. In International Conference on Learning Representations, 2021.
  11. Behavior regularized offline reinforcement learning. arXiv preprint arXiv:1911.11361, 2019.
  12. AWAC: Accelerating online reinforcement learning with offline datasets. arXiv preprint arXiv:2006.09359, 2020.
  13. Offline reinforcement learning: Tutorial, review, and perspectives on open problems. arXiv preprint arXiv:2005.01643, 2020.
  14. Advantage-weighted regression: Simple and scalable off-policy reinforcement learning. arXiv preprint arXiv:1910.00177, 2019.
  15. MOPO: Model-based offline policy optimization. Advances in Neural Information Processing Systems, 33:14129–14142, 2020.
  16. MOReL: Model-based offline reinforcement learning. Advances in Neural Information Processing Systems, 33:21810–21823, 2020.
  17. When to trust your model: Model-based policy optimization. Advances in Neural Information Processing Systems, 32, 2019.
  18. Advantage-weighted regression: Simple and scalable off-policy reinforcement learning. CoRR, abs/1910.00177, 2019.
  19. Jonathon Shlens. Notes on Kullback-Leibler divergence and likelihood. CoRR, abs/1404.2000, 2014.
  20. Dream to control: Learning behaviors by latent imagination. arXiv preprint arXiv:1912.01603, 2019.
  21. Offline reinforcement learning with implicit Q-learning. In International Conference on Learning Representations, 2021.
  22. A minimalist approach to offline reinforcement learning. Advances in Neural Information Processing Systems, 34:20132–20145, 2021.
  23. RAMBO-RL: Robust adversarial model-based offline reinforcement learning. Advances in Neural Information Processing Systems, 2022.
  24. Uncertainty weighted actor-critic for offline reinforcement learning. arXiv preprint arXiv:2105.08140, 2021.
  25. Stabilizing off-policy Q-learning via bootstrapping error reduction. Advances in Neural Information Processing Systems, 32, 2019.
  26. CoinDICE: Off-policy confidence interval estimation. Advances in Neural Information Processing Systems, 33:9398–9411, 2020.
  27. Off-policy evaluation via the regularized lagrangian. Advances in Neural Information Processing Systems, 33:6551–6561, 2020.
  28. Offline reinforcement learning with Fisher divergence critic regularization. In International Conference on Machine Learning, pages 5774–5783. PMLR, 2021.
  29. Critic regularized regression. Advances in Neural Information Processing Systems, 33:7768–7778, 2020.
  30. An optimistic perspective on offline reinforcement learning. In International Conference on Machine Learning, pages 104–114. PMLR, 2020.
  31. Offline model-based adaptable policy learning. Advances in Neural Information Processing Systems, 34:8432–8443, 2021.
  32. Safe policy improvement with baseline bootstrapping. In International Conference on Machine Learning, pages 3652–3661. PMLR, 2019.
  33. Off-policy deep reinforcement learning by bootstrapping the covariate shift. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 3647–3655, 2019.
  34. d3rlpy: An offline deep reinforcement learning library. arXiv preprint arXiv:2111.03788, 2021.
  35. A closer look at offline RL agents. In Advances in Neural Information Processing Systems, 2022.
Authors (8)
  1. Liting Chen (6 papers)
  2. Jie Yan (25 papers)
  3. Zhengdao Shao (1 paper)
  4. Lu Wang (329 papers)
  5. Qingwei Lin (81 papers)
  6. Saravan Rajmohan (85 papers)
  7. Thomas Moscibroda (8 papers)
  8. Dongmei Zhang (193 papers)
Citations (4)
