A Primal-Dual Algorithm for Offline Constrained Reinforcement Learning with Linear MDPs (2402.04493v2)
Abstract: We study offline reinforcement learning (RL) with linear MDPs under the infinite-horizon discounted setting, which aims to learn a policy that maximizes the expected discounted cumulative reward using a pre-collected dataset. Existing algorithms for this setting either require a uniform data coverage assumption or are computationally inefficient for finding an $\epsilon$-optimal policy with $O(\epsilon^{-2})$ sample complexity. In this paper, we propose a primal-dual algorithm for offline RL with linear MDPs in the infinite-horizon discounted setting. Our algorithm is the first computationally efficient algorithm in this setting that achieves a sample complexity of $O(\epsilon^{-2})$ under a partial data coverage assumption. Our work improves upon a recent work that requires $O(\epsilon^{-4})$ samples. Moreover, we extend our algorithm to the offline constrained RL setting, which enforces constraints on additional reward signals.
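For context, primal-dual methods of this kind are typically derived from the linear-programming view of the discounted MDP; the following is a minimal sketch of the resulting constrained saddle-point objective, using generic notation assumed here rather than the paper's own. Here $\mu$ is a state-action occupancy measure, $v$ a value-function variable, $\lambda \ge 0$ a multiplier on a single constraint reward $c$ with threshold $b$, $\nu_0$ the initial state distribution, $P$ the transition kernel, and $E$ the map from a state-action occupancy to its state marginal:

$$
\begin{aligned}
&\max_{\mu \ge 0}\ \langle \mu, r\rangle \quad \text{s.t.}\quad E^{\top}\mu = (1-\gamma)\,\nu_0 + \gamma P^{\top}\mu, \qquad \langle \mu, c\rangle \ge b,\\
&L(\mu, v, \lambda) \;=\; \langle \mu,\ r + \lambda c\rangle \;-\; \lambda b \;+\; \big\langle v,\ (1-\gamma)\,\nu_0 + \gamma P^{\top}\mu - E^{\top}\mu \big\rangle.
\end{aligned}
$$

A primal-dual scheme of this flavor alternates (stochastic) ascent steps in $\mu$ with descent steps in $(v, \lambda)$; in the linear-MDP case, $v$ and the occupancy-measure parameterization are restricted to the span of the known features, which is what makes the updates computationally tractable.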