Offline Reinforcement Learning via Linear-Programming with Error-Bound Induced Constraints (2212.13861v3)

Published 28 Dec 2022 in cs.LG, math.OC, and stat.ML

Abstract: Offline reinforcement learning (RL) aims to find an optimal policy for Markov decision processes (MDPs) using a pre-collected dataset. In this work, we revisit the linear programming (LP) reformulation of Markov decision processes for offline RL, with the goal of developing algorithms with optimal $O(1/\sqrt{n})$ sample complexity, where $n$ is the sample size, under partial data coverage and general function approximation, and with favorable computational tractability. To this end, we derive new \emph{error bounds} for both the dual and primal-dual formulations of the LP, and incorporate them properly as \emph{constraints} in the LP reformulation. We then show that under a completeness-type assumption, $O(1/\sqrt{n})$ sample complexity can be achieved under the standard single-policy coverage assumption, when one properly \emph{relaxes} the occupancy validity constraint in the LP. This framework can readily handle both infinite-horizon discounted and average-reward MDPs, in both the general function approximation and tabular cases. The instantiation to the tabular case achieves either state-of-the-art or the first sample complexities of offline RL in these settings. To further remove any completeness-type assumption, we then introduce a proper \emph{lower-bound constraint} in the LP, and a variant of the standard single-policy coverage assumption. Such an algorithm leads to an $O(1/\sqrt{n})$ sample complexity with dependence on the \emph{value-function gap}, with only realizability assumptions. Our properly constrained LP framework advances existing results in several aspects: it relaxes certain assumptions and achieves the optimal $O(1/\sqrt{n})$ sample complexity, with simple analyses. We hope our results bring new insights into the use of LP formulations and the equivalent primal-dual minimax optimization for offline RL, through the error-bound induced constraints.
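
For context, here is a minimal sketch of the LP reformulation that the abstract builds on. This is the standard occupancy-measure (dual) LP for an infinite-horizon discounted MDP and its equivalent primal-dual minimax form; the notation ($d$ for the occupancy measure, $v$ for the value function, $\mu_0$ for the initial distribution, $P$ and $r$ for transitions and rewards, $\gamma$ for the discount) is the conventional one rather than the paper's, and the paper's algorithms further add error-bound induced constraints and relax the occupancy-validity (flow) constraint shown below:

$$\max_{d \ge 0} \;\; \sum_{s,a} d(s,a)\, r(s,a) \quad \text{s.t.} \quad \sum_{a} d(s,a) \;=\; (1-\gamma)\,\mu_0(s) \;+\; \gamma \sum_{s',a'} P(s \mid s',a')\, d(s',a') \quad \forall s,$$

whose Lagrangian yields the equivalent primal-dual minimax problem

$$\min_{v} \max_{d \ge 0} \;\; (1-\gamma) \sum_{s} \mu_0(s)\, v(s) \;+\; \sum_{s,a} d(s,a) \Big( r(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, v(s') - v(s) \Big).$$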

Authors (4)
  1. Asuman Ozdaglar (102 papers)
  2. Sarath Pattathil (17 papers)
  3. Jiawei Zhang (529 papers)
  4. Kaiqing Zhang (70 papers)
Citations (14)
