Corruption-Robust Offline Reinforcement Learning with General Function Approximation (2310.14550v3)
Abstract: We investigate the problem of corruption robustness in offline reinforcement learning (RL) with general function approximation, where an adversary can corrupt each sample in the offline dataset, and the corruption level $\zeta\geq0$ quantifies the cumulative corruption amount over $n$ episodes and $H$ steps. Our goal is to find a policy that is robust to such corruption and minimizes the suboptimality gap with respect to the optimal policy for the uncorrupted Markov decision process (MDP). Drawing inspiration from the uncertainty-weighting technique in the robust online RL setting \citep{he2022nearly,ye2022corruptionrobust}, we design a new uncertainty weight iteration procedure that can be computed efficiently on batched samples and propose a corruption-robust algorithm for offline RL. Notably, under the assumption of single-policy coverage and knowledge of $\zeta$, our proposed algorithm achieves a suboptimality bound that is worsened by an additive factor of $\mathcal{O}(\zeta (C(\widehat{\mathcal{F}},\mu)n)^{-1})$ due to the corruption. Here $\widehat{\mathcal{F}}$ is the confidence set, $\mathcal{Z}_n^H$ is the dataset, and $C(\widehat{\mathcal{F}},\mu)$ is a coefficient that depends on $\widehat{\mathcal{F}}$ and the underlying data distribution $\mu$. When specialized to linear MDPs, the corruption-dependent error term reduces to $\mathcal{O}(\zeta d n^{-1})$, where $d$ is the dimension of the feature map, which matches the existing lower bound for corrupted linear MDPs. This suggests that our analysis is tight in terms of the corruption-dependent term.
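To give a rough sense of the uncertainty-weighting idea on batched samples, here is a minimal sketch (not the paper's exact procedure) assuming linear features: per-sample weights are computed by a fixed-point iteration in which each sample's weight grows with its uncertainty under the current weighted covariance. The names `phi`, `alpha`, and `lam` are hypothetical illustration parameters.

```python
import numpy as np

def uncertainty_weight_iteration(phi, alpha=1.0, lam=1.0, n_iters=20, tol=1e-6):
    """Illustrative fixed-point iteration for uncertainty weights on a batch.

    phi:   (n, d) array of feature vectors for the batched samples
           (linear function approximation is assumed here for concreteness).
    alpha: threshold controlling how strongly uncertain samples are down-weighted.
    lam:   ridge regularization for the weighted covariance.
    Returns per-sample weights sigma (shape (n,)) and the weighted covariance.
    """
    n, d = phi.shape
    sigma = np.ones(n)
    for _ in range(n_iters):
        # Weighted covariance of the batch under the current weights.
        Lambda = (phi / sigma[:, None] ** 2).T @ phi + lam * np.eye(d)
        Lambda_inv = np.linalg.inv(Lambda)
        # Per-sample uncertainty (bonus) phi_i^T Lambda^{-1} phi_i under the weights.
        bonus = np.sqrt(np.einsum("ij,jk,ik->i", phi, Lambda_inv, phi))
        new_sigma = np.maximum(1.0, bonus / alpha)
        if np.max(np.abs(new_sigma - sigma)) < tol:
            sigma = new_sigma
            break
        sigma = new_sigma
    return sigma, Lambda
```

In a sketch like this, samples with large uncertainty, which are the ones an adversary can most exploit, receive larger weights and are therefore down-weighted in a subsequent weighted regression.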
- Chenlu Ye (14 papers)
- Rui Yang (221 papers)
- Quanquan Gu (198 papers)
- Tong Zhang (569 papers)