Corruption-Robust Offline Reinforcement Learning with General Function Approximation (2310.14550v3)

Published 23 Oct 2023 in cs.LG

Abstract: We investigate the problem of corruption robustness in offline reinforcement learning (RL) with general function approximation, where an adversary can corrupt each sample in the offline dataset, and the corruption level $\zeta\geq0$ quantifies the cumulative corruption amount over $n$ episodes and $H$ steps. Our goal is to find a policy that is robust to such corruption and minimizes the suboptimality gap with respect to the optimal policy for the uncorrupted Markov decision processes (MDPs). Drawing inspiration from the uncertainty-weighting technique from the robust online RL setting \citep{he2022nearly,ye2022corruptionrobust}, we design a new uncertainty weight iteration procedure that can be computed efficiently on batched samples, and we propose a corruption-robust algorithm for offline RL. Notably, under the assumption of single-policy coverage and knowledge of $\zeta$, our proposed algorithm achieves a suboptimality bound that is worsened by an additive factor of $\mathcal{O}(\zeta (C(\widehat{\mathcal{F}},\mu)n)^{-1})$ due to the corruption. Here $\widehat{\mathcal{F}}$ is the confidence set, $\mathcal{Z}_{nH}$ is the dataset, and $C(\widehat{\mathcal{F}},\mu)$ is a coefficient that depends on $\widehat{\mathcal{F}}$ and the underlying data distribution $\mu$. When specialized to linear MDPs, the corruption-dependent error term reduces to $\mathcal{O}(\zeta d n^{-1})$, where $d$ is the dimension of the feature map; this matches the existing lower bound for corrupted linear MDPs and suggests that our analysis is tight in terms of the corruption-dependent term.
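The abstract describes the uncertainty weight iteration only at a high level. As a rough illustration of the general idea in its linear-MDP specialization, the sketch below computes per-sample weights as a fixed point of an elliptical-norm uncertainty measure: samples whose uncertainty exceeds a threshold are down-weighted, and the weighted covariance is recomputed until the weights stabilize. Everything here is an assumption for illustration — the function name, the threshold `alpha`, the ridge parameter `lam`, and the stopping rule are hypothetical and not the authors' actual procedure.

```python
import numpy as np

def uncertainty_weight_iteration(Phi, alpha, lam=1.0, n_iters=50, tol=1e-8):
    """Illustrative sketch (not the paper's implementation) of an
    uncertainty weight iteration for the linear-MDP specialization.

    Phi   : (n, d) array whose rows are feature vectors phi(s_i, a_i).
    alpha : assumed weighting threshold; in theory it would be tuned
            using the corruption level zeta.
    lam   : assumed ridge-regularization parameter.
    """
    n, d = Phi.shape
    w = np.ones(n)  # start with uniform weights
    for _ in range(n_iters):
        # Weighted covariance: Sigma = lam*I + sum_i phi_i phi_i^T / w_i^2
        Phi_w = Phi / w[:, None]
        Sigma = lam * np.eye(d) + Phi_w.T @ Phi_w
        Sigma_inv = np.linalg.inv(Sigma)
        # Elliptical-norm uncertainty of each sample: ||phi_i||_{Sigma^{-1}}
        bonus = np.sqrt(np.einsum("ij,jk,ik->i", Phi, Sigma_inv, Phi))
        # Down-weight samples whose uncertainty exceeds the threshold
        w_new = np.maximum(1.0, bonus / alpha)
        if np.max(np.abs(w_new - w)) < tol:
            break  # weights have reached (approximately) a fixed point
        w = w_new
    return w
```

In a corruption-robust pipeline of this flavor, the returned weights would feed a weighted regression inside a pessimistic value-iteration step, so that any single high-uncertainty (potentially corrupted) sample has limited influence on the value estimate.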

Authors (4)
  1. Chenlu Ye (14 papers)
  2. Rui Yang (221 papers)
  3. Quanquan Gu (198 papers)
  4. Tong Zhang (569 papers)
Citations (15)
