Corruption-Robust Offline Two-Player Zero-Sum Markov Games (2403.07933v1)
Abstract: We study data corruption robustness in offline two-player zero-sum Markov games. Given a dataset of realized trajectories of the two players, an adversary is allowed to modify an $\epsilon$-fraction of the data. The learner's goal is to identify an approximate Nash Equilibrium policy pair from the corrupted data. We consider this problem in linear Markov games under different degrees of data coverage and corruption. We first provide an information-theoretic lower bound on the suboptimality gap of any learner. We then propose robust versions of the Pessimistic Minimax Value Iteration algorithm, both under coverage on the corrupted data and under coverage only on the clean data, and show that they achieve (near-)optimal suboptimality gap bounds with respect to $\epsilon$. We are the first to provide such a characterization of the problem of learning approximate Nash Equilibrium policies in offline two-player zero-sum Markov games under data corruption.
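For context, the suboptimality gap mentioned above is typically formalized as the Nash equilibrium (duality) gap of the returned policy pair. The display below is a standard formulation of that objective and of the $\epsilon$-contamination model; the notation ($V^{\pi,\nu}$, $s_1$, $n$) is assumed here for illustration and need not match the paper's exact definitions.

```latex
% A standard formalization (assumed notation, not necessarily the paper's):
% V^{\pi,\nu}(s_1) denotes the max-player's expected return when the two
% players follow the policy pair (\pi,\nu) from the initial state s_1.
% The suboptimality (Nash equilibrium) gap of a learned pair (\hat\pi,\hat\nu) is
\[
  \mathrm{Gap}(\hat{\pi},\hat{\nu})
    \;=\; \max_{\pi} V^{\pi,\hat{\nu}}(s_1) \;-\; \min_{\nu} V^{\hat{\pi},\nu}(s_1),
\]
% which is nonnegative and equals zero exactly when (\hat\pi,\hat\nu) is a
% Nash equilibrium; (\hat\pi,\hat\nu) is an approximate Nash equilibrium when
% this gap is small. Under \epsilon-contamination, an adversary may arbitrarily
% replace up to an \epsilon-fraction of the n collected trajectories before
% the learner observes the dataset.
```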
- An Optimistic Perspective on Offline Reinforcement Learning. In ICML, 2020.
- Robust Linear Regression: Optimal Rates in Polynomial Time. In STOC, 2021.
- Defense Against Reward Poisoning Attacks in Reinforcement Learning. TMLR, 2023.
- Christopher Berner et al. Dota 2 with Large Scale Deep Reinforcement Learning. CoRR, abs/1912.06680, 2019.
- Poisoning Attacks Against Support Vector Machines. CoRR, abs/1206.6389, 2012.
- Battista Biggio et al. Evasion Attacks Against Machine Learning at Test Time. In ECML PKDD, 2013.
- Provably Efficient Exploration in Policy Optimization. In ICML, 2020.
- Online and Distribution-free Robustness: Regression and Contextual Bandits with Huber Contamination. In FOCS, 2022a.
- Byzantine-robust Online and Offline Distributed Reinforcement Learning. CoRR, abs/2206.00165, 2022b.
- Certified Adversarial Robustness via Randomized Smoothing. In ICML, 2019.
- When is Offline Two-player Zero-sum Markov Game Solvable? CoRR, abs/2201.03522, 2022.
- Being Robust (in High Dimensions) Can be Practical. In ICML, 2017.
- Sever: A Robust Meta-algorithm for Stochastic Optimization. In ICML, 2019.
- Reinforcement Learning with a Corrupted Reward Channel. CoRR, abs/1705.08417, 2017.
- Fault-tolerant Federated Reinforcement Learning with Theoretical Guarantee. In NeurIPS, 2021.
- Off-policy Deep Reinforcement Learning without Exploration. In ICML, 2019.
- Adversarial Policies: Attacking Deep Reinforcement Learning. In ICLR, 2020.
- Nearly Optimal Algorithms for Linear Contextual Bandits with Adversarial Corruptions. In NeurIPS, 2022.
- On Deriving the Inverse of a Sum of Matrices. SIAM Review, 1981.
- Adversarial Attacks on Neural Network Policies. CoRR, abs/1702.02284, 2017.
- Deceptive Reinforcement Learning under Adversarial Manipulations on Cost Signals. In GameSec, 2019.
- Natasha Jaques et al. Way Off-policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog. CoRR, abs/1907.00456, 2019.
- Is Pessimism Provably Efficient for Offline RL? In ICML, 2021.
- MOReL: Model-based Offline Reinforcement Learning. In NeurIPS, 2020.
- Conservative Q-learning for Offline Reinforcement Learning. In NeurIPS, 2020.
- Safe Policy Improvement with Baseline Bootstrapping. In ICML, 2019.
- Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems. CoRR, abs/2005.01643, 2020.
- Data Poisoning Attacks on Factorization-based Collaborative Filtering. In NeurIPS, 2016.
- Tactics of Adversarial Attack on Deep Reinforcement Learning Agents. In IJCAI, 2017.
- Michael L. Littman. Markov Games as a Framework for Multi-agent Reinforcement Learning. In ICML, 1994.
- Michael L. Littman. Value-function Reinforcement Learning in Markov Games. Cognitive Systems Research, 2:55–66, 2001.
- Policy Poisoning in Batch Reinforcement Learning and Control. In NeurIPS, 2019.
- Game Redesign in No-regret Game Playing. In IJCAI, 2022.
- Using Machine Teaching to Identify Optimal Training-set Attacks on Machine Learners. In AAAI, 2015.
- Implicit Poisoning Attacks in Two-agent Reinforcement Learning: Adversarial Policies for Training-time Attacks. In AAMAS, 2023.
- Finite-time Bounds for Fitted Value Iteration. JMLR, 2008.
- Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images. In CVPR, 2015.
- Yunpeng Pan et al. Agile Autonomous Driving Using End-to-end Deep Imitation Learning. CoRR, abs/1709.07174, 2017.
- Practical Black-box Attacks Against Machine Learning. In ACM AsiaCCS, 2017.
- Robust Regression with Covariate Filtering: Heavy Tails and Adversarial Contamination. CoRR, abs/2009.12976, 2020.
- Random Features for Large-scale Kernel Machines. In NeurIPS, 2007.
- Policy Teaching via Environment Poisoning: Training-time Adversarial Attacks against Reinforcement Learning. In ICML, 2020.
- Policy Teaching in Reinforcement Learning via Environment Poisoning Attacks. JMLR, 2021.
- Understanding the Limits of Poisoning Attacks in Episodic Reinforcement Learning. CoRR, abs/2208.13663, 2022.
- Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism. In NeurIPS, 2021.
- Decentralized Q-learning in Zero-sum Markov Games. In NeurIPS, 2021.
- Lloyd S. Shapley. Stochastic Games. PNAS, 39, 1953.
- David Silver et al. Mastering the Game of Go Without Human Knowledge. Nature, 550, 2017.
- Stealthy and Efficient Adversarial Attacks Against Deep Reinforcement Learning. In AAAI, 2020a.
- Vulnerability-aware Poisoning Mechanism for Online RL with Unknown Dynamics. In ICLR, 2020b.
- Intriguing Properties of Neural Networks. CoRR, abs/1312.6199, 2013.
- Online Learning in Unknown Markov Games. In ICML, 2021.
- Pessimistic Model-based Offline Reinforcement Learning Under Partial Coverage. CoRR, abs/2107.06226, 2021.
- Decentralized Learning in Markov Games. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 38, 2008.
- Supervised Reinforcement Learning with Recurrent Neural Network for Dynamic Treatment Recommendation. In ACM SIGKDD, 2018.
- Reinforcement Learning to Play an Optimal Nash Equilibrium in Team Markov Games. In NeurIPS, 2002.
- CROP: Certifying Robust Policies for Reinforcement Learning through Functional Smoothing. CoRR, abs/2106.09292, 2021.
- Reward Poisoning Attacks on Offline Multi-agent Reinforcement Learning. CoRR, abs/2206.01888, 2022.
- Reward Poisoning Attacks on Offline Multi-agent Reinforcement Learning. In AAAI, 2023.
- Adversarial Label Flips Attack on Support Vector Machines. In ECAI, 2012.
- Is Feature Selection Secure Against Training Data Poisoning? In ICML, 2015.
- Learning Zero-sum Simultaneous-move Markov Games Using Function Approximation and Correlated Equilibrium. In COLT, 2020.
- Bellman-consistent Pessimism for Offline Reinforcement Learning. In NeurIPS, 2021.
- RORL: Robust Offline Reinforcement Learning via Conservative Smoothing. In NeurIPS, 2022.
- Corruption-robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes. In ICML, 2023.
- Cautiously Optimistic Policy Optimization and Exploration with Linear Function Approximation. In COLT, 2021a.
- Provable Benefits of Actor-critic Methods for Offline Reinforcement Learning. In NeurIPS, 2021b.
- Offline Reinforcement Learning with Realizability and Single-policy Concentrability. In COLT, 2022.
- Corruption-robust Offline Reinforcement Learning. In AISTATS, 2022.
- Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets. In ICML, 2022.