Universal Black-Box Reward Poisoning Attack against Offline Reinforcement Learning (2402.09695v2)
Abstract: We study the problem of universal black-box reward poisoning attacks against general offline reinforcement learning with deep neural networks. We consider a black-box threat model where the attacker is entirely oblivious to the learning algorithm, and its budget is limited by constraining both the amount of corruption at each data point and the total perturbation. We require the attack to be universally efficient against any efficient algorithm that might be used by the agent. We propose an attack strategy called the "policy contrast attack." The idea is to find low- and high-performing policies covered by the dataset and make them appear to the agent to be high- and low-performing, respectively. To the best of our knowledge, we propose the first universal black-box reward poisoning attack in the general offline RL setting. We provide theoretical insights on the attack design and empirically show that our attack is efficient against current state-of-the-art offline RL algorithms across different datasets.
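The budget constraint and the contrast idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the function name `poison_rewards`, the distance threshold `radius`, the use of an L1 norm for the total perturbation, and the uniform down-scaling step are all assumptions made for illustration. The sketch assumes the attacker already has the actions that a high-performing and a low-performing reference policy would take at each logged state.

```python
import numpy as np

def poison_rewards(rewards, actions, good_actions, bad_actions,
                   per_point_cap, total_budget, radius):
    """Hypothetical sketch: lower rewards where the logged action matches a
    high-performing policy and raise them where it matches a low-performing
    one, subject to a per-point cap and a total (L1) budget."""
    rewards = np.asarray(rewards, dtype=float)
    actions = np.asarray(actions, dtype=float)
    good_actions = np.asarray(good_actions, dtype=float)
    bad_actions = np.asarray(bad_actions, dtype=float)

    # Distance from each logged action to each reference policy's action.
    d_good = np.linalg.norm(actions - good_actions, axis=1)
    d_bad = np.linalg.norm(actions - bad_actions, axis=1)

    delta = np.zeros_like(rewards)
    delta[d_good <= radius] = -per_point_cap  # make the good policy look bad
    delta[d_bad <= radius] = per_point_cap    # make the bad policy look good
    # (If a point is close to both, the second assignment wins.)

    # Assumed total budget: L1 norm of the perturbation. Uniform down-scaling
    # uses a factor of at most 1, so the per-point cap still holds.
    used = np.abs(delta).sum()
    if used > total_budget:
        delta *= total_budget / used
    return rewards + delta, delta
```

Calling `poison_rewards` with a large `total_budget` leaves only the per-point cap active; a small `total_budget` shrinks every perturbation by the same factor, which is one simple way (among many) to respect both constraints at once.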