The Adaptive Arms Race: Redefining Robustness in AI Security (2312.13435v3)
Abstract: Despite considerable efforts to make them robust, real-world AI-based systems remain vulnerable to decision-based attacks, as definitive proofs of their operational robustness have so far proven intractable. Canonical robustness evaluation relies on adaptive attacks, which leverage complete knowledge of the defense and are tailored to bypass it. This work broadens the notion of adaptivity and employs it to enhance both attacks and defenses, showing how each can benefit from learning through interaction with the other. We introduce a framework for adaptively optimizing black-box attacks and defenses under the competitive game they form. Since reliable robustness assessment requires evaluation against realistic and worst-case attacks, we enhance attacks and their evasive arsenal together using reinforcement learning (RL), apply the same principle to defenses, and evaluate both first independently and then jointly from a multi-agent perspective. We find that active defenses, those that dynamically control system responses, are an essential complement to model hardening against decision-based attacks; that these defenses can nonetheless be circumvented by adaptive attacks, which in turn calls for defenses that adapt as well. Our findings, supported by an extensive theoretical and empirical investigation, confirm that adaptive adversaries pose a serious threat to black-box AI-based systems, rekindling the proverbial arms race. Notably, our approach outperforms state-of-the-art black-box attacks and defenses, and brings them together to yield effective insights into the robustness of real-world deployed ML-based systems.
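As a rough illustration of the competitive game the abstract describes, the sketch below pits a query-limited decision-based attacker against a stateful "active" defense, with each side adapting its strategy from the interaction. Everything here is an illustrative assumption rather than the paper's implementation: the toy linear model, the `StatefulDefense` class, the reward shaping, and the epsilon-greedy bandits that stand in for the RL policies (e.g., PPO over richer action spaces) that the paper actually trains.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=16)  # toy linear classifier standing in for the deployed model

def model_label(x):
    # Decision-based (hard-label) access: the attacker observes only this bit.
    return int(x @ w > 0)

class StatefulDefense:
    """Active defense sketch: withhold answers for queries too close to recent ones."""
    def __init__(self, threshold):
        self.threshold, self.history = threshold, []

    def respond(self, x):
        if any(np.linalg.norm(x - h) < self.threshold for h in self.history[-50:]):
            return None  # query flagged as part of an attack sequence
        self.history.append(x.copy())
        return model_label(x)

def play_episode(step, threshold, n_queries=100):
    """One attacker-defender interaction; returns the two players' payoffs."""
    defense, x0 = StatefulDefense(threshold), rng.normal(size=16)
    base, flips, detections = model_label(x0), 0, 0
    for _ in range(n_queries):
        y = defense.respond(x0 + step * rng.normal(size=16))
        if y is None:
            detections += 1   # caught by the stateful check
        elif y != base:
            flips += 1        # decision flipped: attack progress
    return flips - 0.1 * detections, detections - flips

# Both sides adapt from interaction; epsilon-greedy bandits over small action
# sets stand in for the RL learners used in the paper.
steps, thresholds = np.array([0.01, 0.1, 1.0]), np.array([0.05, 0.5, 2.0])
qa, na = np.zeros(3), np.zeros(3)  # attacker value estimates / visit counts
qd, nd = np.zeros(3), np.zeros(3)  # defender value estimates / visit counts
for _ in range(500):
    a = rng.integers(3) if rng.random() < 0.1 else int(np.argmax(qa))
    d = rng.integers(3) if rng.random() < 0.1 else int(np.argmax(qd))
    ra, rd = play_episode(steps[a], thresholds[d])
    na[a] += 1; qa[a] += (ra - qa[a]) / na[a]  # incremental mean update
    nd[d] += 1; qd[d] += (rd - qd[d]) / nd[d]

print("attacker's preferred step size:", steps[int(np.argmax(qa))])
print("defender's preferred threshold:", thresholds[int(np.argmax(qd))])
```

The point is only structural: each side's best response depends on the other's current strategy, so robustness must be evaluated under joint adaptation rather than against a fixed opponent, which is the essence of the arms race the paper studies.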