Adversarial Machine Learning: Bayesian Perspectives (2003.03546v2)
Abstract: Adversarial Machine Learning (AML) is emerging as a major field aimed at protecting ML systems against security threats: in certain scenarios, adversaries may actively manipulate input data to fool learning systems. This creates a new class of security vulnerabilities that ML systems may face, and a new desirable property, adversarial robustness, which is essential for trusting operations based on ML outputs. Most work in AML is built upon a game-theoretic model of the conflict between a learning system and an adversary ready to manipulate input data. This assumes that each agent knows their opponent's interests and uncertainty judgments, facilitating inferences based on Nash equilibria. However, such a common knowledge assumption is not realistic in the security scenarios typical of AML. After reviewing these game-theoretic approaches, we discuss the benefits that Bayesian perspectives provide when defending ML-based systems. We show how the Bayesian approach allows us to explicitly model our uncertainty about the opponent's beliefs and interests, relaxing unrealistic assumptions and providing more robust inferences. We illustrate this approach in supervised learning settings and identify relevant future research problems.
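The abstract's central contrast, replacing common-knowledge Nash reasoning with explicit uncertainty about the opponent, can be illustrated with a small simulation. The Python sketch below is a hypothetical, minimal adversarial-risk-analysis style classifier, not the paper's algorithm: the attacker model, the prior over the attacker's cost parameter, the kernel smoothing, and all numerical values are illustrative assumptions.

```python
# Minimal, hypothetical sketch of the Bayesian (adversarial risk analysis) idea
# from the abstract: the defender does not assume common knowledge of the
# attacker's interests. Instead, it places a prior over the attacker's cost
# parameter, simulates the attacker's optimal manipulation for each draw, and
# marginalizes that uncertainty when classifying an observed, possibly attacked
# instance. All distributions and parameter values are illustrative assumptions.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Defender's generative model: y = 1 "malicious", y = 0 "benign"; scalar feature x.
p_y1 = 0.3
f0 = norm(loc=0.0, scale=1.0)   # p(x | y = 0)
f1 = norm(loc=2.0, scale=1.0)   # p(x | y = 1)

deltas = np.linspace(-3.0, 0.0, 31)   # manipulations available to the attacker

def attacker_best_delta(x, cost):
    """Attacker shifts a malicious x by delta to look benign, paying cost*|delta|.
    Assumed utility: probability of being labelled benign by a naive,
    non-adversarial classifier, minus the manipulation cost."""
    xprime = x + deltas
    p_benign = f0.pdf(xprime) * (1 - p_y1) / (
        f0.pdf(xprime) * (1 - p_y1) + f1.pdf(xprime) * p_y1)
    utility = p_benign - cost * np.abs(deltas)
    return deltas[np.argmax(utility)]

def p_malicious_given_observed(x_obs, n_sims=2000):
    """Defender's robust posterior: Monte Carlo marginalization over uncertainty
    about the attacker's cost (assumed prior Uniform(0.05, 0.5)) and over the
    original, pre-attack feature value of a malicious instance."""
    costs = rng.uniform(0.05, 0.5, size=n_sims)
    x_orig = f1.rvs(size=n_sims, random_state=rng)       # hypothetical originals
    # Each simulated malicious original is shifted by the attacker's optimal
    # (cost-dependent) manipulation; a narrow kernel smooths the likelihood.
    shifted = x_orig + np.array([attacker_best_delta(x, c)
                                 for x, c in zip(x_orig, costs)])
    lik_y1 = norm(loc=shifted, scale=0.2).pdf(x_obs).mean()
    lik_y0 = f0.pdf(x_obs)                                # benign data is untouched
    return lik_y1 * p_y1 / (lik_y1 * p_y1 + lik_y0 * (1 - p_y1))

if __name__ == "__main__":
    for x_obs in (-1.0, 0.5, 2.0):
        print(f"x_obs = {x_obs:+.1f}  ->  P(malicious | x_obs) = "
              f"{p_malicious_given_observed(x_obs):.3f}")
```

In this toy setting, a classifier that ignores the attack would under-flag manipulated instances near the benign region; averaging over the simulated attacker behaviour shifts probability mass back toward the malicious class without committing to a single assumed attacker, which is the robustness benefit the abstract points to.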