Identifying Adversarially Attackable and Robust Samples (2301.12896v3)
Abstract: Adversarial attacks add small, imperceptible perturbations to input samples that cause large, undesired changes in the output of deep learning models. Despite extensive research on generating adversarial attacks and building defense systems, there has been limited research on understanding adversarial attacks from an input-data perspective. This work introduces the notion of sample attackability, where we aim to identify the samples most susceptible to adversarial attacks (attackable samples) and, conversely, the least susceptible ones (robust samples). We propose a deep-learning-based detector that identifies adversarially attackable and robust samples in an unseen dataset for an unseen target model. Experiments on standard image classification datasets enable us to assess the portability of the deep attackability detector across a range of architectures. We find that the deep attackability detector outperforms simple model-uncertainty-based measures at identifying attackable/robust samples, suggesting that uncertainty is an inadequate proxy for a sample's distance to a decision boundary. Beyond its relevance to adversarial attack theory, the ability to identify attackable and robust samples has implications for improving the efficiency of sample-selection tasks.
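The abstract compresses several moving parts, so the sketch below makes the setup concrete under stated assumptions. The one-step FGSM attack, the two epsilon budgets, and the small MLP detector are illustrative stand-ins chosen for brevity, not the paper's actual attack, thresholds, or architecture: a sample is labelled attackable if a small-budget attack flips its prediction, robust if even a large-budget attack fails, and a learned detector is then contrasted with a softmax-entropy uncertainty baseline.

```python
# Minimal sketch of the setup described in the abstract (not the authors' code).
# Step 1: derive attackability labels from attack success at two budgets.
# Step 2: compare a learned detector against a softmax-entropy baseline.
import torch
import torch.nn as nn
import torch.nn.functional as F

def fgsm_perturb(model, x, y, eps):
    """One-step FGSM attack; a stand-in for whatever attack probes each sample.
    Assumes image inputs scaled to [0, 1]."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)  # avoids polluting model param grads
    return (x + eps * grad.sign()).clamp(0, 1).detach()

@torch.no_grad()
def predict(model, x):
    return model(x).argmax(dim=1)

def attackability_labels(model, x, y, eps_small=2 / 255, eps_large=16 / 255):
    """Attackable: prediction flipped by a small-budget attack.
    Robust: prediction survives even a large-budget attack.
    The two budgets here are illustrative assumptions."""
    attackable = predict(model, fgsm_perturb(model, x, y, eps_small)) != y
    robust = predict(model, fgsm_perturb(model, x, y, eps_large)) == y
    return attackable, robust

@torch.no_grad()
def predictive_entropy(model, x):
    """Model-uncertainty baseline: entropy of the softmax output. High entropy
    is often read as proximity to the decision boundary, which the paper
    argues is an inadequate proxy for attackability."""
    p = F.softmax(model(x), dim=1)
    return -(p * p.clamp_min(1e-12).log()).sum(dim=1)

class AttackabilityDetector(nn.Module):
    """Tiny MLP over per-sample features (e.g., flattened pixels or deep
    embeddings); trained on labels from seen models, then applied to an
    unseen dataset and an unseen target model."""
    def __init__(self, feat_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, feats):               # feats: (N, feat_dim)
        return self.net(feats).squeeze(-1)  # logit of "attackable"
```

Under these assumptions, training the detector reduces to binary cross-entropy (`F.binary_cross_entropy_with_logits`) against the derived labels, and portability is assessed by evaluating detection quality on a dataset and target architecture held out from training.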