Investigating Human-Identifiable Features Hidden in Adversarial Perturbations (2309.16878v1)
Abstract: Neural networks perform exceedingly well across various machine learning tasks but are not immune to adversarial perturbations, a vulnerability with direct implications for real-world applications. Despite extensive research, the underlying reasons why neural networks fall prey to adversarial attacks are not yet fully understood. Central to our study, which explores up to five attack algorithms across three datasets, is the identification of human-identifiable features in adversarial perturbations. We further uncover two distinct effects within these features: the masking effect, which is prominent in untargeted attacks, and the generation effect, which is more common in targeted attacks. Using pixel-level annotations, we extract such features and demonstrate their ability to compromise target models. Our findings also indicate that perturbations produced by different attack algorithms are notably similar when averaged over multiple models. These results shed light on related phenomena such as transferability and model interpretability, contributing to a deeper understanding of the mechanisms behind adversarial attacks and offering insights for developing more resilient defense strategies for neural networks.
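The abstract's central measurement, averaging perturbations from one attack over several models and comparing the averages across attack algorithms, can be illustrated with a short sketch. The snippet below is not the authors' code: the FGSM and PGD implementations, the choice of pretrained ImageNet classifiers, and the hyperparameters (`eps`, `alpha`, `steps`) are illustrative assumptions, and cosine similarity is used here as one plausible similarity measure.

```python
# A minimal sketch (not the paper's released code) of the ensemble-averaging
# analysis described in the abstract: compute an untargeted perturbation per
# model, average it over the ensemble, and compare averages across two attacks.
import torch
import torch.nn.functional as F
import torchvision.models as models


def fgsm_perturbation(model, x, y, eps=8 / 255):
    """One-step untargeted FGSM perturbation for a single model."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    return eps * grad.sign()


def pgd_perturbation(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Iterative untargeted PGD perturbation within an L-infinity ball."""
    delta = torch.zeros_like(x)
    for _ in range(steps):
        delta.requires_grad_(True)
        loss = F.cross_entropy(model(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach()
    return delta


def averaged_perturbation(attack_fn, ensemble, x, y):
    """Average one attack's perturbation over an ensemble of models."""
    return torch.stack([attack_fn(m, x, y) for m in ensemble]).mean(dim=0)


# Hypothetical usage with ImageNet-pretrained classifiers and a batch (x, y):
# ensemble = [models.resnet50(weights="IMAGENET1K_V1").eval(),
#             models.densenet121(weights="IMAGENET1K_V1").eval(),
#             models.vgg16(weights="IMAGENET1K_V1").eval()]
# p_fgsm = averaged_perturbation(fgsm_perturbation, ensemble, x, y)
# p_pgd = averaged_perturbation(pgd_perturbation, ensemble, x, y)
# print(F.cosine_similarity(p_fgsm.flatten(), p_pgd.flatten(), dim=0))
```

In the same spirit, the human-identifiable features extracted with pixel-level annotations could be fed back into the target models on their own to check whether they alone degrade accuracy, as the abstract reports.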
Authors: Dennis Y. Menn, Tzu-hsun Feng, Sriram Vishwanath, Hung-yi Lee