
Interpretation of Neural Networks is Susceptible to Universal Adversarial Perturbations (2212.03095v2)

Published 30 Nov 2022 in cs.CV, cs.AI, cs.CR, cs.LG, and stat.ML

Abstract: Interpreting neural network classifiers using gradient-based saliency maps has been extensively studied in the deep learning literature. While existing algorithms perform satisfactorily on standard image recognition datasets, recent works demonstrate that widely-used gradient-based interpretation schemes are vulnerable to norm-bounded perturbations adversarially designed for each individual input sample. However, such adversarial perturbations are commonly designed with knowledge of the input sample, and hence perform sub-optimally on unknown or constantly changing data points. In this paper, we show the existence of a Universal Perturbation for Interpretation (UPI) for standard image datasets: a single perturbation that can alter a neural network's gradient-based feature map over a significant fraction of test samples. To design such a UPI, we propose both a gradient-based optimization method and a principal component analysis (PCA)-based approach, each of which computes a UPI that effectively alters the network's gradient-based interpretation across different samples. We support the proposed UPI approaches with numerical results demonstrating their successful application to standard image datasets.
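To make the two constructions concrete, here is a minimal PyTorch sketch of how a UPI might be computed from a batch of samples. It is an illustrative reconstruction based only on the abstract, not the authors' implementation; the function names, the L2 norm budget `eps`, and all hyperparameters are assumptions.

```python
# Illustrative sketch of the two UPI constructions described in the abstract,
# written against a generic PyTorch classifier. Everything here (names, the
# L2 budget `eps`, step counts) is assumed, not taken from the paper.
import torch

def saliency(model, x, y):
    """Simple gradient saliency: d(true-class logit)/d(input), per sample."""
    x = x.clone().requires_grad_(True)
    model(x).gather(1, y.unsqueeze(1)).sum().backward()
    return x.grad.detach()

def pca_upi(model, x, y, eps=0.05):
    """PCA-style UPI: take the top principal direction of the stacked
    saliency maps and scale it to the norm budget."""
    G = saliency(model, x, y).flatten(1)          # (n_samples, n_pixels)
    _, _, Vh = torch.linalg.svd(G, full_matrices=False)
    v = Vh[0]                                     # first right-singular vector
    return (eps * v / v.norm()).view_as(x[0])

def grad_upi(model, x, y, eps=0.05, steps=50, lr=1e-2):
    """Gradient-based UPI: directly optimize one shared `delta` to maximize
    the average change it induces in the saliency maps. Differentiating
    through the saliency requires second-order gradients (create_graph)."""
    base = saliency(model, x, y)
    delta = torch.zeros_like(x[:1], requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        xp = x + delta                            # broadcast over the batch
        score = model(xp).gather(1, y.unsqueeze(1)).sum()
        g, = torch.autograd.grad(score, xp, create_graph=True)
        loss = -(g - base).flatten(1).norm(dim=1).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():                     # project onto the L2 ball
            delta.mul_(min(1.0, eps / (delta.norm().item() + 1e-12)))
    return delta.detach().squeeze(0)
```

In either case, the resulting `delta` would be added to unseen test inputs to check whether their saliency maps shift, matching the universal (sample-independent) evaluation the abstract describes.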
