
A Curious Case of Remarkable Resilience to Gradient Attacks via Fully Convolutional and Differentiable Front End with a Skip Connection (2402.17018v1)

Published 26 Feb 2024 in cs.LG, cs.AI, and cs.CV

Abstract: We tested front-end enhanced neural models in which a differentiable, fully convolutional model with a skip connection is prepended to a frozen backbone classifier. By training them with a small learning rate for about one epoch, we obtained models that retained the accuracy of the backbone classifier while being unusually resistant to gradient attacks, including the APGD and FAB-T attacks from the AutoAttack package, which we attribute to gradient masking. The gradient masking phenomenon is not new, but the degree of masking was quite remarkable for fully differentiable models that had neither gradient-shattering components, such as JPEG compression, nor components expected to cause diminishing gradients. Though black-box attacks can be partially effective against gradient masking, they are easily defeated by combining models into randomized ensembles. We estimate that such ensembles achieve near-SOTA AutoAttack accuracy on CIFAR10, CIFAR100, and ImageNet despite having virtually zero accuracy under adaptive attacks. Adversarial training of the backbone classifier can further increase the resistance of the front-end enhanced model to gradient attacks. On CIFAR10, the respective randomized ensemble achieved 90.8$\pm 2.5$% (99% CI) accuracy under AutoAttack while having only 18.2$\pm 3.6$% accuracy under the adaptive attack. We do not establish SOTA in adversarial robustness. Instead, we make methodological contributions and further support the thesis that adaptive attacks designed with complete knowledge of the model architecture are crucial for demonstrating model robustness, and that even so-called white-box gradient attacks can have limited applicability. Although gradient attacks can be complemented with black-box attacks such as the SQUARE attack or zero-order PGD, black-box attacks can be weak against randomized ensembles, e.g., when ensemble models mask gradients.
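The abstract describes the construction only in words; the following PyTorch-style sketch illustrates it under stated assumptions. The class names (`ConvFrontEnd`, `FrontEndEnhancedClassifier`, `RandomizedEnsemble`), the layer widths and depth, and the learning rate are illustrative choices and not the authors' exact configuration; the paper specifies only a fully convolutional, differentiable front end with a skip connection, a frozen backbone classifier, brief training at a small learning rate, and randomized ensembling at inference time.

```python
import random
import torch
import torch.nn as nn


class ConvFrontEnd(nn.Module):
    """Fully convolutional, differentiable front end with a skip connection.
    Layer count and width are assumptions made for illustration."""

    def __init__(self, channels: int = 3, width: int = 64, depth: int = 5):
        super().__init__()
        layers = [nn.Conv2d(channels, width, kernel_size=3, padding=1), nn.ReLU()]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(width, width, kernel_size=3, padding=1), nn.ReLU()]
        layers.append(nn.Conv2d(width, channels, kernel_size=3, padding=1))
        self.body = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Skip connection: the front end outputs the input plus a learned
        # correction, so the composite model stays close to the frozen backbone.
        return x + self.body(x)


class FrontEndEnhancedClassifier(nn.Module):
    """Trainable front end prepended to a frozen backbone classifier."""

    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.front_end = ConvFrontEnd()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad_(False)  # the backbone stays frozen; only the front end trains

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.backbone(self.front_end(x))


class RandomizedEnsemble(nn.Module):
    """At inference time, each call is routed to a randomly chosen member model,
    which is what makes query-based black-box attacks harder to mount."""

    def __init__(self, members):
        super().__init__()
        self.members = nn.ModuleList(members)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        member = self.members[random.randrange(len(self.members))]
        return member(x)
```

Training would then update only `model.front_end.parameters()` with a standard cross-entropy loss for roughly one epoch at a small learning rate, e.g. `torch.optim.SGD(model.front_end.parameters(), lr=1e-3)`; the optimizer choice and learning-rate value here are likewise assumptions.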

