Universal Adversarial Training (1811.11304v2)

Published 27 Nov 2018 in cs.CV, cs.CR, and cs.LG

Abstract: Standard adversarial attacks change the predicted class label of a selected image by adding specially tailored small perturbations to its pixels. In contrast, a universal perturbation is an update that can be added to any image in a broad class of images, while still changing the predicted class label. We study the efficient generation of universal adversarial perturbations, and also efficient methods for hardening networks to these attacks. We propose a simple optimization-based universal attack that reduces the top-1 accuracy of various network architectures on ImageNet to less than 20%, while learning the universal perturbation 13X faster than the standard method. To defend against these perturbations, we propose universal adversarial training, which models the problem of robust classifier generation as a two-player min-max game, and produces robust models with only 2X the cost of natural training. We also propose a simultaneous stochastic gradient method that is almost free of extra computation, which allows us to do universal adversarial training on ImageNet.

Citations (183)

Summary

  • The paper presents universal adversarial perturbation generation as a constrained optimization problem, achieving over 13x faster computation than previous methods.
  • The paper introduces a two-player min-max adversarial training framework that robustly defends against both universal and per-instance attacks.
  • The paper demonstrates that low-cost universal adversarial training maintains high natural accuracy while significantly enhancing resistance to adversarial examples.

Universal Adversarial Training

The paper "Universal Adversarial Training" by Ali Shafahi et al. explores the vulnerabilities of deep neural networks (DNNs) to adversarial perturbations and introduces a novel method for efficiently generating and defending against universal adversarial perturbations. Unlike standard adversarial attacks, which generate perturbations specific to individual images, universal perturbations can be applied to any image, affecting broad classes across datasets.

Key Contributions

  1. Universal Adversarial Perturbations as an Optimization Problem: The authors cast universal perturbation generation as a constrained optimization problem, solved with a stochastic gradient method over a "clipped" loss function so that no single image can dominate the loss. This yields significant efficiency gains, generating universal perturbations over 13 times faster than previous state-of-the-art methods while maintaining high attack efficacy on architectures such as InceptionV1 and VGG16 (a sketch of this procedure is given after this list).
  2. Universal Adversarial Training: To defend against these attacks, the authors propose a method where robust classifier training is framed as a two-player min-max optimization problem, thus efficiently hardening networks against such perturbations. The authors suggest an alternating stochastic gradient method for adversarial training that is computationally feasible, even for large datasets like ImageNet.
  3. Low-Cost Universal Adversarial Training: A simultaneous stochastic gradient method is introduced, offering robust adversarial training at a cost nearly equivalent to natural training. This greatly reduces the computational burden of traditional adversarial training, making it practical for large-scale datasets such as ImageNet (see the second sketch after this list).
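
A minimal sketch of the optimization-based universal attack from contribution 1, assuming a PyTorch image classifier `model`, a `data_loader` yielding (image, label) batches normalized to [0, 1], and ImageNet-sized inputs; the loss cap `beta`, step size, and epsilon below are illustrative choices rather than the paper's exact settings:

```python
import torch
import torch.nn.functional as F

def universal_perturbation(model, data_loader, eps=8/255, beta=10.0,
                           step_size=1/255, epochs=1, device="cuda"):
    """Sketch: learn one shared perturbation delta that lowers accuracy on
    every image by ascending a clipped cross-entropy loss."""
    model.to(device).eval()
    # Single perturbation broadcast across the whole batch (assumed 224x224 RGB).
    delta = torch.zeros(1, 3, 224, 224, device=device, requires_grad=True)

    for _ in range(epochs):
        for images, labels in data_loader:
            images, labels = images.to(device), labels.to(device)
            logits = model(torch.clamp(images + delta, 0.0, 1.0))
            # Clipped loss: cap each per-example loss at beta so that no
            # single image dominates the update.
            per_example = F.cross_entropy(logits, labels, reduction="none")
            loss = torch.clamp(per_example, max=beta).mean()
            loss.backward()
            with torch.no_grad():
                # Gradient *ascent* on delta, then project back onto the
                # l_inf ball of radius eps.
                delta += step_size * delta.grad.sign()
                delta.clamp_(-eps, eps)
            delta.grad.zero_()
    return delta.detach()
```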
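The low-cost variant from contribution 3 can be sketched in the same style: the classifier weights and the universal perturbation are updated from a single forward/backward pass on each minibatch, so the overhead over natural training is essentially one extra sign-gradient step. In the alternating scheme of contribution 2, the inner maximization over the perturbation would instead run for several stochastic gradient steps per weight update. The optimizer and hyperparameters below are assumptions for illustration, not the paper's reported configuration:

```python
import torch
import torch.nn.functional as F

def universal_adversarial_training(model, data_loader, eps=8/255,
                                   delta_step=1/255, lr=0.1,
                                   epochs=10, device="cuda"):
    """Sketch: min-max training in which the model descends and a shared
    perturbation delta ascends the same loss, both once per minibatch."""
    model.to(device).train()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    delta = torch.zeros(1, 3, 224, 224, device=device, requires_grad=True)

    for _ in range(epochs):
        for images, labels in data_loader:
            images, labels = images.to(device), labels.to(device)
            logits = model(torch.clamp(images + delta, 0.0, 1.0))
            loss = F.cross_entropy(logits, labels)

            optimizer.zero_grad()
            loss.backward()      # one backward pass yields gradients for
                                 # both the model weights and delta
            optimizer.step()     # descent step on the weights
            with torch.no_grad():
                # Ascent step on the universal perturbation, projected onto
                # the l_inf ball of radius eps.
                delta += delta_step * delta.grad.sign()
                delta.clamp_(-eps, eps)
            delta.grad.zero_()
    return model, delta.detach()
```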

Experimental Results

Experiments on CIFAR-10 and ImageNet demonstrate the efficacy of the proposed methods, with the universality of the learned perturbations validated across multiple network architectures. The universally trained models are not only resilient to universal perturbations but also exhibit enhanced robustness against per-instance attacks such as FGSM and PGD, even though these differ from the universal attacks the models were trained against. The results show that this training paradigm maintains high accuracy on clean data while substantially improving resistance to adversarial examples.

Implications and Future Work

The implications of this paper extend to both theoretical and practical aspects of AI safety. On a theoretical level, the authors challenge the prevailing notion that adversarial training must be costly, providing evidence that robust models can be achieved with efficient universal adversarial training strategies. Practically, the introduction of low-cost training methods enables broader implementation in real-world applications, especially where computational resources are limited.

Looking forward, the exploration of universal adversarial training provides fertile ground for further research into understanding the transferability of perturbations across models and data distributions, enhancing robustness in varied deployment scenarios. Moreover, it invites further examination of the interplay between universality and specificity in adversarial attacks and defenses, potentially guiding the development of more comprehensive security frameworks for neural networks.

In conclusion, this work advances the state-of-the-art in adversarial learning by proposing efficient methodologies for both attacking and defending against universal perturbations, thus offering valuable insights and tools for creating more resilient neural networks.