Perceptual Adversarial Robustness: Defense Against Unseen Threat Models (2006.12655v4)

Published 22 Jun 2020 in cs.LG, cs.CV, and stat.ML

Abstract: A key challenge in adversarial robustness is the lack of a precise mathematical characterization of human perception, used in the very definition of adversarial attacks that are imperceptible to human eyes. Most current attacks and defenses try to avoid this issue by considering restrictive adversarial threat models such as those bounded by $L_2$ or $L_\infty$ distance, spatial perturbations, etc. However, models that are robust against any of these restrictive threat models are still fragile against other threat models. To resolve this issue, we propose adversarial training against the set of all imperceptible adversarial examples, approximated using deep neural networks. We call this threat model the neural perceptual threat model (NPTM); it includes adversarial examples with a bounded neural perceptual distance (a neural network-based approximation of the true perceptual distance) to natural images. Through an extensive perceptual study, we show that the neural perceptual distance correlates well with human judgements of perceptibility of adversarial examples, validating our threat model. Under the NPTM, we develop novel perceptual adversarial attacks and defenses. Because the NPTM is very broad, we find that Perceptual Adversarial Training (PAT) against a perceptual attack gives robustness against many other types of adversarial attacks. We test PAT on CIFAR-10 and ImageNet-100 against five diverse adversarial attacks. We find that PAT achieves state-of-the-art robustness against the union of these five attacks, more than doubling the accuracy over the next best model, without training against any of them. That is, PAT generalizes well to unforeseen perturbation types. This is vital in sensitive applications where a particular threat model cannot be assumed, and to the best of our knowledge, PAT is the first adversarial training defense with this property.

Authors (3)
  1. Cassidy Laidlaw (13 papers)
  2. Sahil Singla (66 papers)
  3. Soheil Feizi (127 papers)
Citations (170)

Summary

Perceptual Adversarial Robustness: Defense Against Unseen Threat Models

This paper tackles a significant challenge in adversarial robustness by addressing the limitations of existing threat models. Traditional adversarial threat models rely on simplistic mathematical metrics such as $L_p$ distances, which fail to capture the complexities of human visual perception. Consequently, adversarial defenses built on these metrics do not offer generalized robustness against varied, unforeseen adversarial attacks. To address this gap, the authors propose the Neural Perceptual Threat Model (NPTM), which aims to encompass all imperceptible adversarial examples by bounding a neural network-based approximation of the true perceptual distance, called the neural perceptual distance.
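As a rough illustration of the threat model, the sketch below checks whether a perturbed image lies within a neural perceptual distance bound of the original. It assumes the third-party `lpips` PyTorch package as the neural approximation; the function name and the bound value are illustrative rather than taken from the paper.

```python
# Minimal sketch: deciding membership in the neural perceptual threat model (NPTM).
# Assumes the third-party `lpips` PyTorch package; the bound `eps` is illustrative.
import torch
import lpips

perceptual_dist = lpips.LPIPS(net='alex')  # neural approximation of perceptual distance

def in_nptm(x_natural: torch.Tensor, x_adv: torch.Tensor, eps: float = 0.5) -> torch.Tensor:
    """Return a boolean mask: True where x_adv is within perceptual distance eps of x_natural.

    Both tensors are expected as NCHW images scaled to [-1, 1], per the lpips convention.
    """
    with torch.no_grad():
        d = perceptual_dist(x_natural, x_adv).view(-1)  # one distance per image
    return d <= eps
```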

The authors validate their model using human perception benchmarks and employ it to develop both adversarial attacks and defenses. They introduce the Lagrangian Perceptual Attack (LPA) and Perceptual Projected Gradient Descent (PPGD) as techniques to craft adversarial examples under the NPTM. Because these perturbations fall outside the threat models that existing defenses were trained against, the attacks considerably degrade current state-of-the-art robust classifiers, reducing their accuracy to a small fraction of its original value.
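The Lagrangian idea can be sketched as follows: maximize the classification loss while penalizing perceptual distance in excess of the bound. This is a simplified illustration of the principle behind LPA, not the authors' exact algorithm; the `dist_fn` argument, the signed-gradient step rule, and the hyperparameters are assumptions.

```python
# Simplified sketch of a Lagrangian-style perceptual attack. Assumes images in [0, 1]
# and a differentiable perceptual distance `dist_fn` (e.g., an LPIPS module).
import torch
import torch.nn.functional as F

def lagrangian_perceptual_attack(model, dist_fn, x, y, eps=0.5,
                                 lam=10.0, step=0.01, n_steps=40):
    """Gradient ascent on loss(model(x + delta), y) minus a penalty for overshooting eps."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(n_steps):
        x_adv = (x + delta).clamp(0, 1)
        loss = F.cross_entropy(model(x_adv), y)
        d = dist_fn(x, x_adv).view(-1)            # neural perceptual distance per image
        penalty = F.relu(d - eps).mean()          # penalize only distances over the bound
        objective = loss - lam * penalty          # Lagrangian relaxation of the constraint
        grad, = torch.autograd.grad(objective, delta)
        delta = (delta + step * grad.sign()).detach().requires_grad_(True)
    return (x + delta).clamp(0, 1).detach()
```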

Intriguingly, they also develop Perceptual Adversarial Training (PAT) as a defense which, without explicit training against specific threat models, shows remarkable generalization and robustness to multiple unforeseen adversarial attacks. In evaluations on CIFAR-10 and ImageNet-100, PAT more than doubles accuracy against the union of five diverse attacks compared to the next best adversarial defense, without requiring direct exposure to any of those attacks during training. This enhanced robustness is vital for practical scenarios where comprehensive anticipation of threat models is implausible.
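A minimal sketch of what such a training loop could look like is shown below, assuming a `perceptual_attack` callable (for instance, a fast variant of the attack sketched above). The paper's actual PAT recipe, including its attack schedule, perceptual bound, and optimization details, is not reproduced here.

```python
# Illustrative training loop in the spirit of Perceptual Adversarial Training (PAT):
# each batch is replaced by perceptual adversarial examples before the supervised update.
import torch.nn.functional as F

def pat_epoch(model, loader, optimizer, perceptual_attack):
    """One epoch of perceptual-adversarial-style training (sketch only)."""
    model.train()
    for x, y in loader:
        x_adv = perceptual_attack(model, x, y)    # craft examples under the perceptual bound
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)   # standard supervised loss on the adversary
        loss.backward()
        optimizer.step()
```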

Moreover, the paper explores the perceptual validation of the introduced models and measures. A user study demonstrating correlation between human judgements of perceptibility and the LPIPS (Learned Perceptual Image Patch Similarity) metric underscores the reliability of their threat model.
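For reference, the neural perceptual distance is instantiated with LPIPS, which compares unit-normalized deep activations of a fixed network. The standard formulation from Zhang et al. (2018) is shown below; the paper's AlexNet-based and self-bounded variants differ mainly in the choice of backbone, so this should be read as background rather than the paper's exact definition.

$$ d(x, x_0) = \sum_{l} \frac{1}{H_l W_l} \sum_{h,w} \left\| w_l \odot \left( \hat{y}^{\,l}_{hw} - \hat{y}^{\,l}_{0,hw} \right) \right\|_2^2 $$

Here $\hat{y}^{\,l}_{hw}$ and $\hat{y}^{\,l}_{0,hw}$ are the unit-normalized activation vectors of the two images at layer $l$ and spatial position $(h, w)$, and $w_l$ are learned per-channel weights.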

Theoretically, this work offers a framework that better aligns machine perception with human perception, paving the path toward more robust AI systems. Practically, incorporating PAT into existing systems could drastically improve general resilience against adversarial attacks, a crucial need as AI models become prevalent across domains ranging from autonomous driving to secure communications.

Future research stemming from this paper could explore extending perceptual modeling to non-visual domains or combining perceptual distance with other robustness measures for a holistic approach. Additionally, as adversarial strategies become more sophisticated, this work could serve as a foundation for formulating new adaptive defenses that inherently understand and model human perception within the AI training process.