Spatially Transformed Adversarial Examples: An Overview
In the evolving field of deep neural networks (DNNs), adversarial robustness remains a critical area of research. This paper presents a novel approach to generating adversarial examples that relies on spatial transformations rather than the traditional manipulation of pixel values.
Adversarial Vulnerability in DNNs
DNNs have made significant strides across domains such as image processing, text analysis, and speech recognition. Despite these advancements, they remain susceptible to adversarial examples: inputs perturbed subtly so that a model is misled. Traditional methods for generating such examples modify pixel values while bounding the perturbation under an Lp norm to keep the change small. However, Lp distance is an imperfect proxy for perceptual similarity; perceptually irrelevant changes such as shifts in lighting or viewpoint can produce large Lp differences, so a small Lp budget does not fully capture what humans consider indistinguishable.
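For contrast with the spatial approach introduced below, here is a minimal sketch of a conventional pixel-value attack in the FGSM style, where the perturbation is bounded in the L-infinity norm. The model, loss, and epsilon value are illustrative assumptions rather than the paper's specific setup.

```python
# Minimal sketch of a traditional pixel-value attack (FGSM-style), for contrast.
# The model, loss, and epsilon here are illustrative assumptions.
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.03):
    """Perturb pixel values within an L-infinity ball of radius epsilon."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Step in the direction of the loss gradient's sign, then clip to the valid image range.
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```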
Introduction to Spatially Transformed Adversarial Examples
The authors propose a method for generating adversarial examples via spatial transformations. Rather than altering pixel values, the technique displaces pixels spatially, yielding perturbations that are harder for humans to perceive. The approach is formulated as an optimization over a per-pixel flow field that dictates each pixel's displacement; the adversarial image is reconstructed from the displaced coordinates by bilinear interpolation, and a flow-smoothness loss keeps the displacements locally consistent so that the result remains perceptually realistic.
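To make this concrete, the following is a minimal sketch, assuming a PyTorch implementation, of how a flow field can resample an image and how a simple smoothness penalty on the flow can be computed. The function names, the use of grid_sample, and the total-variation-style penalty are assumptions made for illustration, not the authors' released code.

```python
# Hedged sketch of a flow-field ("spatial") perturbation in PyTorch.
import torch
import torch.nn.functional as F

def spatial_transform(x, flow):
    """Resample image x (N, C, H, W) at locations displaced by flow (N, H, W, 2).

    Each output pixel is bilinearly interpolated from the input at its displaced
    coordinates, so pixel *positions* change rather than pixel values.
    Flow is expressed in normalized [-1, 1] grid coordinates.
    """
    n, c, h, w = x.shape
    # Identity sampling grid in normalized [-1, 1] coordinates.
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h, device=x.device),
        torch.linspace(-1, 1, w, device=x.device),
        indexing="ij",
    )
    base_grid = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(n, -1, -1, -1)
    return F.grid_sample(x, base_grid + flow, mode="bilinear", align_corners=True)

def flow_smoothness(flow):
    """Total-variation-style penalty encouraging locally smooth displacements."""
    dh = (flow[:, 1:, :, :] - flow[:, :-1, :, :]).pow(2).sum()
    dw = (flow[:, :, 1:, :] - flow[:, :, :-1, :]).pow(2).sum()
    return dh + dw
```

In the paper's formulation, the flow field is optimized so that the resampled image misleads the classifier (an adversarial loss), while a weighted smoothness term of this kind keeps the transformation smooth and localized.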
Experimental Insights and Results
The paper provides extensive experiments showing that spatially transformed adversarial examples are considerably harder for existing defenses to handle than examples generated by traditional pixel-value attacks. Evaluation uses MNIST, CIFAR-10, and an ImageNet-compatible set, and the results show high attack success rates across a range of models, highlighting the method's efficacy.
The visualizations included in the paper depict smooth, localized transformations. For instance, targeted adversarial examples retain the original instance's perceptual identity while successfully misleading the classifier toward the target class.
Implications for Defense Mechanisms
Current defense mechanisms, including FGSM-based and PGD-based adversarial training, struggle against these examples. New defense strategies capable of handling spatially transformed perturbations are therefore needed.
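For reference, the sketch below shows one PGD adversarial-training step, the kind of pixel-space defense the paper evaluates against. The hyperparameters, helper names, and training-loop structure are illustrative assumptions rather than a specific defense implementation.

```python
# Illustrative sketch of PGD-based adversarial training (a pixel-space defense).
# Hyperparameters (epsilon, alpha, steps) are assumed values for illustration.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, epsilon=0.03, alpha=0.01, steps=10):
    """Multi-step L-infinity attack used to craft training examples."""
    x_adv = x + torch.empty_like(x).uniform_(-epsilon, epsilon)
    for _ in range(steps):
        x_adv = x_adv.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        x_adv = x_adv + alpha * x_adv.grad.sign()
        x_adv = x + (x_adv - x).clamp(-epsilon, epsilon)  # project back onto the epsilon-ball
        x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y):
    """Train on adversarial examples instead of clean ones for this batch."""
    x_adv = pgd_attack(model, x, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because both the attack and the projection operate on pixel values, this style of defense offers no explicit mechanism for handling perturbations expressed as pixel displacements.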
Theoretical and Practical Implications
The introduction of spatial transformations in adversarial attacks challenges the conventional notion of a perturbation as a bounded change in pixel values. This has significant implications for adversarial theory, encouraging a shift toward geometric considerations in both attack and defense strategies. Practically, it calls for a reevaluation of adversarial robustness in security-critical applications, and in the legal and ethical discussions surrounding them, since adversarial examples that are indistinguishable to humans pose concrete risks.
Future Directions
Future work may explore hybrid approaches that combine pixel-value and spatial perturbations. Understanding the interplay between spatial transformations and model architectures could also inform the design of inherently robust DNNs, and research could extend to adaptive defenses that anticipate and counteract spatial distortions.
In conclusion, the paper presents a compelling case for a new class of adversarial examples using spatial transformations, urging the research community to rethink strategies for enhancing DNN robustness. The methodological shift introduces opportunities to explore new dimensions in both adversarial example generation and defense mechanisms.