Delving into Transferable Adversarial Examples and Black-box Attacks
Introduction
The phenomenon of adversarial examples in deep neural networks (DNNs) poses significant challenges to the deployment of these models in security-critical applications. Adversarial examples are inputs crafted to force a model into making incorrect predictions, and, intriguingly, they often transfer between models with different architectures. This transferability is particularly concerning for black-box models, whose architecture and parameters are unknown to the attacker. The paper "Delving into Transferable Adversarial Examples and Black-box Attacks" by Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song addresses these challenges by conducting a comprehensive study of the transferability of adversarial examples across large-scale datasets and models, and by introducing novel ensemble-based methods to enhance transferability.
Contributions
The paper provides several key contributions:
- Extensive Evaluation on Large-scale Datasets: The paper presents the first evaluation of transferability over a large-scale dataset (ImageNet) and state-of-the-art models; prior work primarily focused on smaller datasets such as MNIST and CIFAR-10.
- Targeted Adversarial Examples: It investigates the transferability of both non-targeted and targeted adversarial examples, revealing that while non-targeted examples are easily transferable, targeted adversarial examples generated with existing methods largely fail to transfer their target labels.
- Novel Ensemble-based Approaches: The authors propose new ensemble-based methods to generate adversarial examples. These methods significantly improve the transferability of targeted adversarial examples.
- Geometric Analysis: The paper includes a geometric analysis of the models to understand why adversarial examples transfer, demonstrating that the gradient directions of different models are nearly orthogonal, yet their decision boundaries align well (a small sketch of this gradient comparison follows this list).
- Black-box Attacks: It showcases successful black-box attacks on Clarifai.com, a commercial image classification service, using the devised ensemble-based approaches without querying the black-box system for training a substitute model.
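The gradient-orthogonality observation can be checked directly by computing each model's loss gradient with respect to the same input and comparing directions. The sketch below is only illustrative: it assumes torchvision's pretrained ResNet-50 and VGG-16 and a placeholder input, not the paper's exact models or preprocessing.

```python
# Sketch: compare input-gradient directions of two ImageNet classifiers.
# Assumes torchvision pretrained weights; the random input is a placeholder.
import torch
import torch.nn.functional as F
from torchvision import models

model_a = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
model_b = models.vgg16(weights=models.VGG16_Weights.DEFAULT).eval()

x = torch.rand(1, 3, 224, 224, requires_grad=True)  # placeholder image batch
label = torch.tensor([281])                          # placeholder true label

def input_gradient(model, x, label):
    """Gradient of the cross-entropy loss w.r.t. the input pixels."""
    loss = F.cross_entropy(model(x), label)
    grad, = torch.autograd.grad(loss, x)
    return grad.flatten()

g_a = input_gradient(model_a, x, label)
g_b = input_gradient(model_b, x, label)

# A cosine similarity near 0 indicates nearly orthogonal gradient directions,
# matching the paper's geometric observation.
cos = F.cosine_similarity(g_a, g_b, dim=0)
print(f"cosine similarity between gradient directions: {cos.item():.4f}")
```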
Methodology
The paper explores several strategies for generating adversarial examples:
- Optimization-based Approach: This method searches for an adversarial perturbation by minimizing an objective that trades off the attack loss against the distortion from the original image, using the Adam optimizer.
- Fast Gradient Sign (FGS) and Fast Gradient (FG) Methods: These methods construct adversarial examples in a single step from the gradient of the loss (FG) or its sign (FGS), allowing rapid generation but restricting the perturbation to a single direction in the input space (a short sketch appears below).
Both non-targeted and targeted variants of these methods are evaluated on large models trained on ImageNet, including ResNet-50, ResNet-101, ResNet-152, VGG-16, and GoogLeNet.
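To make the fast gradient variants concrete, the following is a minimal sketch of a one-step non-targeted attack. The names `model`, `x`, `y`, and `eps` are assumptions for illustration; this is not the authors' implementation.

```python
# Minimal sketch of the fast gradient (FG) and fast gradient sign (FGS)
# methods for non-targeted attacks, assuming a PyTorch classifier `model`,
# an input batch `x` scaled to [0, 1], and true labels `y`.
import torch
import torch.nn.functional as F

def fast_gradient(model, x, y, eps, use_sign=True):
    """One-step perturbation along the gradient of the loss w.r.t. the input."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)

    if use_sign:   # FGS: step eps along the sign of the gradient
        step = grad.sign()
    else:          # FG: step eps along the normalized gradient
        step = grad / grad.flatten(1).norm(dim=1).view(-1, 1, 1, 1).clamp_min(1e-12)

    # Increasing the loss pushes the example away from the true label (non-targeted).
    return (x + eps * step).clamp(0.0, 1.0).detach()
```

The targeted variant instead decreases the loss toward the chosen target label, i.e. it subtracts the step computed from the gradient of the loss with respect to that target.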
Results
Non-targeted Attacks
- Optimization-based Approach: Demonstrates high transferability: a large fraction of the adversarial examples generated against one model also mislead the other models, including those with different architectures.
- Fast Gradient Methods: Although capable of generating adversarial examples efficiently, their transferability is slightly lower than that of the optimization-based approach.
Targeted Attacks
- Existing Methods: Targeted adversarial examples typically fail to transfer their intended labels to other models, even with increased distortion.
- Ensemble-based Approach: The proposed strategy significantly enhances the transferability of targeted examples. Its success is attributed to generating a single adversarial example against multiple models simultaneously, which pushes the example toward the target label across all of their decision boundaries (a sketch appears below).
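Conceptually, the ensemble-based attack optimizes one perturbation against a weighted fusion of the white-box models' softmax outputs while keeping the distortion small, in the hope that the result also fools a held-out black-box model. The sketch below uses assumed names (`models`, `alphas`, `target` as an integer class index, `lam`) and a simplified RMSD-style distortion penalty; it is not the paper's exact objective or code.

```python
# Sketch of an ensemble-based targeted attack: optimize a single perturbation
# against the weighted softmax outputs of several white-box models.
import torch

def ensemble_targeted_attack(models, alphas, x, target, lam=1e-2,
                             steps=100, lr=0.01):
    delta = torch.zeros_like(x, requires_grad=True)   # perturbation to optimize
    opt = torch.optim.Adam([delta], lr=lr)

    for _ in range(steps):
        x_adv = (x + delta).clamp(0.0, 1.0)
        # Fuse the softmax outputs of the ensemble with weights alpha_i.
        fused = sum(a * torch.softmax(m(x_adv), dim=1)
                    for a, m in zip(alphas, models))
        # Push the fused prediction toward the target class while keeping
        # the distortion between x and x_adv small.
        loss = -torch.log(fused[:, target].clamp_min(1e-12)).mean() \
               + lam * (x_adv - x).pow(2).mean().sqrt()
        opt.zero_grad()
        loss.backward()
        opt.step()

    return (x + delta).clamp(0.0, 1.0).detach()
```

Because every model in the ensemble must assign high probability to the target class, the resulting example tends to lie in a region where the models' decision boundaries agree, which is what makes the target label more likely to survive transfer to an unseen model.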
Implications
The demonstrated transferability of adversarial examples suggests a need for robust defenses in DNN-based applications, especially those involving black-box models. The proposed ensemble-based approaches could serve as a double-edged sword: enhancing adversarial capability on one hand, yet shedding light on potential defensive strategies by understanding how and why such examples transfer across models.
Future Directions
Future research could explore:
- Adaptive Defenses: Developing models that are resilient to adversarial perturbations, possibly through ensemble learning or robust optimization techniques.
- Better Understanding of Transferability: Further geometric and theoretical analyses might reveal deeper insights into the structural properties of DNNs that facilitate adversarial transferability.
- Extended Black-box Testing: Evaluating the efficacy of adversarial techniques on a broader range of commercial and real-world black-box systems.
The findings and methods presented in the paper underscore the critical interplay between adversarial example generation and the robustness of neural networks, urging continued investigation into both offensive and defensive AI strategies.