- The paper introduces a data-independent approach that generates universal adversarial perturbations by maximizing CNN feature activations, without relying on any training data.
- The method exploits CNN architectural traits to achieve competitive fooling rates across models such as CaffeNet, VGG, and GoogLeNet.
- Empirical results demonstrate rapid convergence and strong cross-network transferability, highlighting new directions for adversarial research and model security.
An Overview of "Fast Feature Fool: A Data Independent Approach to Universal Adversarial Perturbations"
The paper "Fast Feature Fool: A data independent approach to universal adversarial perturbations" represents a methodical effort to address a core challenge in the field of adversarial attacks on CNNs: the dependency on training data typically required to generate effective adversarial perturbations. The authors introduce an innovative approach that circumvents the need for access to such data when creating universal adversarial perturbations (UAPs).
Prior methods for creating UAPs have shown that, once shifted in a single image-agnostic direction, images from the training distribution consistently mislead CNNs. However, these methods depend heavily on substantial access to the training data to optimize the perturbation. The paper critiques this dependency as impractical for real-world adversarial scenarios, where access to the target model's training data is typically restricted.
The authors propose a data-independent mechanism for generating UAPs, dubbed "Fast Feature Fool." The approach leverages inherent characteristics of CNN architectures by manipulating feature activations across multiple layers without requiring any samples from the training distribution. Specifically, the perturbation is optimized to simultaneously maximize the mean post-ReLU activations of the convolutional layers, with no prior access to training images. The optimization converges comparatively quickly, giving it a computational advantage over data-dependent methods.
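To make the objective concrete, below is a minimal PyTorch sketch of the idea (not the authors' released code). It treats the perturbation itself as the network's only input, records the mean post-ReLU activation of each layer via forward hooks, and maximizes the sum of their logs (equivalently, their product) under an l-infinity bound xi. Hooking every ReLU module, the Adam optimizer, the step count, and the learning rate are all illustrative assumptions; the paper's exact layer selection and preprocessing may differ.

```python
import torch
import torch.nn as nn
from torchvision import models

def fast_feature_fool(model, image_size=224, xi=10.0, steps=200, lr=0.1):
    """Sketch of a data-independent UAP: maximize the mean post-ReLU
    activations of a frozen CNN with the perturbation as its only input.
    Hyperparameters here are illustrative, not the paper's exact settings."""
    model.eval()
    mean_acts = []

    def hook(_module, _inputs, output):
        # Record the mean activation of this layer for the current pass.
        mean_acts.append(output.mean())

    # Approximation: hook every ReLU module as a stand-in for the
    # post-ReLU convolutional activations described in the paper.
    handles = [m.register_forward_hook(hook)
               for m in model.modules() if isinstance(m, nn.ReLU)]

    # The perturbation is the sole "input"; start from uniform noise
    # inside the l-infinity ball of radius xi.
    delta = torch.rand(1, 3, image_size, image_size) * 2 * xi - xi
    delta.requires_grad_(True)
    optimizer = torch.optim.Adam([delta], lr=lr)

    for _ in range(steps):
        mean_acts.clear()
        optimizer.zero_grad()
        model(delta)
        # Maximize the product of mean activations via a sum of logs;
        # the clamp guards against log(0) from dead layers.
        loss = -sum(torch.log(a.clamp_min(1e-12)) for a in mean_acts)
        loss.backward()
        optimizer.step()
        with torch.no_grad():
            delta.clamp_(-xi, xi)  # project back onto the norm ball

    for h in handles:
        h.remove()
    return delta.detach()

# Example usage (downloads pretrained weights):
# uap = fast_feature_fool(models.vgg16(weights="IMAGENET1K_V1"))
```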
Empirical results illustrate the effectiveness and transferability of these data-independent perturbations: fooling rates remain robust across multiple architectures trained on the ILSVRC dataset, including CaffeNet, VGG variants, and GoogLeNet. Notably, while the data-dependent technique of Moosavi-Dezfooli et al. achieves higher fooling rates given full access to training data, the Fast Feature Fool perturbations remain competitive without any dataset-specific tuning and show substantial cross-network transferability.
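The fooling rate used throughout these comparisons is simply the fraction of images whose top-1 prediction changes once the universal perturbation is added. A minimal sketch of the metric, assuming a standard PyTorch DataLoader of preprocessed images (valid-range clipping depends on the preprocessing pipeline and is elided here):

```python
import torch

@torch.no_grad()
def fooling_rate(model, loader, delta, device="cpu"):
    """Fraction of inputs whose predicted label flips when the
    (image-agnostic) perturbation delta is added."""
    model.eval()
    flipped, total = 0, 0
    for images, _labels in loader:  # ground-truth labels are not needed
        images = images.to(device)
        clean_pred = model(images).argmax(dim=1)
        # A real pipeline would also clip images + delta back to the
        # valid pixel range; omitted since it depends on preprocessing.
        adv_pred = model(images + delta.to(device)).argmax(dim=1)
        flipped += (clean_pred != adv_pred).sum().item()
        total += images.size(0)
    return flipped / total
```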
Extending the evaluation, the experiments probe the perturbations' transferability across distinct data distributions, a typically underexplored aspect, and report notable generalization even against networks trained on different datasets, such as Places-205. This broad applicability reinforces the potential of data-independent methods in adversarial contexts and suggests pathways for work on AI robustness and security.
In summary, the paper marks a significant shift in the study of adversarial perturbations by showing that effective fooling can occur even without access to data, pointing to a novel direction for future adversarial research. The method deepens our understanding of CNN vulnerabilities and raises new questions about the architectural dependencies of adversarial examples, underscoring the need for continued work on hardening CNNs against such universal threats. Future research could build on these insights to develop more secure models, potentially incorporating adaptive defenses that withstand even data-independent adversarial strategies.