Overview of Generalizable Data-free Objective for Crafting Universal Adversarial Perturbations
The paper "Generalizable Data-free Objective for Crafting Universal Adversarial Perturbations" by Mopuri et al. introduces a novel method for generating Universal Adversarial Perturbations (UAPs) that do not rely on the availability or use of any data samples from the target model's training distribution. The approach, termed GD-UAP, marks a significant shift from traditional data-dependent methodologies that craft UAPs with direct reliance on the training data.
Methodology and Contributions
The primary contribution is a generalizable, data-free objective for crafting image-agnostic perturbations. Rather than optimizing the perturbation to flip predictions on specific inputs, GD-UAP optimizes it to produce spurious activations at multiple layers of the target Convolutional Neural Network (CNN), corrupting the features extracted at every stage of the network without requiring any input data.
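Concretely, the objective optimizes a single perturbation $\delta$, bounded in $\ell_\infty$ norm, so that the activations it produces at a set of chosen layers are as large as possible. Paraphrasing the paper's formulation (notation adapted here):

$$
\min_{\delta}\; -\log\Bigl(\prod_{i=1}^{K}\bigl\lVert \ell_i(\delta)\bigr\rVert_2\Bigr)
\quad \text{subject to} \quad \lVert \delta \rVert_\infty < \xi,
$$

where $\ell_i(\delta)$ is the activation tensor at the $i$-th chosen layer when the perturbation alone is fed to the network, $K$ is the number of layers considered, and $\xi$ is the imperceptibility budget (e.g., 10 on a [0, 255] pixel scale).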
Key innovative aspects of the GD-UAP approach include:
- Data-Free Objective: Unlike existing methods, which are task-specific and depend on a large set of training images, GD-UAP formulates an objective that is independent of any image data. It maximizes the magnitude of the activations the perturbation produces at successive network layers, inducing spurious activations that corrupt the deep feature representations; a code sketch of this optimization follows the list.
- Exploitation of Minimal Priors: Although data-independent, the GD-UAP framework can exploit minimal prior information, such as the input mean and dynamic range, or even a small amount of target data, to improve perturbation efficacy. This cleanly separates direct use of training data from the use of cheap, indirect priors about the input distribution (the sketch below notes where such a prior would enter).
- Versatility Across Tasks: GD-UAP demonstrates empirical effectiveness across tasks including image recognition, semantic segmentation, and depth estimation. This cross-task generalization shows that the objective degrades task-specific performance metrics beyond classification accuracy, including regression-style tasks such as depth estimation that had received little attention in the adversarial-perturbation literature.
- Comprehensive Evaluation and Analysis: The paper includes extensive experiments on models trained on datasets such as ILSVRC, Places-205, Pascal VOC, and KITTI. These experiments compare GD-UAP against both random noise baselines and existing perturbation methods, underscoring the significant, data-independent fooling capability of GD-UAP.
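To make the data-free objective and the optional priors concrete, below is a minimal sketch in PyTorch. It is not the authors' released implementation: the choice of VGG-16 as the target model, the hooked layers, the optimizer, the learning rate, and the iteration count are all illustrative assumptions.

```python
import torch
import torchvision.models as models

# Minimal sketch of a data-free, activation-maximizing objective in the spirit
# of GD-UAP. Hyperparameters and layer choices are illustrative assumptions.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).to(device).eval()
for p in model.parameters():
    p.requires_grad_(False)

# Collect activations from the convolutional layers via forward hooks.
activations = []
def save_activation(_module, _inputs, output):
    activations.append(output)

for layer in model.features:
    if isinstance(layer, torch.nn.Conv2d):
        layer.register_forward_hook(save_activation)

xi = 10.0 / 255.0  # L-infinity budget, assuming inputs scaled to [0, 1]
delta = (torch.rand(1, 3, 224, 224, device=device) * 2 - 1) * xi
delta.requires_grad_(True)
optimizer = torch.optim.Adam([delta], lr=0.01)

for step in range(1000):
    activations.clear()
    # Data-free variant: feed the perturbation alone. With a "range prior",
    # Gaussian pseudo-images matching the input mean and dynamic range would
    # be added to delta here instead (an assumption for illustration).
    model(delta)
    # Maximize activation energy at every hooked layer:
    # loss = -sum_i log ||activation_i||_2 (product of norms in log space).
    loss = -sum(torch.log(act.norm() + 1e-8) for act in activations)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Project back onto the L-infinity ball of radius xi.
    with torch.no_grad():
        delta.clamp_(-xi, xi)
```

With the range prior mentioned above, the lone perturbation input would be replaced by `delta` added to random pseudo-images drawn to match the input's mean and dynamic range; with a handful of real target-domain samples, those samples would be used instead.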
Numerical Results and Observations
Experiments show that GD-UAP perturbations perform strongly. For instance, in a white-box attack on ILSVRC classification models, GD-UAP achieves a mean fooling rate of 69.24%, a substantial figure given that no training data is used. By contrast, data-dependent approaches degrade noticeably when the data they rely on is unavailable or mismatched, which underlines the value of an objective that makes no such assumptions.
The paper also evaluates black-box attack scenarios, where GD-UAP perturbations maintain competitive fooling rates on models they were not crafted for, highlighting the risk that deployed networks face from attackers with no access to training data or model internals.
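The fooling rate used in these comparisons is simply the fraction of inputs whose predicted label changes once the perturbation is added. A minimal evaluation helper, assuming a PyTorch classifier and inputs scaled to [0, 1], might look as follows:

```python
import torch

def fooling_rate(model, loader, delta, device="cuda"):
    """Fraction of inputs whose predicted label flips when delta is added."""
    model.eval()
    delta = delta.to(device)
    flipped, total = 0, 0
    with torch.no_grad():
        for images, _ in loader:
            images = images.to(device)
            clean_pred = model(images).argmax(dim=1)
            adv_pred = model((images + delta).clamp(0.0, 1.0)).argmax(dim=1)
            flipped += (clean_pred != adv_pred).sum().item()
            total += images.size(0)
    return flipped / total
```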
Implications and Speculations
The introduction of GD-UAP underlines the importance of evaluating the susceptibility of deep models even when no training data is available, challenging the assumption, implicit in data-dependent adversarial frameworks, that an attacker needs such data. The method opens a new line of inquiry into the stability of CNN representations irrespective of task-specific design, and it suggests that security assessments should account for models that appear robust under data-driven attacks yet remain vulnerable when perturbations are crafted without direct access to data.
The contributions are relevant both theoretically and practically: they show that models can be attacked without access to their training data, a setting that more closely matches real-world deployment. As AI systems spread into domains that demand high reliability, understanding adversaries that operate without training data is a prerequisite for designing more comprehensive defenses.
The release of the source code for GD-UAP supports reproducibility and further exploration, inviting the research community to build on these findings to improve the security and robustness of machine learning systems against universal adversarial perturbations.