- The paper presents Feature Squeezing as a detection mechanism that reduces input complexity and flags adversarial examples by comparing a model's predictions on original and squeezed inputs.
- It employs bit depth reduction and spatial smoothing to mitigate subtle attack-induced noise, achieving up to 98% detection on MNIST.
- Experimental results on MNIST, CIFAR-10, and ImageNet demonstrate robust detection with low false positives, enabling real-time integration.
Analyzing Feature Squeezing as a Defense Mechanism Against Adversarial Examples in Deep Learning
The paper "Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks" introduces a practical technique for hardening Deep Neural Networks (DNNs) against adversarial examples. The authors propose a method called "Feature Squeezing," which reduces the search space available to an adversary by coalescing the many inputs that correspond to distinct feature vectors in the original space into a single, lower-fidelity sample; adversarial inputs are then detected by comparing the model's prediction on the original input with its prediction on the squeezed version.
Core Concept
Feature Squeezing refers to reducing the complexity of the input representation by limiting the degrees of freedom available to an adversary, rather than by reducing the input's dimensionality outright. The authors focus on two primary squeezers: bit depth reduction and spatial smoothing.
- Bit Depth Reduction: Reduces the color depth of an image by quantizing pixel values to a smaller set of discrete levels (for MNIST, the paper squeezes to a single bit per pixel). This removes the subtle, low-amplitude perturbations that adversarial attacks typically introduce.
- Spatial Smoothing: Applies a smoothing filter (e.g., a median filter) to the input image, averaging out localized noise. This mitigates fine-grained adversarial perturbations that rely on precise per-pixel changes.
Both squeezers were chosen for their simplicity and computational efficiency, which makes them suitable for real-time applications; a minimal sketch of each appears below.
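The following is a minimal NumPy/SciPy sketch of the two squeezers, assuming float images in [0, 1] with shape (height, width, channels); the parameter defaults are illustrative, not the paper's tuned settings.

```python
import numpy as np
from scipy.ndimage import median_filter

def reduce_bit_depth(x: np.ndarray, bits: int) -> np.ndarray:
    """Quantize pixel values in [0, 1] down to 2**bits discrete levels."""
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels

def spatial_smooth(x: np.ndarray, window: int = 2) -> np.ndarray:
    """Apply a window x window median filter to each channel independently."""
    return median_filter(x, size=(window, window, 1))
```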
Methodology
The authors evaluate the effectiveness of feature squeezing on several well-established datasets: MNIST, CIFAR-10, and ImageNet. Adversarial examples are generated with attack methods such as FGSM and C&W, and detection works by comparing the model's softmax output on the original input with its output on the squeezed input, flagging the input as adversarial when the L1 distance between the two exceeds a threshold (sketched after the metrics below).
The evaluation metrics focus on:
- Detection rate: The proportion of adversarial examples correctly identified as malicious.
- False positive rate: The proportion of benign inputs incorrectly flagged as adversarial.
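The sketch below illustrates the comparison-based detection logic described above. The `toy_model` and threshold value are hypothetical stand-ins; in practice, the threshold is chosen on held-out benign data to meet a target false positive rate.

```python
import numpy as np

def detect_adversarial(model, x, squeezers, threshold):
    """Flag x as adversarial if squeezing changes the prediction too much.

    model:     callable mapping an input array to softmax probabilities
    squeezers: iterable of squeezing functions (e.g., bit depth reduction)
    threshold: maximum allowed L1 distance between probability vectors
    """
    p_orig = model(x)
    # Joint detection: the input is adversarial if ANY squeezer moves the
    # prediction by more than the threshold (equivalently, if the maximum
    # L1 distance across squeezers exceeds it).
    return any(np.abs(p_orig - model(s(x))).sum() > threshold
               for s in squeezers)

# Toy demonstration with a hypothetical linear softmax "model".
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(28 * 28, 10))
    def toy_model(x):
        z = x.reshape(-1) @ W
        e = np.exp(z - z.max())
        return e / e.sum()
    x = rng.random((28, 28, 1))
    one_bit = lambda im: np.round(im)   # 1-bit depth reduction
    print(detect_adversarial(toy_model, x, [one_bit], threshold=1.0))
```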
Experimental Results
The paper reports strong detection rates across the tested scenarios. For example, applying bit depth reduction to MNIST images yields a 98% detection rate against FGSM attacks while maintaining a low false positive rate, and spatial smoothing reaches 91% detection accuracy on CIFAR-10 under the same attack conditions.
Implications
The implications of this research are multi-faceted:
- Practical Applications: Feature Squeezing can be integrated into existing DNN pipelines with minimal overhead, offering an effective first line of defense against adversarial attacks (see the sketch after this list).
- Theoretical Considerations: The findings suggest that reducing input complexity could inherently make models more resistant to adversarial perturbations, a hypothesis that warrants further exploration in future research.
- Expandability: The techniques proposed can be combined with other defense mechanisms, potentially leading to even more robust DNNs.
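As an illustration of the first point above, the hypothetical wrapper below reuses the `detect_adversarial` sketch from the Methodology section to guard an existing prediction endpoint. Rejecting flagged inputs by raising an error is one possible policy, not something prescribed by the paper.

```python
def guarded_predict(model, x, squeezers, threshold):
    """Serve a class prediction only if x passes the squeezing check."""
    if detect_adversarial(model, x, squeezers, threshold):
        raise ValueError("input rejected: likely adversarial example")
    return int(model(x).argmax())
```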
Future Developments
Future research could investigate the combination of feature squeezing with other advanced adversarial defense strategies, such as adversarial training and model ensembling. Additionally, theoretical work to formalize the reasons behind the success of feature squeezing could provide deeper insights into the nature of adversarial vulnerabilities in DNNs.
In conclusion, the paper provides a compelling argument for the use of feature squeezing as a simple yet effective defense mechanism against adversarial attacks, opening the door for further investigations and applications in the domain of secure deep learning.