- The paper presents a novel adversarial detection method using convolutional filter statistics and a cascade classifier to robustly separate adversarial from normal inputs.
- It demonstrates high accuracy and AUC improvements on architectures like AlexNet and VGG-16, outperforming state-of-the-art methods such as OpenMax.
- The study proposes a recovery method using average filtering, highlighting potential network design improvements for enhanced robustness in critical applications.
Adversarial Examples Detection in Deep Networks with Convolutional Filter Statistics
Adversarial examples pose a significant challenge to the robustness of deep learning models, particularly in visual recognition tasks. This paper presents a novel approach to detecting adversarial examples in deep networks by leveraging convolutional filter statistics. Unlike methods that directly train a neural network for adversarial detection, this work proposes a simpler yet effective alternative: a cascade classifier built on statistical features derived from the network's convolutional layers.
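As a rough illustration of what such per-filter statistics might look like in practice, the sketch below collects simple summary statistics from the convolutional layers of a pretrained AlexNet using PyTorch forward hooks. The specific statistics (mean and a few quantiles) and the library choices are illustrative assumptions, not the paper's exact feature set.

```python
# Sketch: per-filter statistics from convolutional layers via forward hooks.
# The mean/quantile summary below is an illustrative stand-in for the paper's
# exact feature set.
import torch
import torchvision.models as models

model = models.alexnet(weights="IMAGENET1K_V1").eval()
activations = {}

def make_hook(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

for name, module in model.named_modules():
    if isinstance(module, torch.nn.Conv2d):
        module.register_forward_hook(make_hook(name))

x = torch.randn(1, 3, 224, 224)            # stand-in for a preprocessed image
with torch.no_grad():
    model(x)

def filter_statistics(feat):
    # feat: (1, C, H, W) response maps for one convolutional layer
    flat = feat.flatten(2).squeeze(0)                                        # (C, H*W)
    qs = torch.quantile(flat, torch.tensor([0.5, 0.75, 0.95, 1.0]), dim=1)   # (4, C)
    return torch.cat([flat.mean(dim=1, keepdim=True).T, qs]).flatten()       # per-layer vector

per_layer_feats = [filter_statistics(a) for a in activations.values()]
```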
The paper outlines a methodology that flags an input as adversarial when its statistics depart from those of normal examples. Because directly modeling the distribution of normal inputs is impractical in such high-dimensional spaces, the detector instead uses a discriminative classifier over convolutional filter outputs, which are generally more informative for this purpose. The approach also avoids treating the deep network as a black box, a limitation of many prior outlier detection methods.
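One plausible way to assemble per-layer statistics like those above into the cascade described earlier is sketched below: each stage is a simple linear classifier over one layer's statistics, and any sufficiently confident stage can flag the input as adversarial. The stage model (scikit-learn logistic regression) and the early-exit threshold are assumptions for illustration, not the paper's exact training procedure.

```python
# Sketch of a per-layer cascade detector: each stage sees one conv layer's
# statistics and may flag the input as adversarial; an input counts as normal
# only if every stage accepts it.
from sklearn.linear_model import LogisticRegression

class CascadeDetector:
    def __init__(self, n_stages, threshold=0.5):
        self.stages = [LogisticRegression(max_iter=1000) for _ in range(n_stages)]
        self.threshold = threshold

    def fit(self, per_layer_feats, labels):
        # per_layer_feats: list of (n_samples, d_layer) arrays, one per conv layer
        # labels: 1 = adversarial, 0 = normal
        for clf, feats in zip(self.stages, per_layer_feats):
            clf.fit(feats, labels)

    def is_adversarial(self, per_layer_feats):
        # per_layer_feats: list of (d_layer,) vectors for a single input;
        # the first sufficiently confident stage rejects early.
        for clf, feats in zip(self.stages, per_layer_feats):
            if clf.predict_proba(feats.reshape(1, -1))[0, 1] > self.threshold:
                return True
        return False
```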
A striking feature of the proposed detection mechanism is its robustness. A classifier trained on adversarial examples generated by one method also detects adversarial examples produced by different generation mechanisms. Moreover, because the classifier relies on non-subdifferentiable statistics, attacks that exploit gradient information are blunted, adding a layer of protection against gradient-based methods.
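The underlying effect can be seen in a toy example: a statistic that is piecewise constant in its input (here a floor-based binning, standing in for rank- or quantile-style features) has a zero gradient almost everywhere, so backpropagating through the detector gives an attacker no usable direction.

```python
# Toy illustration: piecewise-constant statistics starve gradient-based attacks.
import torch

x = torch.randn(64, requires_grad=True)   # stand-in for filter responses
stat = torch.floor(x * 10).sum()          # histogram-like binning statistic
stat.backward()
print(x.grad.abs().max())                 # tensor(0.) -- nothing for an attacker to ascend
```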
Quantitative results demonstrate the effectiveness of the approach, showing clear separation between normal and adversarial examples on both the AlexNet and VGG-16 architectures. The classifier achieves high accuracy and AUC values, outperforming state-of-the-art techniques such as OpenMax. A pivotal aspect of the work is its generalization capability: a classifier trained on one type of adversarial example identifies other types without re-training.
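For context, detection accuracy and AUC figures of this kind are typically computed from a per-example detector score; the sketch below uses scikit-learn with synthetic placeholder scores rather than the paper's data.

```python
# Computing detection AUC/accuracy from per-example detector scores
# (higher score = more likely adversarial). Data below is synthetic.
import numpy as np
from sklearn.metrics import roc_auc_score, accuracy_score

rng = np.random.default_rng(0)
labels = np.concatenate([np.zeros(500), np.ones(500)])    # 0 = normal, 1 = adversarial
scores = np.concatenate([rng.normal(0.3, 0.1, 500),       # detector scores on normal inputs
                         rng.normal(0.7, 0.1, 500)])      # detector scores on adversarial inputs
print("AUC:", roc_auc_score(labels, scores))
print("accuracy:", accuracy_score(labels, scores > 0.5))
```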
In addition to detection, the paper highlights a simple yet impactful recovery method. Applying an average filter to detected adversarial images restores many correct predictions, hinting at current networks' over-sensitivity to minor perturbations. This suggests that future models could benefit from larger receptive fields, which would reduce susceptibility to such perturbations.
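A minimal sketch of this recovery step follows; the 3x3 kernel size and the SciPy implementation are illustrative choices rather than the paper's exact configuration.

```python
# Recovery sketch: smooth a flagged image with a small mean filter and
# re-classify it with the original network.
import numpy as np
from scipy.ndimage import uniform_filter

def recover(image_hwc):
    # image_hwc: H x W x 3 float array; average spatially, not across channels
    return uniform_filter(image_hwc, size=(3, 3, 1))

demo = np.random.rand(224, 224, 3)   # stand-in for a flagged adversarial image
smoothed = recover(demo)             # feed `smoothed` back through the classifier
```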
The implications of this research extend beyond its immediate practical use. Better adversarial detection improves safety and reliability in critical fields such as autonomous driving and security systems. Moreover, the insights about convolutional layer statistics could inform the design of more robust learning algorithms, possibly driving innovations in self-aware learning models capable of greater autonomy and reliability.
Looking forward, integrating this detection strategy with generative adversarial networks (GANs) could create a symbiotic development cycle that strengthens both adversarial generation and detection. In addition, tackling classification without heavy reliance on softmax layers may further reduce adversarial vulnerabilities.
In conclusion, this paper contributes a concrete advance in adversarial example detection, using convolutional filter statistics to build a robust classifier whose non-subdifferentiable features resist gradient-based attack. The promising results mark a significant step towards safer and more reliable deployment of deep learning in sensitive applications, with potential theoretical contributions to self-aware learning and neural network resilience.