Mitigating Evasion Attacks to Deep Neural Networks via Region-based Classification (1709.05583v4)

Published 17 Sep 2017 in cs.CR, cs.LG, and stat.ML

Abstract: Deep neural networks (DNNs) have transformed several artificial intelligence research areas including computer vision, speech recognition, and natural language processing. However, recent studies demonstrated that DNNs are vulnerable to adversarial manipulations at testing time. Specifically, suppose we have a testing example, whose label can be correctly predicted by a DNN classifier. An attacker can add a small carefully crafted noise to the testing example such that the DNN classifier predicts an incorrect label, where the crafted testing example is called adversarial example. Such attacks are called evasion attacks. Evasion attacks are one of the biggest challenges for deploying DNNs in safety and security critical applications such as self-driving cars. In this work, we develop new methods to defend against evasion attacks. Our key observation is that adversarial examples are close to the classification boundary. Therefore, we propose region-based classification to be robust to adversarial examples. For a benign/adversarial testing example, we ensemble information in a hypercube centered at the example to predict its label. In contrast, traditional classifiers are point-based classification, i.e., given a testing example, the classifier predicts its label based on the testing example alone. Our evaluation results on MNIST and CIFAR-10 datasets demonstrate that our region-based classification can significantly mitigate evasion attacks without sacrificing classification accuracy on benign examples. Specifically, our region-based classification achieves the same classification accuracy on testing benign examples as point-based classification, but our region-based classification is significantly more robust than point-based classification to various evasion attacks.

Authors (2)
  1. Xiaoyu Cao (32 papers)
  2. Neil Zhenqiang Gong (117 papers)
Citations (205)

Summary

Mitigating Evasion Attacks to Deep Neural Networks via Region-based Classification

The paper "Mitigating Evasion Attacks to Deep Neural Networks via Region-based Classification" by Xiaoyu Cao and Neil Zhenqiang Gong addresses one of the key vulnerabilities in deep neural networks (DNNs): their susceptibility to adversarial examples. These adversarial manipulations pose significant challenges when deploying DNNs in security-critical applications like self-driving cars.

The crux of the authors’ approach is a novel defensive mechanism termed region-based classification (RC). Traditional classifiers make point-based predictions, assessing each input example in isolation to determine its label. This point-based approach is notably vulnerable because adversarial examples are deliberately crafted to lie close to decision boundaries: a small perturbation is enough to push a correctly classified input across the boundary and flip its predicted label.

To counteract this, region-based classification ensembles predictions over a region rather than a single point. For each input, RC samples multiple data points uniformly from a hypercube centered at that input, classifies each sampled point with an existing DNN, and assigns the input the label that wins a majority vote among the sampled points' predictions (see the sketch below). This method exploits the insight that adversarial examples tend to sit at or near decision boundaries, whereas benign examples lie deeper within their class regions, so most points sampled around an adversarial example still fall within the true class's region.
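
To make the sampling-and-voting step concrete, the following is a minimal sketch of region-based prediction. It assumes a generic `model_predict` function that maps a batch of inputs with pixel values in [0, 1] to integer labels; the function names, the default hypercube radius, and the clipping range are illustrative assumptions rather than the authors' exact implementation.

```python
import numpy as np

def region_based_predict(model_predict, x, radius=0.3, num_samples=1000, rng=None):
    """Sketch of region-based classification: sample points uniformly from the
    hypercube [x - radius, x + radius] (clipped to the valid pixel range),
    classify each sample with the underlying point-based DNN, and return the
    majority-vote label. `model_predict` is assumed to map a batch of inputs
    to integer labels; `radius` is the hypercube half-length hyperparameter.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Draw uniform noise inside the hypercube centered at the input example.
    noise = rng.uniform(-radius, radius, size=(num_samples,) + x.shape)
    samples = np.clip(x[None, ...] + noise, 0.0, 1.0)  # keep samples in [0, 1]
    # Classify every sampled point with the existing (point-based) DNN.
    labels = model_predict(samples)
    # Majority vote over the sampled points' predicted labels.
    values, counts = np.unique(labels, return_counts=True)
    return values[np.argmax(counts)]
```

The hypercube radius governs the trade-off: a larger region improves robustness to perturbations but can eventually hurt accuracy on benign inputs, which is why the paper tunes it so that benign accuracy matches that of the point-based classifier.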

The research demonstrates the effectiveness of RC by showing that it maintains classification accuracy on benign examples comparable to point-based classification. Moreover, experiments on the MNIST and CIFAR-10 datasets show substantial robustness improvements against several evasion attacks, with detailed results for both targeted and untargeted attacks, including the state-of-the-art attacks by Carlini and Wagner. Numerical results indicate success rates of less than 20% for these attacks on MNIST and under 7% on CIFAR-10 when using RC, as opposed to the near-100% success rates observed against traditional point-based classification.
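
For context, the attack success rates quoted above can be computed with a metric along the lines of the sketch below. This is a generic evaluation helper, not code from the paper; `predict_fn` stands for either the point-based DNN or a region-based wrapper such as the one sketched earlier.

```python
import numpy as np

def attack_success_rate(predict_fn, adv_examples, true_labels, target_labels=None):
    """Illustrative metric: an untargeted attack succeeds when the predicted
    label differs from the true label; a targeted attack succeeds when the
    prediction equals the attacker-chosen target label."""
    preds = np.array([predict_fn(x) for x in adv_examples])
    if target_labels is None:  # untargeted attack
        return float(np.mean(preds != np.asarray(true_labels)))
    return float(np.mean(preds == np.asarray(target_labels)))
```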

The implications of this research are manifold. Practically, the deployment of RC can greatly enhance the security and reliability of AI systems in fields where erroneous classification due to attacks could yield catastrophic consequences. Theoretically, this work advances the understanding of adversarial robustness and exposes new dimensions for refining attack strategies and crafting more resilient defenses.

Looking forward, the research points to potential extensions, including the exploration of alternative region shapes beyond hypercubes and further enhancement of the ensembling mechanism. Such work would contribute to the broader effort to improve the adversarial robustness of machine learning models, a concern that grows more pressing as AI systems are integrated into critical societal functions.

In summary, this paper offers a comprehensive framework for understanding and mitigating the impact of adversarial attacks on DNNs without compromising their performance on benign inputs, paving the way for safer AI deployments.