Adversarial Perturbations Against Deep Neural Networks for Malware Classification (1606.04435v2)

Published 14 Jun 2016 in cs.CR, cs.LG, and cs.NE

Abstract: Deep neural networks, like many other machine learning models, have recently been shown to lack robustness against adversarially crafted inputs. These inputs are derived from regular inputs by minor yet carefully selected perturbations that deceive machine learning models into desired misclassifications. Existing work in this emerging field was largely specific to the domain of image classification, since the high-entropy of images can be conveniently manipulated without changing the images' overall visual appearance. Yet, it remains unclear how such attacks translate to more security-sensitive applications such as malware detection - which may pose significant challenges in sample generation and arguably grave consequences for failure. In this paper, we show how to construct highly-effective adversarial sample crafting attacks for neural networks used as malware classifiers. The application domain of malware classification introduces additional constraints in the adversarial sample crafting problem when compared to the computer vision domain: (i) continuous, differentiable input domains are replaced by discrete, often binary inputs; and (ii) the loose condition of leaving visual appearance unchanged is replaced by requiring equivalent functional behavior. We demonstrate the feasibility of these attacks on many different instances of malware classifiers that we trained using the DREBIN Android malware data set. We furthermore evaluate to which extent potential defensive mechanisms against adversarial crafting can be leveraged to the setting of malware classification. While feature reduction did not prove to have a positive impact, distillation and re-training on adversarially crafted samples show promising results.

Authors (5)
  1. Kathrin Grosse (22 papers)
  2. Nicolas Papernot (123 papers)
  3. Praveen Manoharan (7 papers)
  4. Michael Backes (157 papers)
  5. Patrick McDaniel (70 papers)
Citations (410)

Summary

Adversarial Perturbations Against Deep Neural Networks for Malware Classification

This paper investigates the susceptibility of deep neural networks (DNNs) used for malware classification to adversarial perturbations. While previous research on adversarial attacks has focused primarily on domains such as image classification, this work extends the exploration to the more challenging and security-sensitive field of malware detection. The authors demonstrate that DNNs, widely praised for their performance on high-dimensional classification tasks, are not immune to adversarial manipulation when applied to malware classification.

Core Contributions

The work primarily presents two core contributions:

  1. Adaptation of Adversarial Crafting to Malware Classification: The paper designs a method for crafting adversarial samples that mislead malware classifiers built on neural networks. Unlike prior adversarial work in the image domain, this setting involves discrete, binary inputs (indicating, for instance, whether a particular system call is used) rather than continuous pixel values. Crafting techniques originally developed for images are adapted to these discrete constraints while preserving the functionality of the original applications; a minimal sketch of the crafting loop appears after this list. The crafting method achieves misclassification rates of up to 85% on malware samples from the DREBIN dataset.
  2. Exploration of Defensive Mechanisms: The authors investigate potential strategies to enhance the robustness of neural networks against adversarial examples. These include feature reduction techniques, distillation, and adversarial re-training. While the paper observes that feature reduction does not confer any significant protection and can even increase vulnerability, distillation and adversarial re-training show more promise in reducing misclassification rates caused by adversarial samples.
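The crafting idea referenced in contribution (1) adapts gradient-based saliency to binary inputs: at each step the gradient of the benign output with respect to the input is computed, and the absent feature whose addition most increases the benign score is switched on, up to a small change budget. The sketch below illustrates this idea only; it is not the authors' code, and the PyTorch interface, the two-logit [benign, malware] output convention, and the `max_changes` budget are assumptions made for the example.

```python
# Minimal sketch (not the authors' implementation): gradient-guided crafting of an
# adversarial malware sample over binary features, restricted to *adding* features
# (0 -> 1) so the application's functionality is preserved.
import torch

def craft_adversarial(model, x, max_changes=20):
    """x: 1-D float tensor of 0/1 features; model outputs logits [benign, malware]."""
    x = x.clone().float()
    for _ in range(max_changes):
        x_var = x.clone().requires_grad_(True)
        logits = model(x_var.unsqueeze(0)).squeeze(0)
        if logits.argmax().item() == 0:          # already classified benign: done
            break
        # Gradient of the benign logit with respect to the input features.
        grad_benign = torch.autograd.grad(logits[0], x_var)[0]
        # Only features that are currently absent may be added.
        candidates = (x == 0)
        scores = grad_benign.masked_fill(~candidates, float("-inf"))
        best = scores.argmax()
        if scores[best] == float("-inf"):        # nothing left to add
            break
        x[best] = 1.0                            # flip on the most influential feature
    return x
```

Restricting the search to 0 → 1 flips mirrors the paper's functionality constraint: removing a feature the application actually uses could break it, whereas adding an unused manifest entry or API reference is generally safe.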

Experimental Setup

The empirical evaluation employs DNN classifiers trained on the DREBIN dataset, a widely used Android malware dataset. Various experimental configurations are tested, including different network architectures and feature sets, to optimize malware detection performance and assess vulnerability to crafted adversarial examples. The neural networks, configured with several hidden layers, achieve classification accuracy close to 98% under normal conditions, but this performance drops sharply once adversarial samples are introduced.
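For concreteness, a classifier of the kind described here can be sketched as a small feed-forward network over binary feature vectors. This is an illustrative assumption about the architecture, not the paper's exact configuration; the hidden width, dropout rate, and choice of PyTorch are made up for the example.

```python
# Illustrative feed-forward malware classifier over DREBIN-style binary features.
# Layer sizes and dropout are assumptions, not the paper's reported configuration.
import torch.nn as nn

def make_classifier(n_features, hidden=200, dropout=0.5):
    return nn.Sequential(
        nn.Linear(n_features, hidden), nn.ReLU(), nn.Dropout(dropout),
        nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(dropout),
        nn.Linear(hidden, 2),                    # logits for [benign, malware]
    )
```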

Numerical Results and Findings

The paper presents detailed numerical results demonstrating how effectively adversarial attacks degrade classifier reliability: in some configurations, up to 85% of adversarially perturbed malware samples are misclassified as benign. Training the classifier with adversarial examples included in the dataset (adversarial re-training) attenuates this misclassification rate. However, the improvement comes at the cost of noticeably higher false negative rates, highlighting a trade-off between adversarial robustness and detection accuracy on unmodified samples.
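Adversarial re-training, as evaluated in the paper, amounts to crafting adversarial variants of training malware and adding them back with their true label. A minimal sketch follows, reusing the illustrative `craft_adversarial` helper above; the full-batch loop, optimizer, and epoch count are assumptions rather than the authors' training setup.

```python
# Hedged sketch of adversarial re-training: adversarial variants of the malware
# samples are crafted and appended to the training set with their true label.
import torch
import torch.nn.functional as F

def adversarial_retrain(model, X, y, epochs=5, lr=1e-3):
    """X: (N, n_features) float tensor of 0/1 features; y: long tensor, 1 = malware."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        # Craft adversarial versions of the malware samples, keeping label 1 so the
        # model learns to still flag them.
        adv = torch.stack([craft_adversarial(model, x) for x in X[y == 1]])
        X_aug = torch.cat([X, adv])
        y_aug = torch.cat([y, torch.ones(len(adv), dtype=torch.long)])
        loss = F.cross_entropy(model(X_aug), y_aug)
        opt.zero_grad(); loss.backward(); opt.step()
    return model
```

In practice one would craft on mini-batches and monitor the false negative rate, which the paper reports can rise noticeably under this defense.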

Implications and Future Directions

This research has significant implications for cybersecurity and the development of robust malware detection systems. The evidence suggests that DNNs should be deployed cautiously in security-critical applications unless appropriate defenses against adversarial examples are in place, and it motivates further exploration of adaptive learning strategies that can respond to adversarial threats.

For the future, a deeper understanding of the fundamental limitations of DNNs in adversarially rich environments is necessary. Continued research could involve developing more sophisticated models that inherently resist adversarial perturbations and discovering new domains where these methodologies could be impactful. The discrepancy in defense technique effectiveness across different domains also warrants further investigation.

In conclusion, while the vulnerabilities exposed in the paper highlight significant weaknesses of current DNN approaches to malware classification, they also pave the way for developing more resilient and secure machine learning models in cybersecurity applications.