
Adversarial Deep Learning for Robust Detection of Binary Encoded Malware (1801.02950v3)

Published 9 Jan 2018 in cs.CR, cs.LG, and stat.ML

Abstract: Malware is constantly adapting in order to avoid detection. Model-based malware detectors, such as SVMs and neural networks, are vulnerable to so-called adversarial examples: modest changes to detectable malware that allow the resulting malware to evade detection. Continuous-valued methods that are robust to adversarial examples of images have been developed using saddle-point optimization formulations. We are inspired by them to develop similar methods for the discrete, e.g. binary, domain which characterizes the features of malware. A specific extra challenge of malware is that the adversarial examples must be generated in a way that preserves their malicious functionality. We introduce methods capable of generating functionality-preserving adversarial malware examples in the binary domain. Using the saddle-point formulation, we incorporate the adversarial examples into the training of models that are robust to them. We evaluate the effectiveness of the methods and others in the literature on a set of Portable Executable (PE) files. Comparison prompts our introduction of an online measure computed during training to assess general expectation of robustness.

Citations (181)

Summary

  • The paper adapts adversarial deep learning from continuous domains to binary features for robust malware detection against adversarial examples.
  • It introduces the SLEIPNIR framework, using saddle-point optimization to train detectors resilient to binary adversarial malware.
  • Evaluations show that adversarial training, particularly with the rFGSMᵏ method, yields models with high accuracy and significantly enhanced resilience against various evasion techniques.

Adversarial Deep Learning for Robust Detection of Binary Encoded Malware

This paper presents a novel approach to enhancing malware detection using adversarial deep learning techniques tailored to cybersecurity applications, specifically in the binary domain. The authors address an emerging challenge faced by malware detectors: their vulnerability to adversarial examples (AEs), deliberately perturbed versions of malware that maintain functionality while evading detection. Adversarial methods popular in image classification are extended into the discrete domain of binary-encoded features used in malware detection.
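
Concretely, the saddle-point formulation referenced here is the standard robust-optimization objective, sketched below in generic notation (the paper's exact symbols may differ): the outer minimization fits the detector's parameters while the inner maximization searches over functionality-preserving binary perturbations.

```latex
\min_{\theta}\; \mathbb{E}_{(x,y)\sim\mathcal{D}}
  \left[\, \max_{x' \in \mathcal{S}(x)} L(\theta, x', y) \,\right]
```

Here S(x) denotes the set of binary feature vectors reachable from the malware sample x without breaking its functionality; in this setting that amounts to only setting additional bits (e.g., adding API-call indicators), never clearing bits that x already has.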

Key Contributions

The paper makes several distinct contributions to the field of adversarial machine learning and cybersecurity:

  1. Generating Binary Adversarial Examples: The authors propose four methods for generating adversarial malware examples in a binary-encoded feature space, yielding perturbations that evade detection while preserving malicious functionality (see the code sketch after this list).
  2. SLEIPNIR Framework: The paper introduces the SLEIPNIR framework, which integrates saddle-point optimization for training detectors robust against adversarial malware examples. This is akin to methods used in continuous domains, adapted to handle discrete feature spaces.
  3. Evaluation of Portable Executables: Extensive evaluations on a set of Portable Executables (PEs) reveal the efficacy of incorporating randomization in adversarial generation methods.
  4. Robustness Measure: A novel online measure is introduced to assess the robustness of models during training, offering insights into the expectation of model robustness against adversaries.
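
As a concrete illustration of the first contribution, below is a minimal PyTorch sketch of a multi-step FGSM-style inner maximizer over binary features. It assumes binary bag-of-features malware vectors and a differentiable detector; the function name and hyperparameters are illustrative assumptions, not the paper's reference implementation. Functionality is preserved by only allowing 0 → 1 bit flips, so every feature of the original malware is kept.

```python
import torch
import torch.nn.functional as F

def binary_fgsm_k(model, x_mal, y_mal, k=50, step=0.02):
    """Multi-step FGSM-style inner maximizer over binary features (sketch).

    x_mal : (batch, m) tensor of {0, 1} malware feature indicators.
    y_mal : (batch,) tensor of class labels.
    The search runs in the continuous relaxation [0, 1]^m and is
    rounded back to the binary domain at the end.
    """
    x_adv = x_mal.float().clone()
    for _ in range(k):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y_mal)
        (grad,) = torch.autograd.grad(loss, x_adv)
        # Ascend the detector's loss, staying inside the unit hypercube.
        x_adv = (x_adv + step * grad.sign()).clamp(0.0, 1.0).detach()
    # Round to {0, 1}, then take an elementwise OR with the original:
    # bits may be added but never removed, which is the
    # functionality-preserving constraint.
    x_bin = (x_adv > 0.5).float()
    return torch.max(x_bin, x_mal.float())
```

Roughly, in the paper's nomenclature, deterministically rounding the final continuous point corresponds to dFGSMᵏ, randomizing the rounding gives rFGSMᵏ, while BGAᵏ and BCAᵏ instead flip bits directly using gradient information.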

Numerical Results and Practical Implications

The trained models achieve classification accuracy comparable to non-adversarially trained models while better withstanding evasion attempts by adversarial malware. Notably, the model trained with the rFGSMᵏ method demonstrates higher resilience against multiple adversarial attack techniques, a key finding reflected in its lower evasion rates compared to the other methods.

The implication of this research is substantial for practical cybersecurity applications where malware evolves rapidly to avoid detection. Integrating adversarial robustness into malware detection systems can significantly enhance protection mechanisms, predicting and counteracting potential future evasion strategies employed by malware developers.
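
To make the saddle-point training concrete, the sketch below wires the inner maximizer above into an outer minimization loop, in the spirit of SLEIPNIR; the optimizer choice, schedule, and data handling are assumptions for illustration rather than the framework's actual implementation.

```python
import torch
import torch.nn.functional as F

def adversarial_train(model, loader, epochs=10, lr=1e-3):
    """Saddle-point training loop (sketch): the inner maximizer crafts
    adversarial malware, the outer step minimizes the loss on it."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:              # x: binary features, y: labels
            x_train = x.float().clone()
            is_mal = y == 1
            if is_mal.any():
                # Replace malware samples with adversarial versions;
                # benign samples are passed through unchanged.
                x_train[is_mal] = binary_fgsm_k(model, x[is_mal], y[is_mal])
            loss = F.cross_entropy(model(x_train), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```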

Theoretical Impact and Future Directions

From a theoretical standpoint, the paper extends adversarial learning paradigms into the binary domain, presenting a foundation for subsequent advancements in adversarial training methods across different feature spaces. The developed framework exhibits versatility and can be adapted for diverse machine learning models and datasets.

Future research could delve into the nuances of loss landscapes for adversarial malware variants and investigate initialization strategies for inner maximizers, potentially reducing false-positive and false-negative rates. Another avenue could explore how adversarial examples are positioned relative to benign samples, to shape detection boundaries more effectively.

Conclusion

This work marks a significant contribution by adapting adversarial deep learning methodologies from continuous to discrete domains. Through rigorous experimentation and the introduction of novel metrics, the authors facilitate advancements in robust malware detection applicable in real-world scenarios. The insights generated pave the way for future exploration into resilient machine learning models capable of thwarting increasingly sophisticated adversarial tactics.
