Automated Behavioral Analysis of Malware: A Case Study of WannaCry Ransomware (1709.08753v1)

Published 25 Sep 2017 in cs.CR

Abstract: Ransomware, a class of self-propagating malware that uses encryption to hold the victims' data ransom, has emerged in recent years as one of the most dangerous cyber threats, with widespread damage; e.g., zero-day ransomware WannaCry has caused world-wide catastrophe, from knocking U.K. National Health Service hospitals offline to shutting down a Honda Motor Company in Japan [1]. Our close collaboration with security operations of large enterprises reveals that defense against ransomware relies on tedious analysis from high-volume systems logs of the first few infections. Sandbox analysis of freshly captured malware is also commonplace in operation. We introduce a method to identify and rank the most discriminating ransomware features from a set of ambient (non-attack) system logs and at least one log stream containing both ambient and ransomware behavior. These ranked features reveal a set of malware actions that are produced automatically from system logs, and can help automate tedious manual analysis. We test our approach using WannaCry and two polymorphic samples by producing logs with Cuckoo Sandbox during both ambient and ambient plus ransomware executions. Our goal is to extract the features of the malware from the logs with only knowledge that malware was present. We compare outputs with a detailed analysis of WannaCry, allowing validation of the algorithm's feature extraction, and provide analysis of the method's robustness to variations of input data: changing quality/quantity of ambient data and testing polymorphic ransomware. Most notably, our patterns are accurate and unwavering when generated from polymorphic WannaCry copies, on which 63 (of 63 tested) anti-virus (AV) products fail.


Summary

The paper introduces an automated method using TF-IDF to identify ransomware-specific behavioral features from system logs, contrasting malicious and non-malicious activities.
Validation with Cuckoo Sandbox logs demonstrated that the method successfully extracts key pre-encryption features of WannaCry, aligning with known malware behaviors.
The approach proved robust against high volumes of ambient data and polymorphic WannaCry variants, highlighting its potential for real-world intrusion detection systems.

Automated Behavioral Analysis of Malware: A Case Study of WannaCry Ransomware

The paper "Automated Behavioral Analysis of Malware" addresses the pressing challenge posed by ransomware, a class of self-propagating malware that encrypts victims' data and holds it for ransom. The focus is on WannaCry, notorious for affecting more than 150 countries and disrupting sectors ranging from healthcare to automotive manufacturing. The paper, authored by Qian Chen and Robert A. Bridges, introduces a method that automatically identifies and ranks ransomware-specific features from system logs, with the aim of reducing the manual log analysis that defenders currently rely on.

To achieve this, the methodology employs Term Frequency-Inverse Document Frequency (TF-IDF), a statistical measure used in information retrieval, to isolate the distinguishing features of the malware from log data. Importantly, these logs include both ambient (non-malicious) and malicious activities. To validate this approach, the researchers utilized logs generated via the Cuckoo Sandbox, a dynamic malware analysis platform, during the execution of both WannaCry variants and simulated benign user behavior.
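The ranking idea can be illustrated with a minimal sketch. It assumes the textbook TF-IDF weighting (relative term frequency times the logarithm of inverse document frequency), which may differ from the exact variant used in the paper. The function name `rank_features` and every feature string are hypothetical placeholders; they are not real WannaCry indicators or Cuckoo Sandbox output.

```python
import math
from collections import Counter


def rank_features(ambient_docs, mixed_doc):
    """Rank the features of mixed_doc by textbook TF-IDF.

    Each document is a list of feature strings extracted from one log
    stream. ambient_docs hold only benign activity; mixed_doc holds
    benign plus malware activity. (Hypothetical helper for illustration.)
    """
    docs = ambient_docs + [mixed_doc]
    n_docs = len(docs)

    # Document frequency: number of documents in which each feature occurs.
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))

    # Term frequency within the mixed document, normalized by its length.
    term_freq = Counter(mixed_doc)
    total = len(mixed_doc)

    scores = {
        term: (count / total) * math.log(n_docs / doc_freq[term])
        for term, count in term_freq.items()
    }
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical feature strings, invented for this example.
ambient = [
    ["file_open:report.docx", "reg_read:app_settings"],
    ["file_open:report.docx", "net_connect:intranet"],
]
mixed = [
    "file_open:report.docx",
    "file_create:dropped.exe",
    "file_create:dropped.exe",
    "reg_set:autorun_key",
]

ranked = rank_features(ambient, mixed)
for term, score in ranked:
    print(f"{score:.3f}  {term}")
```

In this toy input, the feature that appears in every document scores zero, while features unique to the mixed document rise to the top, which is the contrast the method relies on.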

The paper presents several experiments to demonstrate the robustness of the proposed method. Firstly, WannaCry logs juxtaposed with non-malicious logs yield a set of 74 pre-encryption features. Among these, the highest-ranking features are consistent with known WannaCry behaviors, such as the creation and manipulation of specific files and registry keys. This validates the method's capability to discern accurate ransomware signatures before the harmful encryption phase.

A significant aspect of the research is its assessment of the method’s resilience to varying conditions, including the quality and quantity of ambient data and polymorphic ransomware samples. Notably, the method successfully extracted malware features from environments where a substantial portion of the logs were non-malicious, highlighting its potential for real-world application in operational settings where malware activity is interspersed within large volumes of benign data.

Furthermore, the approach proved robust against polymorphic variants of WannaCry: the extracted patterns remained accurate and stable for polymorphic copies that all 63 tested anti-virus products failed to detect. This suggests that ranking behavior recorded in logs is less sensitive to syntactic variation in the sample than the anti-virus products tested. The researchers note that a non-malicious feature can be falsely ranked as a malware trait only if it is prevalent within the malicious document but infrequent across the other documents, an insight integral to avoiding false positives.
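The false-positive condition above follows directly from how TF-IDF weights a feature. As a sketch, assuming the textbook definition (the paper's exact weighting may differ), with N documents in total and df(t) the number of documents that contain feature t:

```latex
\[
\mathrm{tfidf}(t, d) \;=\; \mathrm{tf}(t, d)\,\cdot\,\log\frac{N}{\mathrm{df}(t)}
\]
% Feature present in every document:  df(t) = N  =>  log(N/N) = 0, score 0.
% Feature present only in the malicious document d:  df(t) = 1  =>  weight log N,
% regardless of whether the feature is itself malicious or benign.
```

Under this definition, a benign feature that happens to occur only in the malicious document receives the same maximal inverse-document-frequency weight as a genuine malware feature, so its rank then depends solely on how often it occurs there.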

The implications of this research are multifaceted. Theoretically, it contributes to the literature on automated malware feature extraction and pattern generation, offering a methodology that integrates well with dynamic analysis tools and potentially enhances intrusion detection systems. Practically, it provides a scalable solution to expedite the detection and analysis of ransomware, thereby fortifying cyber defenses against pervasive threats like WannaCry.

Future research could explore integrating this TF-IDF based extraction technique with other intrusion detection systems, potentially paving the way for advanced autonomic security systems capable of real-time threat assessment and response. This paper stands as a critical step forward in automating malware analysis, laying the groundwork for further exploration in the domain of cybersecurity.
