Deciphering Malware's use of TLS (without Decryption) (1607.01639v1)

Published 6 Jul 2016 in cs.CR

Abstract: The use of TLS by malware poses new challenges to network threat detection because traditional pattern-matching techniques can no longer be applied to its messages. However, TLS also introduces a complex set of observable data features that allow many inferences to be made about both the client and the server. We show that these features can be used to detect and understand malware communication, while at the same time preserving the privacy of benign uses of encryption. These data features also allow for accurate malware family attribution of network communication, even when restricted to a single, encrypted flow. To demonstrate this, we performed a detailed study of how TLS is used by malware and enterprise applications. We provide a general analysis on millions of TLS encrypted flows, and a targeted study on 18 malware families composed of thousands of unique malware samples and ten-of-thousands of malicious TLS flows. Importantly, we identify and accommodate the bias introduced by the use of a malware sandbox. The performance of a malware classifier is correlated with a malware family's use of TLS, i.e., malware families that actively evolve their use of cryptography are more difficult to classify. We conclude that malware's usage of TLS is distinct from benign usage in an enterprise setting, and that these differences can be effectively used in rules and machine learning classifiers.

Citations (170)


Summary

The paper demonstrates that malware utilizes TLS with discernible patterns based on observable parameters, allowing detection without decryption to overcome DPI limitations.
The research analyzed millions of TLS flows across 18 malware families, finding malware often uses older cryptography and exhibits distinct feature differences from enterprise traffic.
Classifiers using only observable TLS features achieved over 90% accuracy in detecting and attributing malware families like Virlock and Dridex in encrypted traffic.

Analysis of Malware Utilization of TLS Without Decryption

The paper "Deciphering Malware's use of TLS (without Decryption)" presents a systematic study of the challenge posed by malware's adoption of Transport Layer Security (TLS) to hide its communication, which renders traditional threat detection techniques such as deep packet inspection (DPI) inadequate. The research uses the data features exposed by the observable, unencrypted parts of TLS to infer characteristics of both client and server, and demonstrates that these features can separate malicious from benign network traffic without decrypting it, so the privacy of benign encrypted traffic is preserved.

The research examines millions of TLS-encrypted flows to compare malware's use of TLS with that of enterprise applications. It also evaluates 18 malware families using thousands of unique samples and tens of thousands of malicious flows, combining general observations with detailed per-family analyses. Notably, the paper identifies and accommodates the bias introduced by collecting malware traffic in a sandbox.

The central conclusion is that malware's use of TLS differs discernibly from TLS use in enterprise environments. Malware typically employs older and weaker cryptographic parameters, whereas enterprise applications more often use up-to-date and stronger ones. A significant highlight is that family-specific traits are detectable from observable TLS features alone, which allows classifiers to attribute encrypted flows to malware families with high accuracy.
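As a rough illustration of what "observable TLS features" can look like in practice, the sketch below encodes the cipher suites offered in an unencrypted ClientHello as a binary vector and counts how many of them are obsolete. The code points are the IANA-registered values for the named suites; the short vocabulary, the `OBSOLETE` set, and the function names are assumptions chosen for illustration, not the feature set or rules used in the paper.

```python
# Small, hypothetical vocabulary of cipher suite code points (IANA values).
SUITES = {
    0x0004: "TLS_RSA_WITH_RC4_128_MD5",
    0x0005: "TLS_RSA_WITH_RC4_128_SHA",
    0x000A: "TLS_RSA_WITH_3DES_EDE_CBC_SHA",
    0x002F: "TLS_RSA_WITH_AES_128_CBC_SHA",
    0xC02B: "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256",
    0xC02F: "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
}

# Fixed ordering so every flow maps to a vector of the same length.
VOCAB = sorted(SUITES)

# Assumption for this sketch: RC4 and 3DES suites are treated as obsolete.
OBSOLETE = {0x0004, 0x0005, 0x000A}


def encode_offered_suites(offered):
    """Return a 0/1 vector marking which vocabulary suites the client offered."""
    offered_set = set(offered)
    return [1 if code in offered_set else 0 for code in VOCAB]


def count_obsolete(offered):
    """Count distinct offered suites that fall in the OBSOLETE set."""
    return len(set(offered) & OBSOLETE)


if __name__ == "__main__":
    # Hypothetical ClientHello offering two RC4 suites and one modern suite.
    client_hello_suites = [0x0004, 0x0005, 0xC02F]
    print(encode_offered_suites(client_hello_suites))
    print(count_obsolete(client_hello_suites))
```

A vector like this can feed a classifier directly, while a count of obsolete suites is the kind of quantity a simple rule could threshold on. Either use only needs the handshake metadata, not the decrypted payload.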

From an operational perspective, the paper demonstrates that classifiers based solely on observable TLS parameters can attribute a malicious flow to its malware family with 90.3% accuracy when restricted to a single encrypted flow, improving to 93.2% when all encrypted flows within a five-minute window are considered. Some malware families, such as Virlock and Dridex, are highlighted for their distinctive TLS behavior, which facilitates accurate family attribution from TLS data features. These findings suggest ways to improve machine learning classifiers in network security systems, particularly in environments where most traffic is encrypted.
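One simple way to combine evidence from several flows in a time window is to sum the per-flow class probabilities and pick the family with the largest total. The sketch below assumes a per-flow classifier has already produced those probabilities. The `Flow` record, the fixed (non-sliding) 300-second buckets, and the summation rule are illustrative assumptions, not necessarily how the paper aggregates flows.

```python
from collections import defaultdict
from dataclasses import dataclass

WINDOW_SECONDS = 300  # five minutes


@dataclass
class Flow:
    host: str          # client address that produced the flow
    start_time: float  # seconds since some reference point
    probs: dict        # family name -> per-flow probability (hypothetical classifier output)


def classify_windows(flows):
    """Group flows by (host, window index) and return the most likely family per group."""
    buckets = defaultdict(list)
    for flow in flows:
        window = int(flow.start_time // WINDOW_SECONDS)
        buckets[(flow.host, window)].append(flow.probs)

    decisions = {}
    for key, prob_list in buckets.items():
        totals = defaultdict(float)
        for probs in prob_list:
            for family, p in probs.items():
                totals[family] += p
        # The family with the largest summed probability also has the largest mean.
        decisions[key] = max(totals, key=totals.get)
    return decisions


if __name__ == "__main__":
    flows = [
        Flow("10.0.0.5", 10.0, {"Dridex": 0.6, "Virlock": 0.4}),
        Flow("10.0.0.5", 120.0, {"Dridex": 0.3, "Virlock": 0.7}),
        Flow("10.0.0.5", 310.0, {"Dridex": 0.9, "Virlock": 0.1}),
    ]
    print(classify_windows(flows))
```

In this made-up example the first flow alone would be labeled Dridex, but the second flow in the same window shifts the combined decision to Virlock, which shows how pooling flows can change a single-flow verdict.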

The authors address limitations of the methodology, such as potential biases introduced by using Windows XP-based malware sandboxes and the inability to identify malware families not represented in the analysis dataset. They emphasize capturing more diverse network behavior across operating systems and environments to mitigate such biases. Additionally, the paper outlines avenues for future research, including improved feature engineering to correlate unencrypted TLS handshake features with malware behavioral patterns more robustly.

Looking ahead, the growing share of encrypted network traffic means that standard detection methods must evolve to remain effective. The paper underscores the need for machine learning and rule-based systems that work on encrypted flow data to defend against malware's use of encryption. Continued integration of domain-specific features and adaptive classification models will be important for improving detection accuracy and operational feasibility across different networks.

In conclusion, the paper outlines a methodology for identifying malware from distinguishing TLS features without decrypting the content, an approach that can complement existing network defenses. The research advances the ability to keep networks secure while respecting the privacy that encryption is intended to provide.
