Detection of Early-Stage Enterprise Infection by Mining Large-Scale Log Data (1411.5005v2)

Published 18 Nov 2014 in cs.CR

Abstract: Recent years have seen the rise of more sophisticated attacks including advanced persistent threats (APTs) which pose severe risks to organizations and governments by targeting confidential proprietary information. Additionally, new malware strains are appearing at a higher rate than ever before. Since many of these malware are designed to evade existing security products, traditional defenses deployed by most enterprises today, e.g., anti-virus, firewalls, intrusion detection systems, often fail at detecting infections at an early stage. We address the problem of detecting early-stage infection in an enterprise setting by proposing a new framework based on belief propagation inspired from graph theory. Belief propagation can be used either with "seeds" of compromised hosts or malicious domains (provided by the enterprise security operation center -- SOC) or without any seeds. In the latter case we develop a detector of C&C communication particularly tailored to enterprises which can detect a stealthy compromise of only a single host communicating with the C&C server. We demonstrate that our techniques perform well on detecting enterprise infections. We achieve high accuracy with low false detection and false negative rates on two months of anonymized DNS logs released by Los Alamos National Lab (LANL), which include APT infection attacks simulated by LANL domain experts. We also apply our algorithms to 38TB of real-world web proxy logs collected at the border of a large enterprise. Through careful manual investigation in collaboration with the enterprise SOC, we show that our techniques identified hundreds of malicious domains overlooked by state-of-the-art security products.

Authors (5)

Alina Oprea
Zhou Li
Ting-Fang Yen
Sang Chin
Sumayah Alrwais

Summary

The paper presents a belief propagation framework using graph theory on network logs to identify compromised hosts and malicious domains for early infection detection.
It efficiently detects covert command-and-control communication by analyzing the timing regularity of host-domain interactions.
Validated against real-world datasets, the framework achieved high detection rates (e.g., 98.33% on LANL data) and identified previously undetected malicious domains.

Detection of Early-Stage Enterprise Infection by Mining Large-Scale Log Data

The paper presents a method for detecting early-stage infections in enterprise networks, focusing on advanced persistent threats (APTs) and sophisticated malware variants. The authors propose a framework based on belief propagation, a graph-based inference technique, to identify compromised hosts and malicious domains. The framework can run either with initial "seeds" (known compromised hosts or malicious domains supplied by the enterprise security operations center, or SOC) or without any seeds, so it adapts to however much threat intelligence is available.
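To make the seeded mode concrete, the following is a minimal sketch of iterative suspicion propagation over a host-domain graph. It is not the authors' algorithm: the scoring rule (the fraction of a domain's visitors that are already flagged), the `threshold` and `max_iters` parameters, and the function name `propagate` are all assumptions chosen for illustration.

```python
from collections import defaultdict


def propagate(edges, seed_hosts, threshold=0.5, max_iters=10):
    """Expand a set of flagged hosts and domains from seed hosts.

    edges: iterable of (host, domain) pairs taken from DNS or proxy logs.
    The scoring rule below is a hypothetical stand-in, not the paper's.
    """
    hosts_of = defaultdict(set)    # domain -> hosts that contacted it
    domains_of = defaultdict(set)  # host -> domains it contacted
    for host, domain in edges:
        hosts_of[domain].add(host)
        domains_of[host].add(domain)

    bad_hosts = set(seed_hosts)
    bad_domains = set()
    for _ in range(max_iters):
        # Candidates: domains contacted by flagged hosts, not yet flagged.
        candidates = set()
        for h in bad_hosts:
            candidates |= domains_of[h]
        candidates -= bad_domains
        if not candidates:
            break
        # Assumed score: share of a domain's visitors that are already flagged.
        scores = {d: len(hosts_of[d] & bad_hosts) / len(hosts_of[d])
                  for d in candidates}
        # Sorting first makes ties resolve deterministically by name.
        best = max(sorted(scores), key=scores.get)
        if scores[best] < threshold:
            break
        # Flag the top domain, then every host that contacted it.
        bad_domains.add(best)
        bad_hosts |= hosts_of[best]
    return bad_hosts, bad_domains


edges = [
    ("h1", "evil.example"), ("h2", "evil.example"), ("h2", "drop.example"),
    ("h1", "news.example"), ("h3", "news.example"),
    ("h4", "news.example"), ("h5", "news.example"),
]
hosts, domains = propagate(edges, seed_hosts={"h1"})
print(sorted(hosts), sorted(domains))
```

Starting from the single seed `h1`, the sketch flags `evil.example`, pulls in `h2` because it contacted that domain, and then flags `drop.example`. The popular `news.example` stays below the threshold because most of its visitors are not flagged.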

Core Contributions

Belief Propagation Framework: The framework is designed to detect suspicious network activity by modeling network communication as a bipartite graph of hosts and domains. It systematically explores connections between entities to propagate suspicion scores, thereby identifying malicious actors based on anomalies and deviations from normal patterns.
C&C Communication Detection: The presented method efficiently discerns automated, periodic communication patterns characteristic of command-and-control (C&C) activities. By analyzing timing regularity in host-domain communication, it distinguishes between benign and potentially malicious interactions.
Empirical Validation: The framework is evaluated on two datasets: two months of anonymized DNS logs released by the Los Alamos National Lab (LANL), which include simulated APT infections, and 38TB of web proxy logs collected at the border of a large enterprise. The results show high true detection rates with low false detection rates on both.
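The timing-regularity idea in the communication detection item above can be illustrated with a small sketch. It substitutes a simple statistic, the coefficient of variation of the gaps between consecutive connections, for whatever test the paper actually uses. The function names and the `max_cv` and `min_connections` thresholds are hypothetical.

```python
from statistics import mean, pstdev


def interval_cv(timestamps):
    """Coefficient of variation of gaps between consecutive connections.

    Returns None when there are fewer than two gaps or the mean gap is zero.
    """
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    if len(gaps) < 2:
        return None
    avg = mean(gaps)
    if avg == 0:
        return None
    return pstdev(gaps) / avg


def looks_automated(timestamps, max_cv=0.1, min_connections=5):
    """Flag a host-domain pair whose connections are nearly evenly spaced.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    if len(timestamps) < min_connections:
        return False
    cv = interval_cv(timestamps)
    return cv is not None and cv <= max_cv


# Connection times in seconds for one host-domain pair (made-up data).
beacon = [0, 600, 1201, 1799, 2400, 3000]  # roughly every 10 minutes
browsing = [0, 12, 15, 400, 2900, 2950]    # bursty, human-like
print(looks_automated(beacon), looks_automated(browsing))
```

A low coefficient of variation means the gaps are almost identical, which is what a beaconing process produces. Malware that randomizes its sleep interval would need a more tolerant test than this one.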

Results and Implications

On the LANL dataset, which contains APT infection attacks simulated by LANL domain experts, the framework achieved a true detection rate of 98.33% with low false detection and false negative rates. On the enterprise web proxy logs, manual investigation in collaboration with the enterprise SOC confirmed hundreds of malicious domains that state-of-the-art security products had overlooked.

These results provide compelling evidence of the framework's potential to enhance early-stage detection of covert malware infections in enterprise settings. The adaptability of the belief propagation approach, along with its ability to integrate incomplete intelligence, makes it a valuable tool for security operations centers (SOCs) striving to mitigate the risks posed by APTs and sophisticated malware.

Future Developments

Future improvements could include refining the model to incorporate a wider range of network behavior metrics and adapting to evolving threat landscapes, including more sophisticated adversary techniques. Moreover, integration with broader security metrics and tools could offer a comprehensive defense strategy, further empowering enterprises to preempt and respond to emerging cyber threats effectively.