Fast Memory-efficient Anomaly Detection in Streaming Heterogeneous Graphs (1602.04844v2)

Published 15 Feb 2016 in cs.SI

Abstract: Given a stream of heterogeneous graphs containing different types of nodes and edges, how can we spot anomalous ones in real-time while consuming bounded memory? This problem is motivated by and generalizes from its application in security to host-level advanced persistent threat (APT) detection. We propose StreamSpot, a clustering based anomaly detection approach that addresses challenges in two key fronts: (1) heterogeneity, and (2) streaming nature. We introduce a new similarity function for heterogeneous graphs that compares two graphs based on their relative frequency of local substructures, represented as short strings. This function lends itself to a vector representation of a graph, which is (a) fast to compute, and (b) amenable to a sketched version with bounded size that preserves similarity. StreamSpot exhibits desirable properties that a streaming application requires: it is (i) fully-streaming, processing the stream one edge at a time as it arrives, (ii) memory-efficient, requiring constant space for the sketches and the clustering, (iii) fast, taking constant time to update the graph sketches and the cluster summaries that can process over 100K edges per second, and (iv) online, scoring and flagging anomalies in real time. Experiments on datasets containing simulated system-call flow graphs from normal browser activity and various attack scenarios (ground truth) show that our proposed StreamSpot is high-performance, achieving above 95% detection accuracy with small delay, as well as competitive time and memory usage.



Summary

The paper introduces a novel shingle-based similarity measure that enhances anomaly detection in streaming heterogeneous graphs.
It maintains compact, constant-space graph sketches whose constant-time updates let it process over 100,000 edges per second in real time.
Experimental results show over 95% detection accuracy, demonstrating its practical efficacy in cybersecurity applications.


The paper "Fast Memory-efficient Anomaly Detection in Streaming Heterogeneous Graphs" introduces StreamSpot, an anomaly detection approach for streaming heterogeneous graphs, motivated by advanced persistent threat (APT) detection in cybersecurity. Such graphs have typed nodes and edges that arrive as a stream, which raises two challenges, heterogeneity and the streaming setting, that the paper addresses with a clustering-based method.

Key Contributions and Methods

The authors define a new similarity measure for heterogeneous graphs based on the relative frequency of local substructures called "shingles": short strings that encode what is found in a node's $k$-hop neighborhood. Each graph is represented by its shingle-frequency vector, a representation intended to capture both temporal and structural properties, and two graphs are compared through these vectors. To make this practical, the authors compress the vectors into sketches: fixed-size representations that approximately preserve similarity while consuming bounded memory, a crucial requirement for real-time applications with limited resources.
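To make the shingle idea concrete, here is a minimal Python sketch under our own assumptions: each edge is a `(src, src_type, dst, dst_type, edge_type)` tuple, node and edge types are single characters, a shingle is built by a breadth-first walk of a node's $k$-hop out-neighborhood in edge-arrival order, and two graphs are compared with the cosine similarity of their shingle-frequency vectors. The helper names (`build_graph`, `shingle`, `shingle_vector`, `cosine`) and the type letters are hypothetical; the paper's exact traversal order and string encoding are not reproduced here.

```python
from collections import Counter, defaultdict, deque
from math import sqrt


def build_graph(edges):
    """edges: (src, src_type, dst, dst_type, edge_type) tuples in arrival order."""
    node_type = {}
    out_edges = defaultdict(list)  # src -> [(edge_type, dst), ...] in arrival order
    for src, src_type, dst, dst_type, edge_type in edges:
        node_type[src] = src_type
        node_type[dst] = dst_type
        out_edges[src].append((edge_type, dst))
    return node_type, out_edges


def shingle(root, node_type, out_edges, k):
    """Concatenate the types met in a breadth-first walk of root's k-hop out-neighborhood."""
    parts = [node_type[root]]
    frontier = deque([(root, 0)])
    while frontier:
        u, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for edge_type, v in out_edges.get(u, []):
            parts.append(edge_type + node_type[v])
            frontier.append((v, depth + 1))
    return "".join(parts)


def shingle_vector(edges, k=1):
    """Frequency of each shingle string across all nodes (the graph's vector form)."""
    node_type, out_edges = build_graph(edges)
    return Counter(shingle(n, node_type, out_edges, k) for n in node_type)


def cosine(a, b):
    """Cosine similarity of two sparse frequency vectors stored as Counters."""
    dot = sum(a[s] * b[s] for s in a.keys() & b.keys())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical types: P = process, F = file, S = socket; r = read, w = write, c = connect.
g1 = [("p1", "P", "f1", "F", "r"), ("p1", "P", "f2", "F", "w")]
g2 = [("p9", "P", "f7", "F", "r"), ("p9", "P", "f8", "F", "w")]
g3 = [("p1", "P", "s1", "S", "c"), ("p1", "P", "s2", "S", "c")]

print(shingle_vector(g1))  # Counter({'F': 2, 'PrFwF': 1})
print(cosine(shingle_vector(g1), shingle_vector(g2)))  # close to 1.0: same structure
print(cosine(shingle_vector(g1), shingle_vector(g3)))  # 0.0: no shared shingles
```

Because shingles contain only types and not node IDs, `g1` and `g2` produce identical vectors even though their processes and files differ, while `g3` (a process connecting to sockets) shares no shingles with them.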

Key desirable features of the proposed method include:

Fully streaming: The system processes the graph stream one edge at a time, as each edge arrives.
Memory efficiency: It requires constant space for the graph sketches and the clustering.
Processing speed: Updating the graph sketches and the cluster summaries takes constant time, allowing the framework to process over 100,000 edges per second in high-throughput settings.
Real-time anomaly detection: The framework scores and flags anomalies online, with small delay after the relevant edges arrive.
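The streaming properties above can be illustrated with a signed random-projection sketch in the style of SimHash: each of `L` counters accumulates shingle counts multiplied by a ±1 hash, so a change in one shingle's count costs O(L) work and the sketch never grows with the graph. The signs of these counters estimate cosine similarity, which we then use to score a graph against cluster centroids. This is a hedged illustration, not the paper's implementation: `sign_hash` (built on `hashlib.blake2b`), `GraphSketch`, `centroid`, `anomaly_score`, the sketch size of 64, and the centroid-as-mean-projection rule are all our assumptions, and shingles are taken with $k = 1$, so an arriving edge from u to v changes only the shingle rooted at u.

```python
import hashlib
import math


def sign_hash(shingle: str, l: int) -> int:
    """Hypothetical hash family: the l-th function maps a shingle string to +1 or -1."""
    digest = hashlib.blake2b(f"{l}:{shingle}".encode(), digest_size=1).digest()
    return 1 if digest[0] & 1 else -1


class GraphSketch:
    """Fixed-size signed random projection of a graph's shingle-frequency vector."""

    def __init__(self, size: int = 64):
        # y[l] = sum over shingles s of count(s) * sign_hash(s, l); memory is O(size).
        self.y = [0.0] * size

    def update(self, shingle: str, delta: int) -> None:
        # O(size) work per count change, no matter how large the graph grows.
        for l in range(len(self.y)):
            self.y[l] += delta * sign_hash(shingle, l)

    def bits(self) -> list:
        # One bit per counter: the sign of the projection.
        return [v >= 0 for v in self.y]


def estimated_cosine(a: GraphSketch, b: GraphSketch) -> float:
    """SimHash estimate: a bit differs with probability angle / pi."""
    hamming = sum(p != q for p, q in zip(a.bits(), b.bits()))
    return math.cos(math.pi * hamming / len(a.y))


def centroid(members: list) -> GraphSketch:
    """Hypothetical cluster summary: element-wise mean of member projections."""
    c = GraphSketch(len(members[0].y))
    c.y = [sum(vals) / len(members) for vals in zip(*(m.y for m in members))]
    return c


def anomaly_score(sketch: GraphSketch, centroids: list) -> float:
    """Distance (1 - estimated cosine) to the nearest cluster centroid."""
    return min(1.0 - estimated_cosine(sketch, c) for c in centroids)


# Hypothetical types: P = process, F = file, S = socket; r/w = read/write, c = connect.
normal = GraphSketch()
normal.update("PrFwF", 1)
normal.update("F", 2)
clusters = [centroid([normal])]

# The same graph arriving edge by edge (k = 1: edge u -> v changes only u's shingle).
live = GraphSketch()
live.update("PrF", 1)    # p1 -r-> f1: new shingle for p1 ...
live.update("F", 1)      # ... and for the new file f1
live.update("PrF", -1)   # p1 -w-> f2: p1's old shingle is retired ...
live.update("PrFwF", 1)  # ... and replaced by the extended one
live.update("F", 1)      # new file f2
print(anomaly_score(live, clusters))  # 0.0: identical counts give an identical sketch

attack = GraphSketch()
attack.update("PcScS", 1)
attack.update("S", 2)
print(anomaly_score(attack, clusters))  # larger: no shingles in common with `normal`
```

Because the projection is linear, the mean of member projections equals the projection of the members' mean frequency vector, which is why a centroid can be kept as a fixed-size summary. Flagging would then compare each score with a threshold; the paper's thresholding and cluster-maintenance rules are not reproduced here.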

Experimental Evaluation and Results

In experiments on datasets of simulated system-call flow graphs from normal browser activity and from various attack scenarios (used as ground truth), the approach achieved above 95% detection accuracy with small delay, along with competitive running time and memory usage. The authors also report that the method remains effective across parameter settings such as sketch size and under tighter memory limits.

Implications and Future Directions

In practice, the approach offers a scalable way to detect anomalies wherever data arrive as a stream of timestamped events, as in cybersecurity, communication networks, or social media analytics. On the theoretical side, its similarity-preserving sketches for streaming heterogeneous graphs add to the broader graph-analysis toolkit and could be optimized further or adapted to related problem domains.

Future research could extend the approach to richer graph attributes, or combine it with other machine learning methods for better interpretability or for tasks such as predictive maintenance. Integrating the method into distributed or cloud computing environments could further broaden its applicability.

In conclusion, this paper provides a methodologically sound and practical approach to real-time anomaly detection in streaming heterogeneous graphs, meeting stringent requirements of speed, memory efficiency, and detection accuracy, with ample room for future exploration and application in various domains.


Related Papers

Real-Time Anomaly Detection in Edge Streams (2020)
MSTREAM: Fast Anomaly Detection in Multi-Aspect Streams (2020)
Streaming Anomaly Detection (2023)
Sketch-Based Anomaly Detection in Streaming Graphs (2021)
A Streaming Algorithm for Graph Clustering (2017)