Graph-based Anomaly Detection and Description: A Survey (1404.4679v2)

Published 18 Apr 2014 in cs.SI and cs.CR

Abstract: Detecting anomalies in data is a vital task, with numerous high-impact applications in areas such as security, finance, health care, and law enforcement. While numerous techniques have been developed in past years for spotting outliers and anomalies in unstructured collections of multi-dimensional points, with graph data becoming ubiquitous, techniques for structured graph data have been of focus recently. As objects in graphs have long-range correlations, a suite of novel technology has been developed for anomaly detection in graph data. This survey aims to provide a general, comprehensive, and structured overview of the state-of-the-art methods for anomaly detection in data represented as graphs. As a key contribution, we provide a comprehensive exploration of both data mining and machine learning algorithms for these detection tasks. We give a general framework for the algorithms categorized under various settings: unsupervised vs. (semi-)supervised approaches, for static vs. dynamic graphs, for attributed vs. plain graphs. We highlight the effectiveness, scalability, generality, and robustness aspects of the methods. What is more, we stress the importance of anomaly attribution and highlight the major techniques that facilitate digging out the root cause, or the 'why', of the detected anomalies for further analysis and sense-making. Finally, we present several real-world applications of graph-based anomaly detection in diverse domains, including financial, auction, computer traffic, and social networks. We conclude our survey with a discussion on open theoretical and practical challenges in the field.



Summary

The paper comprehensively surveys graph-based anomaly detection methods, emphasizing both anomaly identification and explanation.
It categorizes techniques for static (plain and attributed) and dynamic graphs, detailing methods like egonet analysis, frequency-based substructure discovery, and matrix decomposition.
The study highlights practical applications across telecom, security, and social networks while outlining challenges and future research directions.

Survey of Graph-based Anomaly Detection and Description

In the field of anomaly detection, identifying outliers within structured datasets represented as graphs has become increasingly vital. Traditional techniques have largely focused on multi-dimensional points in unstructured data. This paper presents a comprehensive survey addressing the state-of-the-art methods specifically designed for anomaly detection within graph data structures, emphasizing both detection and description of anomalies.

Overview and Categorization

The survey thoroughly categorizes these methods based on whether the graphs are static or dynamic, and whether they incorporate node and edge attributes. Static graphs are evaluated under scenarios involving both plain and attributed graphs, while dynamic graph analysis considers the evolution of graph structures over time. Furthermore, the paper underscores the importance of anomaly description—delineating methods that can provide explanatory frameworks or interactive querying functionalities to understand the root causes or relationships among anomalies.

Static Graph Anomalies

Plain Graphs

In static plain graphs, structural patterns form the basis for anomaly detection. Methods such as OddBall extract egonet-based features (an egonet is a node together with its immediate neighbors and the edges among them) and flag nodes that deviate from patterns observed across the whole graph. Community-based techniques use clustering and modularity maximization to uncover 'bridge' nodes and edges that connect otherwise separate communities.
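The egonet idea can be illustrated with a minimal Python sketch. It assumes an undirected graph stored as a dict of neighbor sets, fits a straight line to log(edges) versus log(neighbors) over all egonets, and scores each node by its distance from that line. The function names and the plain absolute-residual score are illustrative assumptions, not OddBall's published scoring function.

```python
import math

def from_edges(edges):
    """Build an undirected adjacency dict {node: set(neighbors)} (hypothetical helper)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def egonet_features(adj):
    """Return {node: (number of neighbors, number of edges inside the egonet)}."""
    feats = {}
    for u, nbrs in adj.items():
        ego = nbrs | {u}
        # Each undirected edge inside the egonet is seen from both endpoints.
        edges = sum(len(adj[v] & ego) for v in ego) // 2
        feats[u] = (len(nbrs), edges)
    return feats

def egonet_scores(adj):
    """Score nodes by |log E - (a + b log N)| for a least-squares line fit."""
    feats = {u: f for u, f in egonet_features(adj).items() if f[0] > 0}
    xs = [math.log(n) for n, _ in feats.values()]
    ys = [math.log(e) for _, e in feats.values()]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0  # all degrees equal: fall back to a flat line
    a = my - b * mx
    return {u: abs(math.log(e) - (a + b * math.log(n)))
            for u, (n, e) in feats.items()}
```

Nodes whose egonets are much denser (near-cliques) or much sparser (stars) than the fitted trend receive the largest scores; ranking nodes by score gives a simple list of candidates for inspection.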

Attributed Graphs

Attributed graphs, where nodes and/or edges carry additional information, require methods that integrate graph structure with this auxiliary data. Techniques like Subdue identify rare substructures by combining frequency and modification cost, providing an intuitive understanding of deviations within graph attributes. Other approaches, such as CODA, simultaneously detect communities and community outliers employing probabilistic modeling, underscoring the interplay between node attributes and structural integrity of the graph.

Dynamic Graph Anomalies

In analyzing dynamic graphs, the paper explores methods that track changes over graph snapshots, capturing time-evolving patterns.

Feature-based Events

Feature-centric methods such as NetSimile and DeltaCon compute summary vectors or similarity matrices for each snapshot and compare them over time to spot deviations in the graph's evolution. These approaches balance simplicity and efficiency, which makes them well suited to detecting sudden temporal anomalies.
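As a rough illustration of the summary-vector approach, the sketch below builds a small signature for each snapshot and compares consecutive signatures with the Canberra distance. It is a simplification in the spirit of NetSimile, not a reimplementation: the choice of two node features (degree and clustering coefficient), the three aggregates, and all function names are assumptions made for this example.

```python
import statistics

def from_edges(edges):
    """Build an undirected adjacency dict {node: set(neighbors)} (hypothetical helper)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def node_features(adj):
    """Per-node (degree, clustering coefficient) pairs."""
    feats = []
    for u, nbrs in adj.items():
        k = len(nbrs)
        # Links among u's neighbors; each is counted once from each endpoint.
        links = sum(len(adj[v] & nbrs) for v in nbrs) // 2
        cc = 2 * links / (k * (k - 1)) if k > 1 else 0.0
        feats.append((k, cc))
    return feats

def signature(adj):
    """Aggregate each feature column into mean, median, and standard deviation."""
    sig = []
    for col in zip(*node_features(adj)):
        sig += [statistics.mean(col), statistics.median(col), statistics.pstdev(col)]
    return sig

def canberra(x, y):
    """Canberra distance; terms where both entries are zero are skipped."""
    return sum(abs(a - b) / (abs(a) + abs(b))
               for a, b in zip(x, y) if abs(a) + abs(b) > 0)

def snapshot_distances(snapshots):
    """Distance between each pair of consecutive snapshot signatures."""
    sigs = [signature(g) for g in snapshots]
    return [canberra(s, t) for s, t in zip(sigs, sigs[1:])]
```

A time step whose distance is far above the typical value in `snapshot_distances` would be a candidate event; how to set that threshold is left open here.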

Decomposition-based Events

These methods use matrix or tensor decompositions to identify significant changes in graph structure. CMD builds a compact, sparse approximation from sampled rows and columns of the adjacency matrix, and NrMF constrains the residual matrix to be non-negative, which makes the detection results easier to interpret.
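The general pattern behind decomposition-based event detection can be sketched as follows: approximate each snapshot's adjacency matrix with a low-rank model, track the reconstruction error, and flag time steps where the error jumps. A plain rank-1 approximation computed by power iteration stands in for the decompositions named above; it is neither CMD nor NrMF. The sketch assumes non-negative matrices given as lists of lists, and the function names and threshold are hypothetical.

```python
import math

def matvec(A, x):
    """Multiply matrix A (list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def rank1_error(A, iters=200):
    """Relative squared Frobenius error of a rank-1 approximation of A.

    Assumes A is non-negative, so the positive start vector is never
    orthogonal to the leading singular vector.
    """
    At = [list(col) for col in zip(*A)]
    total = sum(a * a for row in A for a in row)
    if total == 0:
        return 0.0
    v = [1.0 + 0.01 * i for i in range(len(A[0]))]  # deterministic start
    for _ in range(iters):
        w = matvec(At, matvec(A, v))  # one power-iteration step on A^T A
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    u = matvec(A, v)  # A v, so the approximation is u v^T
    residual = sum((A[i][j] - u[i] * v[j]) ** 2
                   for i in range(len(A)) for j in range(len(v)))
    return residual / total

def flag_changes(snapshots, threshold):
    """Indices t where the error differs from step t-1 by more than threshold."""
    errors = [rank1_error(A) for A in snapshots]
    return [t for t in range(1, len(errors))
            if abs(errors[t] - errors[t - 1]) > threshold]
```

A sudden rise in reconstruction error suggests that the structure captured by the low-rank model no longer describes the graph, which is the signal these methods monitor.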

Community-based Events

Community-centric techniques such as GraphScope and Bayesian models monitor community behavior, flagging anomalies when unexpected structural changes arise within these communities. This approach highlights the significance of closely-knit subgroups in larger networks and their role in anomalous activities.

Anomaly Description and Attribution

Effective anomaly detection goes beyond merely identifying anomalies—it requires explaining and interpreting these events. This paper emphasizes methods like Non-negative Residual Matrix Factorization (NrMF) which align detection results with intuitive interpretations. Additionally, interactive graph querying tools like CePS and Dot2Dot facilitate the exploration of connections among detected anomalies, aiding in determining potential root causes and enhancing the overall interpretability of anomaly detection processes.
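To make the idea of exploring connections among detected anomalies concrete, here is a deliberately simple sketch: given a set of flagged nodes, it collects the nodes on breadth-first shortest paths between every pair of them. This is a stand-in chosen for brevity, not the algorithm used by CePS or Dot2Dot, and the function names are hypothetical.

```python
from collections import deque
from itertools import combinations

def shortest_path(adj, src, dst):
    """Return one shortest path from src to dst as a list of nodes, or None."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for v in adj.get(u, ()):
            if v not in prev:
                prev[v] = u
                queue.append(v)
    if dst not in prev:
        return None
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]

def connecting_nodes(adj, marked):
    """Marked nodes plus every node on a shortest path between two of them."""
    keep = set(marked)
    for a, b in combinations(sorted(marked), 2):
        path = shortest_path(adj, a, b)
        if path:
            keep.update(path)
    return keep
```

For a graph with edges a-b, b-c, c-d, and b-e, marking `a` and `d` keeps `a`, `b`, `c`, and `d` and leaves out `e`, which lies on no connecting path.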

Real-world Applications

Graph-based anomaly detection methods are particularly potent in real-world applications spanning diverse domains:

Telecommunication Networks: Detection of subscription fraud by leveraging communities of interest.
Auction Networks: Identification of fraudsters through relational classification models.
Accounting Networks: Using relational models to detect risky accounts with transaction anomalies.
Security Networks: Examining social and professional broker relationships for potential securities fraud.
Opinion Networks: Graph-based trust propagation and relational classification to detect fake reviews and deceptive behaviors.
Web Networks: Implementing TrustRank and its variants for battling web spam and malware.
Social Networks: Techniques to combat socware through user interaction patterns and content analysis.
Computer Networks: Monitoring communication patterns and community interactions for intrusion detection.
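
The TrustRank idea mentioned under Web Networks can be sketched as a PageRank computation whose teleport step returns only to a trusted seed set, so that trust flows outward along links from known-good pages. The version below is a minimal illustration: the damping value, the iteration count, and the choice to send the mass of pages without out-links back to the seeds are assumptions, and the function name is hypothetical.

```python
def trustrank(out_links, seeds, damping=0.85, iters=50):
    """Seed-biased PageRank over a dict {page: [pages it links to]}."""
    nodes = set(out_links) | {v for targets in out_links.values() for v in targets}
    seed_mass = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    trust = dict(seed_mass)
    for _ in range(iters):
        # Teleport: a (1 - damping) share of trust restarts at the seeds.
        nxt = {n: (1 - damping) * seed_mass[n] for n in nodes}
        for u in nodes:
            targets = out_links.get(u, [])
            if targets:
                share = damping * trust[u] / len(targets)
                for v in targets:
                    nxt[v] += share
            else:
                # Assumption: pages without out-links return their trust to the seeds.
                for s in seeds:
                    nxt[s] += damping * trust[u] / len(seeds)
        trust = nxt
    return trust
```

Pages that no trusted page reaches, directly or indirectly, end up with zero trust, which is what makes low scores a useful spam signal in this setting.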

Future Prospects

The paper concludes with open challenges that continue to shape graph-based anomaly detection: handling attributed dynamic graphs, making use of historical data in dynamic settings, selecting appropriate time granularities for analysis, strengthening robustness against adversaries, and building scalable, real-time detection systems. Closing these gaps should enable more capable and robust anomaly detection systems.

The survey presented by Akoglu, Tong, and Koutra broadly covers existing techniques while charting the course for future endeavors, demonstrating the nuanced and potent use of graph-based approaches in anomaly detection and description.
