- The paper comprehensively surveys graph-based anomaly detection methods, emphasizing both anomaly identification and explanation.
- It categorizes techniques for static (plain and attributed) and dynamic graphs, detailing methods like egonet analysis, frequency-based substructure discovery, and matrix decomposition.
- The study highlights practical applications across telecom, security, and social networks while outlining challenges and future research directions.
Survey of Graph-based Anomaly Detection and Description
In the field of anomaly detection, identifying outliers within structured datasets represented as graphs has become increasingly vital. Traditional techniques have largely focused on multi-dimensional points in unstructured data. This paper presents a comprehensive survey addressing the state-of-the-art methods specifically designed for anomaly detection within graph data structures, emphasizing both detection and description of anomalies.
Overview and Categorization
The survey categorizes these methods along two axes: whether the graphs are static or dynamic, and whether nodes and edges carry attributes. For static graphs, it treats plain and attributed graphs separately. For dynamic graphs, it examines how graph structure evolves over time. The paper also underscores the importance of anomaly description, covering methods that explain detected anomalies or support interactive querying to reveal their root causes and the relationships among them.
Static Graph Anomalies
Plain Graphs
In static plain graphs, structural patterns form the basis for anomaly detection. Methods such as OddBall extract egonet-based features (for example, the numbers of nodes and edges in each node's egonet) and flag nodes that deviate from patterns observed across the whole graph. Community-based techniques use clustering and modularity maximization to uncover 'bridge' nodes and edges that connect otherwise separate communities, showing how community structure helps single out irregular nodes and edges.
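A minimal Python sketch of the egonet idea behind OddBall, assuming an undirected, unweighted graph stored as a dict of neighbour sets. It computes each node's egonet size N_i and edge count E_i, fits log E = θ log N + log C by ordinary least squares, and scores nodes with an out-line score modeled on the one described for OddBall: max(E, C·N^θ) / min(E, C·N^θ) · log(|E − C·N^θ| + 1). The function names are hypothetical, and OddBall itself also uses egonet features such as total edge weight, which are omitted here.

```python
import math
from itertools import combinations

# Illustrative sketch only; function names are hypothetical, not from the survey.


def egonet_features(adj):
    """Return {node: (N_i, E_i)}, the node and edge counts of each node's egonet.

    adj maps every node to the set of its neighbours (undirected, no self-loops).
    Isolated nodes are skipped because log(E_i) is undefined when E_i = 0.
    """
    feats = {}
    for u, nbrs in adj.items():
        if not nbrs:
            continue
        n_i = len(nbrs) + 1  # the ego plus its neighbours
        # Ego-to-neighbour edges plus edges among the neighbours themselves.
        e_i = len(nbrs) + sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        feats[u] = (n_i, e_i)
    return feats


def fit_power_law(feats):
    """Ordinary least squares fit of log E = theta * log N + log C; returns (theta, C)."""
    xs = [math.log(n) for n, _ in feats.values()]
    ys = [math.log(e) for _, e in feats.values()]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)  # must be > 0: egonet sizes must vary
    theta = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return theta, math.exp(my - theta * mx)


def out_line_scores(feats, theta, c):
    """Score nodes by how far E_i lies from the fitted value C * N_i ** theta."""
    scores = {}
    for u, (n, e) in feats.items():
        expected = c * n ** theta
        ratio = max(e, expected) / min(e, expected)
        scores[u] = ratio * math.log(abs(e - expected) + 1)
    return scores
```

Nodes with the largest scores are candidate near-cliques or near-stars. A fit on a handful of egonets is only a toy; the power-law fit is meaningful only over many nodes.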
Attributed Graphs
Attributed graphs, where nodes and/or edges carry additional information, require methods that integrate graph structure with this auxiliary data. Techniques like Subdue identify rare substructures by combining how often a substructure occurs with the cost of modifying it into a normative pattern, giving an intuitive account of deviations within graph attributes. Other approaches, such as CODA, use probabilistic modeling to detect communities and community outliers simultaneously, underscoring the interplay between node attributes and graph structure.
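To make the frequency side of this idea concrete, here is a toy Python sketch, assuming labelled nodes and labelled directed edges. It is not Subdue: Subdue searches over multi-edge substructures using compression, and the modification-cost component is not modelled at all. The sketch only counts one-edge labelled patterns (source label, edge label, target label) and flags edges whose pattern is rare. All names and the `max_count` threshold are hypothetical.

```python
from collections import Counter

# Toy illustration of "rare substructure" scoring; names are hypothetical.


def pattern_of(edge, node_labels):
    """Map an edge (u, v, edge_label) to its labelled one-edge pattern."""
    u, v, lbl = edge
    return (node_labels[u], lbl, node_labels[v])


def rare_edges(edges, node_labels, max_count=1):
    """Return edges whose one-edge labelled pattern occurs at most max_count times."""
    counts = Counter(pattern_of(e, node_labels) for e in edges)
    return [e for e in edges if counts[pattern_of(e, node_labels)] <= max_count]


node_labels = {"a": "person", "b": "person", "c": "person", "d": "account", "x": "account"}
edges = [
    ("a", "d", "owns"), ("b", "x", "owns"), ("c", "d", "owns"),
    ("a", "b", "knows"), ("b", "c", "knows"),
    ("d", "a", "owns"),  # an account "owning" a person breaks the usual pattern
]
print(rare_edges(edges, node_labels))
```

Here the person-owns-account pattern occurs three times and person-knows-person twice, so only the single account-owns-person edge is returned.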
Dynamic Graph Anomalies
In analyzing dynamic graphs, the paper explores methods that track changes over graph snapshots, capturing time-evolving patterns.
Feature-based Events
Feature-based methods such as NetSimile and DeltaCon summarize each snapshot (NetSimile as a vector of aggregated node features, DeltaCon as a matrix of node affinities) and compare snapshots over time to spot deviations in the graph's evolution. These approaches are simple and efficient, which makes them well suited to detecting sudden temporal anomalies.
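A minimal Python sketch in the spirit of NetSimile, assuming undirected snapshots stored as dicts of neighbour sets. Each snapshot is summarized by aggregating three per-node features (degree, clustering coefficient, mean neighbour degree) with median, mean and population standard deviation, and consecutive signatures are compared with the Canberra distance. NetSimile as published uses a larger feature set and more aggregators; the reduced choices and all function names here are assumptions for illustration.

```python
import statistics
from itertools import combinations

# NetSimile-style sketch; feature set, aggregators and names are simplified assumptions.


def node_features(adj):
    """Per-node rows of [degree, clustering coefficient, mean neighbour degree]."""
    rows = []
    for u, nbrs in adj.items():
        d = len(nbrs)
        if d < 2:
            cc = 0.0
        else:
            links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
            cc = 2 * links / (d * (d - 1))
        mean_nbr_deg = statistics.mean(len(adj[v]) for v in nbrs) if nbrs else 0.0
        rows.append([d, cc, mean_nbr_deg])
    return rows


def signature(adj):
    """Aggregate each feature column with median, mean and population std."""
    sig = []
    for col in zip(*node_features(adj)):
        sig += [statistics.median(col), statistics.mean(col), statistics.pstdev(col)]
    return sig


def canberra(x, y):
    """Canberra distance; 0/0 terms are skipped (treated as 0)."""
    return sum(abs(a - b) / (abs(a) + abs(b)) for a, b in zip(x, y) if abs(a) + abs(b) > 0)


def snapshot_distances(snapshots):
    """Distance between each pair of consecutive (non-empty) graph snapshots."""
    sigs = [signature(g) for g in snapshots]
    return [canberra(s, t) for s, t in zip(sigs, sigs[1:])]
```

A threshold on these distances, chosen by the analyst, would then mark transitions between snapshots as candidate events; identical snapshots give a distance of zero.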
Decomposition-based Events
These methods, typically built on matrix or tensor decompositions, identify significant changes in graph structure. Techniques like CMD and NrMF are designed with interpretability in mind: CMD builds its low-rank approximation from actual sampled columns and rows of the adjacency matrix, while NrMF constrains the residual matrix to be non-negative, making the detection results easier to interpret.
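The core decomposition idea, approximating the adjacency matrix with a low-rank model and inspecting what the model fails to explain, can be sketched in pure Python. This is a plain rank-1 SVD residual computed by power iteration; it implements neither CMD's column and row sampling nor NrMF's non-negativity constraint. To loosely echo NrMF's interpretation, it reports only positive residuals (observed links the model under-explains). It assumes a non-zero matrix given as a list of lists; function names and the threshold are hypothetical.

```python
import math

# Plain rank-1 residual sketch (not CMD or NrMF); names are hypothetical.


def rank1_approx(a, iters=100):
    """Rank-1 approximation sigma * u v^T of a non-zero matrix via power iteration on A^T A."""
    m, n = len(a), len(a[0])
    v = [1.0] * n
    for _ in range(iters):
        u = [sum(a[i][j] * v[j] for j in range(n)) for i in range(m)]
        v = [sum(a[i][j] * u[i] for i in range(m)) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    u = [sum(a[i][j] * v[j] for j in range(n)) for i in range(m)]  # A v = sigma * u_hat
    return [[u[i] * v[j] for j in range(n)] for i in range(m)]


def suspicious_entries(a, threshold):
    """List (row, col, residual) where A exceeds its rank-1 approximation by > threshold."""
    approx = rank1_approx(a)
    hits = []
    for i, row in enumerate(a):
        for j, val in enumerate(row):
            r = val - approx[i][j]
            if r > threshold:  # positive residual: a link the low-rank model does not explain
                hits.append((i, j, round(r, 3)))
    return hits


# A dense block among nodes 0-2 plus one stray self-link on node 3.
a = [[1, 1, 1, 0], [1, 1, 1, 0], [1, 1, 1, 0], [0, 0, 0, 1]]
print(suspicious_entries(a, 0.5))
```

In a dynamic setting, the same residual (or the overall reconstruction error) could be tracked per snapshot, with jumps treated as candidate events.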
Community-based Events
Community-based techniques such as GraphScope and Bayesian models track communities over time and flag anomalies when unexpected structural changes arise within them. This approach highlights the significance of closely knit subgroups in larger networks and their role in anomalous activities.
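A small Python sketch of community-change tracking, assuming the community partition of each snapshot has already been computed by some community detection method and is given as {node: community_id}. It does not reproduce GraphScope (which uses compression-based encoding) or any Bayesian model. It flags nodes whose set of co-members changes sharply between two snapshots, using the Jaccard similarity of their communities. Names and the 0.5 threshold are hypothetical.

```python
# Community-shift sketch; assumes precomputed partitions. Names are hypothetical.


def members(partition):
    """Invert {node: community_id} into {community_id: set_of_nodes}."""
    groups = {}
    for node, cid in partition.items():
        groups.setdefault(cid, set()).add(node)
    return groups


def community_shift(prev, curr, threshold=0.5):
    """Flag nodes whose community co-membership changes a lot between snapshots.

    Only nodes present in both snapshots are compared. Returns {node: jaccard}.
    """
    g_prev, g_curr = members(prev), members(curr)
    flagged = {}
    for node in prev.keys() & curr.keys():
        a, b = g_prev[prev[node]], g_curr[curr[node]]
        jaccard = len(a & b) / len(a | b)
        if jaccard < threshold:
            flagged[node] = round(jaccard, 3)
    return flagged


prev = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}
curr = {"a": 5, "b": 5, "c": 6, "d": 6, "e": 6}
print(community_shift(prev, curr))
```

Comparing member sets rather than community ids keeps the check independent of how a detection algorithm happens to number its communities in each snapshot; here only node "c", which switched groups, falls below the threshold.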
Anomaly Description and Attribution
Effective anomaly detection goes beyond identifying anomalies; it also requires explaining and interpreting them. The paper highlights methods like Non-negative Residual Matrix Factorization (NrMF), whose results lend themselves to intuitive interpretation. Interactive graph querying tools such as CePS and Dot2Dot support exploring the connections among detected anomalies, helping analysts determine potential root causes and making the overall detection process more interpretable.
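A simplified Python sketch of the scoring step behind center-piece subgraph queries such as CePS, assuming an undirected, connected graph stored as a dict of neighbour sets. Each query node (for example, each detected anomaly) runs a random walk with restart, and non-query nodes are ranked by the product of their scores, an AND-style combination that favours nodes close to every query. CePS's subgraph extraction step is omitted, and Dot2Dot is not modelled. The restart probability of 0.15 and all names are assumptions.

```python
# CePS-style scoring sketch (no subgraph extraction); names and parameters are hypothetical.


def rwr(adj, seed, restart=0.15, iters=100):
    """Random walk with restart from `seed`, computed by power iteration."""
    nodes = list(adj)
    r = {u: 0.0 for u in nodes}
    r[seed] = 1.0
    for _ in range(iters):
        nxt = {u: 0.0 for u in nodes}
        for u in nodes:
            if adj[u]:
                share = (1 - restart) * r[u] / len(adj[u])
                for v in adj[u]:
                    nxt[v] += share
        nxt[seed] += restart  # jump back to the query node
        r = nxt
    return r


def center_pieces(adj, queries, k=2, restart=0.15):
    """Top-k non-query nodes by the product of their RWR scores from every query."""
    scores = {q: rwr(adj, q, restart) for q in queries}
    combined = {u: 1.0 for u in adj if u not in queries}
    for u in combined:
        for q in queries:
            combined[u] *= scores[q][u]
    return sorted(combined, key=combined.get, reverse=True)[:k]


adj = {"q1": {"m", "p"}, "q2": {"m", "s"}, "m": {"q1", "q2"}, "p": {"q1"}, "s": {"q2"}}
print(center_pieces(adj, ["q1", "q2"], k=1))
```

The node "m", which sits between both query nodes, outranks the pendant nodes "p" and "s", each of which is close to only one query.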
Real-world Applications
Graph-based anomaly detection methods are particularly potent in real-world applications spanning diverse domains:
- Telecommunication Networks: Detection of subscription fraud by leveraging communities of interest.
- Auction Networks: Identification of fraudsters through relational classification models.
- Accounting Networks: Using relational models to detect risky accounts with transaction anomalies.
- Securities Networks: Examining brokers' social and professional relationships to flag potential securities fraud.
- Opinion Networks: Graph-based trust propagation and relational classification to detect fake reviews and deceptive behaviors.
- Web Networks: Implementing TrustRank and its variants for battling web spam and malware.
- Social Networks: Techniques to combat socware through user interaction patterns and content analysis.
- Computer Networks: Monitoring communication patterns and community interactions for intrusion detection.
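The trust-propagation idea behind TrustRank, mentioned for web networks above, can be sketched as a PageRank whose random jumps land only on a hand-picked set of trusted seed pages. This minimal Python version assumes the link graph is a dict from each page to the list of pages it links to (every target must also be a key) and that the seed set is given; the original method's seed-selection procedure is not reproduced. The damping value 0.85 and the names are assumptions.

```python
# TrustRank-style sketch: biased PageRank from trusted seeds. Names/parameters are hypothetical.


def trustrank(out_links, trusted, damping=0.85, iters=50):
    """Propagate trust from a seed set along out-links.

    Pages with no out-links do not pass trust on, so scores need not sum to 1.
    """
    pages = list(out_links)
    seed_mass = 1.0 / len(trusted)
    d = {p: (seed_mass if p in trusted else 0.0) for p in pages}  # jump distribution
    t = dict(d)
    for _ in range(iters):
        nxt = {p: (1 - damping) * d[p] for p in pages}
        for p in pages:
            targets = out_links[p]
            if targets:
                share = damping * t[p] / len(targets)
                for q in targets:
                    nxt[q] += share
        t = nxt
    return t


out_links = {
    "g1": ["g2", "x"], "g2": ["g1"], "x": [],
    "s1": ["s2", "g1"], "s2": ["s1"],  # a small link farm pointing at a good page
}
print(trustrank(out_links, trusted={"g1"}))
```

Because no trusted page links into the link farm, "s1" and "s2" receive zero trust even though they link to a good page, which is the property that makes this kind of propagation useful against web spam.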
Future Prospects
The paper concludes with open challenges that continue to shape graph-based anomaly detection. These include the need for methods that handle attributed dynamic graphs, leverage historical data from dynamic graphs, select an appropriate time granularity for analysis, remain robust against adversaries who adapt to evade detection, and scale to real-time detection. Closing these gaps would enable more sophisticated and robust anomaly detection systems.
The survey by Akoglu, Tong, and Koutra covers existing techniques broadly while charting directions for future work, illustrating how graph-based approaches can serve both anomaly detection and anomaly description.