GADBench: Revisiting and Benchmarking Supervised Graph Anomaly Detection (2306.12251v2)

Published 21 Jun 2023 in cs.LG

Abstract: With a long history of traditional Graph Anomaly Detection (GAD) algorithms and recently popular Graph Neural Networks (GNNs), it is still not clear (1) how they perform under a standard comprehensive setting, (2) whether GNNs can outperform traditional algorithms such as tree ensembles, and (3) how about their efficiency on large-scale graphs. In response, we introduce GADBench -- a benchmark tool dedicated to supervised anomalous node detection in static graphs. GADBench facilitates a detailed comparison across 29 distinct models on ten real-world GAD datasets, encompassing thousands to millions ($\sim$6M) nodes. Our main finding is that tree ensembles with simple neighborhood aggregation can outperform the latest GNNs tailored for the GAD task. We shed light on the current progress of GAD, setting a robust groundwork for subsequent investigations in this domain. GADBench is open-sourced at https://github.com/squareRoot3/GADBench.

Citations (35)

Summary

  • The paper introduces GADBench, a comprehensive tool that systematizes the evaluation of supervised graph anomaly detection models across diverse datasets.
  • Rigorous experiments reveal that tree ensemble methods often outperform specialized GNNs, challenging prevailing assumptions in the field.
  • The study emphasizes practical insights by highlighting improved accuracy, efficiency, and robustness against label imbalance in anomaly detection.

Overview of "GADBench: Revisiting and Benchmarking Supervised Graph Anomaly Detection"

The paper, "GADBench: Revisiting and Benchmarking Supervised Graph Anomaly Detection," introduces a comprehensive benchmarking tool, GADBench, specifically for supervised anomaly detection within graph data. This paper systematically evaluates and compares the efficacy of traditional graph anomaly detection (GAD) algorithms and modern Graph Neural Networks (GNNs), addressing several pertinent questions in the field: the comparative performance of GNNs versus traditional methods like tree ensembles, and their efficiency on large-scale graphs. The paper undertakes rigorous experimental evaluations, focusing on supervised anomalous node detection against a backdrop of varied algorithms and datasets.

Key Contributions

  1. Introduction of GADBench: GADBench serves as an extensive benchmarking platform for evaluating supervised anomaly detection methods on static attributed graphs. The benchmark encompasses a range of models and ten real-world datasets, facilitating nuanced performance comparisons across classical machine learning models, standard GNNs, and GNNs explicitly designed for anomaly detection.
  2. Empirical Insights: A notable finding from the benchmark is that tree ensemble approaches, such as Random Forest combined with simple neighborhood aggregation, often outperform state-of-the-art GNNs tailored for GAD (a minimal sketch of this approach follows the list). This finding challenges the prevailing assumption that GNNs are superior in this domain.
  3. Comprehensive Dataset Selection: The benchmark employs ten datasets representing diverse domains such as social media, e-commerce, and financial networks, ensuring a broad evaluation environment. These datasets range in size from thousands to millions of nodes and include both homogeneous graphs and those with a degree of heterogeneity.
  4. Model Scope: The evaluation incorporates 29 models, including 7 non-graph models, 10 standard GNNs, and 10 specialized GNNs designed for GAD. Additionally, the paper explores two tree ensemble models with neighborhood aggregation, underscoring their effectiveness.
  5. Rigorous Evaluation Protocol: The benchmark utilizes metrics such as AUROC, AUPRC, and Recall at top-k predictions. Evaluation scenarios span fully-supervised and semi-supervised settings, with hyperparameter optimization further refining insights into model performance.
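
To make the tree-ensemble-plus-aggregation idea concrete, the sketch below shows one simple way to realize it: propagate mean neighbor features over a couple of hops, concatenate them with the raw node features, and fit a Random Forest. This is an illustrative simplification under assumed inputs (a sparse adjacency matrix `adj`, feature matrix `X`, labels `y`, and split indices), not the exact GADBench implementation; `RandomForestClassifier` is the standard scikit-learn class.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.ensemble import RandomForestClassifier

def aggregate_neighbors(adj, X, hops=2):
    """Concatenate each node's own features with mean-aggregated
    neighbor features propagated over `hops` rounds."""
    # Row-normalize the adjacency matrix so each row averages its neighbors.
    deg = np.asarray(adj.sum(axis=1)).ravel()
    deg[deg == 0] = 1.0
    norm_adj = sp.diags(1.0 / deg) @ adj

    feats = [X]
    H = X
    for _ in range(hops):
        H = norm_adj @ H          # mean of neighbors' current features
        feats.append(H)
    return np.hstack(feats)       # [X | 1-hop mean | 2-hop mean | ...]

# Hypothetical inputs: adj (N x N sparse), X (N x d), y (0 = normal, 1 = anomaly),
# and train_idx / test_idx arrays from the dataset split.
# X_aug = aggregate_neighbors(adj, X, hops=2)
# clf = RandomForestClassifier(n_estimators=200, class_weight="balanced", n_jobs=-1)
# clf.fit(X_aug[train_idx], y[train_idx])
# scores = clf.predict_proba(X_aug[test_idx])[:, 1]   # anomaly scores
```

The appeal of this recipe is that the graph structure is folded into the feature matrix once, after which any off-the-shelf tabular model can be applied without graph-specific training machinery.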

Findings and Implications

The experimental evaluations yield several noteworthy insights:

  • Tree Ensembles' Superior Performance: Tree ensemble methods achieve higher AUPRC and Recall@K than many GNN variants, particularly in fully supervised settings (a brief sketch of these metrics follows this list). The results highlight tree ensembles' resilience to label imbalance and graph heterophily.
  • Specialized GNNs Versus Standard GNNs: While specialized GNNs generally outperform standard GNNs in anomaly detection tasks, their reliance on fine-tuned hyperparameters poses challenges for practical deployment. In contrast, tree ensembles offer robust performance with fewer parameter adjustments.
  • Efficiency and Resource Use: Tree ensemble models not only achieve higher accuracy but also require less training time and memory than more complex GNN architectures, a significant efficiency advantage.
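
The comparisons above rest on threshold-free metrics suited to the heavy label imbalance typical of anomaly detection. As a rough illustration, the snippet below computes them for any detector's anomaly scores; `roc_auc_score` and `average_precision_score` are standard scikit-learn functions, while `recall_at_k` is an illustrative helper written here, not a GADBench API.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

def recall_at_k(y_true, scores, k=None):
    """Fraction of all true anomalies captured among the top-k scored nodes.
    By default k equals the number of true anomalies."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    if k is None:
        k = int(y_true.sum())
    top_k = np.argsort(-scores)[:k]
    return y_true[top_k].sum() / max(y_true.sum(), 1)

# y_test: ground-truth labels, scores: anomaly scores from any detector
# auroc = roc_auc_score(y_test, scores)
# auprc = average_precision_score(y_test, scores)   # AUPRC
# rec_k = recall_at_k(y_test, scores)               # Recall@K
```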

Future Directions

The research outlines several avenues for future inquiry and extension:

  • Expanded Benchmarking for Diverse Scenarios: Future iterations of GADBench could incorporate dynamic graphs, assess more complex graph structures, and evaluate online and hierarchical anomaly detection strategies.
  • Enhanced Model Integration: The inclusion of cutting-edge algorithms and exploration of automated model selection processes could further refine GADBench's utility.
  • Application-Driven Research: Further exploration of real-world applications and domain-specific adaptations of GNNs could unlock additional performance improvements, particularly in handling nuanced graph structures inherent in application-specific data.

In conclusion, GADBench marks a pivotal step toward more rigorous and comparable evaluation in graph anomaly detection research, emphasizing the value of comprehensive comparisons across a broad spectrum of techniques. The findings argue for a balanced consideration of traditional and modern approaches, urging the research community to re-examine assumptions about the relative strengths of GNNs and simpler models in GAD tasks.