
Truncated Affinity Maximization: One-class Homophily Modeling for Graph Anomaly Detection (2306.00006v5)

Published 29 May 2023 in cs.SI, cs.AI, and cs.LG

Abstract: We reveal a one-class homophily phenomenon, which is one prevalent property we find empirically in real-world graph anomaly detection (GAD) datasets, i.e., normal nodes tend to have strong connection/affinity with each other, while the homophily in abnormal nodes is significantly weaker than normal nodes. However, this anomaly-discriminative property is ignored by existing GAD methods that are typically built using a conventional anomaly detection objective, such as data reconstruction. In this work, we explore this property to introduce a novel unsupervised anomaly scoring measure for GAD, local node affinity, that assigns a larger anomaly score to nodes that are less affiliated with their neighbors, with the affinity defined as similarity on node attributes/representations. We further propose Truncated Affinity Maximization (TAM) that learns tailored node representations for our anomaly measure by maximizing the local affinity of nodes to their neighbors. Optimizing on the original graph structure can be biased by nonhomophily edges (i.e., edges connecting normal and abnormal nodes). Thus, TAM is instead optimized on truncated graphs where non-homophily edges are removed iteratively to mitigate this bias. The learned representations result in significantly stronger local affinity for normal nodes than abnormal nodes. Extensive empirical results on 10 real-world GAD datasets show that TAM substantially outperforms seven competing models, achieving over 10% increase in AUROC/AUPRC compared to the best contenders on challenging datasets. Our code is available at https://github.com/mala-lab/TAM-master/.


Summary

  • The paper introduces Truncated Affinity Maximization (TAM), a method that leverages one-class homophily to enhance unsupervised graph anomaly detection.
  • It combines Local Affinity Maximization Networks with Normal Structure-preserved Graph Truncation to refine node representations by removing non-homophily edges.
  • TAM achieves over a 10% improvement in AUROC/AUPRC on challenging datasets, with practical relevance to fraud detection and cybersecurity.

Overview of Truncated Affinity Maximization for Graph Anomaly Detection

Graph anomaly detection (GAD) is a critical task in many real-world applications, ranging from fraud detection to identifying suspicious network activity. Despite substantial progress in GAD, existing graph neural network (GNN)-based approaches often overlook intrinsic anomaly-discriminative properties, such as one-class homophily. This paper introduces a novel method, Truncated Affinity Maximization (TAM), that leverages this property for improved unsupervised anomaly detection in graphs.

One-Class Homophily

The authors reveal one-class homophily as an empirically prevalent property in real-world GAD datasets: normal nodes typically exhibit stronger connections and affinities with each other compared to abnormal nodes. This foundational observation informs the development of a new unsupervised anomaly measure termed "local node affinity," which assigns higher anomaly scores to nodes that demonstrate weaker connectivity with their neighbors. This affinity is defined as the similarity of node attributes or representations, effectively enabling a new perspective for anomaly detection beyond conventional objectives like data reconstruction.
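As a rough illustration (not the paper's implementation), local node affinity can be computed as the average cosine similarity between a node's attributes (or learned representations) and those of its neighbors, with the anomaly score taken as the negated affinity. The `features` and `adj` inputs below are hypothetical:

```python
import numpy as np

def local_affinity_scores(features: np.ndarray, adj: np.ndarray) -> np.ndarray:
    """Score each node by (negated) average cosine similarity to its neighbors.

    features: (n, d) node attribute/representation matrix (hypothetical input)
    adj:      (n, n) symmetric 0/1 adjacency matrix
    Returns an (n,) array where larger values indicate more anomalous nodes.
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)   # row-normalize for cosine
    sim = unit @ unit.T                             # pairwise cosine similarities
    deg = adj.sum(axis=1)
    affinity = (sim * adj).sum(axis=1) / np.clip(deg, 1, None)
    return -affinity                                # low affinity -> high anomaly score
```

On a toy graph where two similar nodes share an edge and a third, dissimilar node is connected to both, the third node receives the highest anomaly score.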

Truncated Affinity Maximization

The core innovation of TAM lies in its approach to learning node representations that inherently exploit local node affinity. The method is composed of two primary components: Local Affinity Maximization Networks (LAMNet) and Normal Structure-preserved Graph Truncation (NSGT). LAMNet aims to extract node representations by maximizing affinity towards neighbors, whereas NSGT iteratively removes non-homophily edges that connect dissimilar nodes, thereby mitigating biases in the optimization process.
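A minimal, simplified sketch of the truncation idea follows. The paper's NSGT is a normal-structure-preserving scheme; here, for illustration only, edges are dropped deterministically by lowest attribute similarity, and `drop_frac` and `n_iters` are invented parameters:

```python
import numpy as np

def truncate_graph(features: np.ndarray, adj: np.ndarray,
                   drop_frac: float = 0.1, n_iters: int = 2) -> np.ndarray:
    """Iteratively remove the least-similar (likely non-homophily) edges."""
    adj = adj.astype(float).copy()
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T                      # pairwise cosine similarity
    for _ in range(n_iters):
        i, j = np.triu_indices_from(adj, k=1)
        present = adj[i, j] > 0              # edges still in the graph
        ei, ej = i[present], j[present]
        if ei.size == 0:
            break
        k = max(1, int(drop_frac * ei.size))
        worst = np.argsort(sim[ei, ej])[:k]  # k lowest-similarity edges
        adj[ei[worst], ej[worst]] = 0
        adj[ej[worst], ei[worst]] = 0        # keep adjacency symmetric
    return adj
```

On the toy graph from before, repeated truncation removes the two edges incident to the dissimilar node while preserving the edge between the two similar (presumed normal) nodes.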

Training LAMNet on the truncated graph structure refines the representations by focusing on regions of the graph with stronger homophily, yielding a more faithful estimate of node affinities. Extensive evaluations show that TAM significantly outperforms seven state-of-the-art models on ten real-world GAD datasets, with improvements of over 10% in AUROC/AUPRC on the most challenging datasets.

Implications and Future Directions

This research presents a significant contribution to the literature on graph-based anomaly detection, introducing a method that effectively exploits a previously overlooked anomaly-discriminative property: one-class homophily. Practically, TAM can be applied across domains where graph data is prevalent and detecting anomalous nodes is critical, such as social networks, financial fraud detection, and cybersecurity.

Theoretically, future work could enhance TAM by integrating heterophily-aware mechanisms, addressing the potential performance drop on graphs with strong heterophily among normal nodes. The approach's reliance on affinity maximization could also serve as a versatile framework for anomaly detection tasks beyond graphs.

In terms of scalability, while TAM demonstrates competitive performance on large-scale datasets, future adaptations could improve computational efficiency on extremely large graphs, for instance through distributed or parallel computing. Exploring alternative GNN backbones within the TAM framework, such as graph attention networks (GATs), could also reveal how the choice of architecture affects affinity modeling.

In conclusion, this paper takes an important step forward in graph anomaly detection by rethinking how anomaly scores are defined, using attribute-based affinity maximization as a robust detection measure. It encourages researchers to engage more deeply with node-level relational dynamics, setting a precedent for future innovations in unsupervised anomaly detection.
