Enhancing Graph Neural Network-based Fraud Detectors against Camouflaged Fraudsters

Published 19 Aug 2020 in cs.SI, cs.CR, and cs.LG | (2008.08692v1)

Abstract: Graph Neural Networks (GNNs) have been widely applied to fraud detection problems in recent years, revealing the suspiciousness of nodes by aggregating their neighborhood information via different relations. However, few prior works have noticed the camouflage behavior of fraudsters, which could hamper the performance of GNN-based fraud detectors during the aggregation process. In this paper, we introduce two types of camouflages based on recent empirical studies, i.e., the feature camouflage and the relation camouflage. Existing GNNs have not addressed these two camouflages, which results in their poor performance in fraud detection problems. Alternatively, we propose a new model named CAmouflage-REsistant GNN (CARE-GNN), to enhance the GNN aggregation process with three unique modules against camouflages. Concretely, we first devise a label-aware similarity measure to find informative neighboring nodes. Then, we leverage reinforcement learning (RL) to find the optimal amounts of neighbors to be selected. Finally, the selected neighbors across different relations are aggregated together. Comprehensive experiments on two real-world fraud datasets demonstrate the effectiveness of the RL algorithm. The proposed CARE-GNN also outperforms state-of-the-art GNNs and GNN-based fraud detectors. We integrate all GNN-based fraud detectors as an opensource toolbox: https://github.com/safe-graph/DGFraud. The CARE-GNN code and datasets are available at https://github.com/YingtongDou/CARE-GNN.

Summary

The paper’s main contribution is CARE-GNN, which integrates label-aware similarity measures and reinforcement learning to identify camouflaged fraudulent nodes.
The methodology employs a dynamic neighbor selection process and relation-specific aggregation to overcome traditional GNN limitations in noisy graph environments.
Experimental results on the Yelp and Amazon datasets show higher AUC and recall than the compared GNN baselines, supporting the value of filtering neighbors before aggregation.

Enhancing Graph Neural Network-Based Fraud Detectors Against Camouflaged Fraudsters

The paper "Enhancing Graph Neural Network-based Fraud Detectors against Camouflaged Fraudsters" studies fraud detection with Graph Neural Networks (GNNs) when fraudsters actively disguise themselves. The authors target two types of camouflage, feature camouflage and relation camouflage, both of which degrade the performance of GNN-based detectors.

Problem Context and Challenges

Fraud detection is critical in various domains, including financial transactions and social networks. Fraudsters often employ camouflage strategies to resemble legitimate users, complicating the detection process. Feature camouflage involves the manipulation of node features, while relation camouflage pertains to the deceptive establishment of connections with legitimate nodes, thus disguising suspicious behavior within the graph structure.

Traditional GNN-based approaches aggregate neighborhood information to uncover node suspiciousness. These methods lose effectiveness on camouflaged nodes because the aggregation step mixes in the misleading features and connections that the camouflage introduces, which dilutes the signal that would otherwise expose a fraudster.
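The dilution effect described above can be illustrated with a toy example. This is a minimal sketch, not the paper's model: the one-dimensional "suspiciousness" feature, the node values, and the helper `mean_aggregate` are all hypothetical and chosen only to show how camouflage links pull a mean-aggregated representation toward the benign range.

```python
def mean_aggregate(center, neighbors):
    """Average the center's feature vector together with its neighbors' vectors."""
    vectors = [center] + neighbors
    dim = len(center)
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Hypothetical 1-D feature: 1.0 looks fraud-like, 0.0 looks benign-like.
fraudster = [1.0]
fraud_neighbors = [[0.9], [0.8]]
# Links to benign nodes that the fraudster adds as relation camouflage.
benign_neighbors = [[0.1], [0.0], [0.1], [0.0]]

honest_view = mean_aggregate(fraudster, fraud_neighbors)
camouflaged_view = mean_aggregate(fraudster, fraud_neighbors + benign_neighbors)

print(honest_view, camouflaged_view)
```

With the camouflage links included, the aggregated value drops well below the value obtained from the fraud-like neighbors alone, which is the behavior a neighbor filter is meant to prevent.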

Methodology: CARE-GNN

The authors propose CARE-GNN, which adds three modules to the GNN aggregation process to address these deficiencies:

Label-Aware Similarity Measure: This module uses a one-layer MLP as a label predictor and scores the similarity of two nodes by the distance between their predicted labels, rather than relying on an unsupervised similarity measure. Because the predictor is trained on annotated data, the measure incorporates domain knowledge about which nodes are fraudulent.
Similarity-Aware Neighbor Selector: Under each relation, this module keeps only the neighbors most similar to the center node. The proportion to keep is a per-relation filtering threshold that reinforcement learning (RL) adjusts during training, so the selection adapts to how noisy each relation is.
Relation-Aware Neighbor Aggregator: The selected neighbors are first aggregated within each relation, and the per-relation results are then combined using the RL-determined thresholds as weights. Reusing the thresholds avoids additional, computationally expensive mechanisms such as attention layers.
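The three modules above can be sketched in a few lines of plain Python. This is an illustrative simplification under stated assumptions, not the authors' implementation: the weights `w` and `b` are fixed by hand rather than trained, the function names are hypothetical, and the final combination is a threshold-weighted sum instead of a learned transformation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fraud_score(h, w, b):
    """One-layer label predictor: sigmoid(w . h + b). Weights are assumed, not trained."""
    return sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + b)


def similarity(h_u, h_v, w, b):
    """Label-aware similarity: 1 minus the absolute gap between predicted scores."""
    return 1.0 - abs(fraud_score(h_u, w, b) - fraud_score(h_v, w, b))


def select_top_p(center, neighbors, p, w, b):
    """Keep the ceil(p * n) neighbors that are most similar to the center node."""
    if not neighbors:
        return []
    k = max(1, math.ceil(p * len(neighbors)))
    ranked = sorted(neighbors, key=lambda h: similarity(center, h, w, b), reverse=True)
    return ranked[:k]


def mean(vectors, dim):
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def aggregate(center, neighbors_by_relation, thresholds, w, b):
    """Filter per relation, mean-pool the kept neighbors, weight each relation by its threshold."""
    out = list(center)
    for rel, neighbors in neighbors_by_relation.items():
        p = thresholds[rel]
        kept_rel = select_top_p(center, neighbors, p, w, b)
        rel_embedding = mean(kept_rel, len(center))
        out = [o + p * r for o, r in zip(out, rel_embedding)]
    return out


# Hypothetical data: one relation "r1", 1-D features, hand-picked predictor weights.
w, b = [4.0], -2.0
center = [1.0]
neighbors_by_relation = {"r1": [[0.9], [0.1], [0.8], [0.0]]}
thresholds = {"r1": 0.5}

kept = select_top_p(center, neighbors_by_relation["r1"], thresholds["r1"], w, b)
combined = aggregate(center, neighbors_by_relation, thresholds, w, b)
print(kept, combined)
```

In this toy setting the two benign-looking neighbors are filtered out before pooling, so only the fraud-like neighbors contribute to the center node's updated representation.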

Experimental Results

Experiments on two real-world datasets, Yelp and Amazon review data, demonstrate the advantages of CARE-GNN. It improves on state-of-the-art GNNs and GNN-based fraud detectors, achieving higher AUC and recall across multiple training settings. The gain is particularly noteworthy on dense and noisy graphs, where traditional GNNs often falter.
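For readers unfamiliar with the two reported metrics, the sketch below computes them from scratch. The labels, scores, and the 0.55 decision threshold are made-up illustration values, not results from the paper, and the helper names are hypothetical.

```python
def recall(y_true, y_pred):
    """Fraction of true fraud cases (label 1) that were predicted as fraud."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and detector scores for five nodes.
y_true = [1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.5, 0.1]
y_pred = [1 if s >= 0.55 else 0 for s in scores]

auc_value = auc(y_true, scores)
recall_value = recall(y_true, y_pred)
print(auc_value, recall_value)
```

AUC depends only on how the scores rank fraud above benign nodes, while recall depends on the chosen decision threshold, which is why the two are usually reported together on imbalanced fraud data.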

The results support the hypothesis that targeted neighbor selection and a supervised similarity measure can substantially mitigate the negative impact of camouflaged fraudulent activities. Moreover, the RL framework lets CARE-GNN tune the filtering threshold of each relation during training rather than requiring it to be set by hand.
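One way to picture the threshold tuning is as a simple reward-driven update per relation. The sketch below is a simplified assumption about how such an update could look, not the paper's exact algorithm: the function `update_threshold`, the step size of 0.02, and the per-epoch average distances are all hypothetical.

```python
def update_threshold(p, prev_avg_distance, curr_avg_distance, step=0.02):
    """Nudge a relation's filtering threshold based on a +1 / -1 reward.

    The reward is +1 when the selected neighbors became more similar to their
    center nodes (lower average distance) than in the previous epoch, else -1.
    The threshold moves by a fixed step in the reward's direction and is
    clipped to the range [0, 1].
    """
    reward = 1 if curr_avg_distance <= prev_avg_distance else -1
    new_p = min(1.0, max(0.0, p + step * reward))
    return new_p, reward


# Hypothetical average neighbor distances for one relation over four epochs.
avg_distances = [0.40, 0.35, 0.33, 0.36]

p = 0.5
rewards = []
for prev, curr in zip(avg_distances, avg_distances[1:]):
    p, r = update_threshold(p, prev, curr)
    rewards.append(r)

print(p, rewards)
```

Because each relation keeps its own threshold, a noisy relation can settle on a small proportion of retained neighbors while a cleaner relation keeps more of them.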

Implications and Future Directions

CARE-GNN is relevant to both academic research and practical fraud detection. Because neighbor selection is adjusted during training, the framework could be adapted to dynamic environments where fraudsters continually change their strategies.

Future developments could explore extending CARE-GNN to other domains where similar camouflage tactics are prevalent. The integration of more sophisticated semantic analysis techniques could further enhance the detection of deceptive features and relations. Additionally, exploring unsupervised or semi-supervised variants of CARE-GNN could reduce dependence on extensive annotated datasets, broadening its applicability.

In conclusion, CARE-GNN advances GNN-based fraud detection by filtering camouflaged neighbors before aggregation, offering a practical framework for detecting sophisticated fraudulent behavior in complex graph-structured data.
