Can LLMs Find Fraudsters? Multi-level LLM Enhanced Graph Fraud Detection (2507.11997v2)

Published 16 Jul 2025 in cs.LG and cs.AI

Abstract: Graph fraud detection has garnered significant attention as Graph Neural Networks (GNNs) have proven effective in modeling complex relationships within multimodal data. However, existing graph fraud detection methods typically use preprocessed node embeddings and predefined graph structures to reveal fraudsters, which ignore the rich semantic cues contained in raw textual information. Although LLMs exhibit powerful capabilities in processing textual information, it remains a significant challenge to perform multimodal fusion of processed textual embeddings with graph structures. In this paper, we propose a Multi-level LLM Enhanced Graph Fraud Detection framework called MLED. In MLED, we utilize LLMs to extract external knowledge from textual information to enhance graph fraud detection methods. To integrate LLMs with graph structure information and enhance the ability to distinguish fraudsters, we design a multi-level LLM enhanced framework including type-level enhancer and relation-level enhancer. One is to enhance the difference between the fraudsters and the benign entities, the other is to enhance the importance of the fraudsters in different relations. The experiments on four real-world datasets show that MLED achieves state-of-the-art performance in graph fraud detection as a generalized framework that can be applied to existing methods.

Summary

The paper introduces MLED, a framework that fuses LLM-driven textual embeddings with GNN-based graph representations for enhanced fraud detection.
It employs type-level and relation-level enhancers to dynamically integrate node and edge data, improving metrics like AUCROC and F1-score.
Experimental results show that MLED achieves significant performance gains with minimal computational overhead on diverse datasets.

Multi-Level LLM Enhanced Graph Fraud Detection

The paper "Can LLMs Find Fraudsters? Multi-level LLM Enhanced Graph Fraud Detection" proposes MLED, a framework that improves graph fraud detection by combining LLMs with Graph Neural Networks (GNNs). It uses textual embeddings alongside graph structure to better distinguish fraudsters from benign entities.

Framework Overview: MLED

MLED's core concept is the multimodal fusion of LLMs and GNNs to enhance graph fraud detection. The framework consists of two key components: a type-level enhancer and a relation-level enhancer. Together they bring external knowledge extracted by LLMs into existing graph fraud detection methods and refine their detection capabilities.

Type-Level Enhancer: This component enhances the distinction between fraudsters and benign entities using textual node-type embeddings. LLMs generate summaries from node-type descriptions, and these summaries are transformed into semantic embeddings using embedding models. This component adjusts the importance of type-level features dynamically.
Relation-Level Enhancer: This component strengthens the importance of fraudsters across different relations using textual descriptions of node connections. As in the type-level enhancer, LLM output for each relation is converted into a semantic embedding. A relation-attentive unit then performs adaptive compression over the relations, which improves the framework's discriminative power.
Figure 1: Workflow comparison. The conventional graph fraud detection methods only utilize GNNs for model training. In contrast, MLED combines textual and graph structural information through LLMs and GNNs to enhance the ability to distinguish fraudsters.

Methodology

Preliminaries

The framework targets multimodal graphs where nodes can have complex relationships and multimodal attributes. A graph fraud detection problem is posed as an imbalanced binary classification task, focusing on detecting suspicious nodes within a network.
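
To make the "imbalanced binary classification" framing concrete, here is a minimal sketch of a class-weighted binary cross-entropy. The summary does not state which loss MLED uses, so the up-weighting of the fraud class, the function name `weighted_bce`, and the toy numbers are all illustrative assumptions, not the authors' method.

```python
import math

def weighted_bce(probs, labels, pos_weight):
    """Mean binary cross-entropy with the positive (fraud) class up-weighted.

    Assumption: class weighting is one common way to handle imbalance;
    the source does not say which loss MLED actually uses.
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

# Toy data: 8 benign nodes (label 0) and 2 fraudsters (label 1).
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
probs = [0.1] * 8 + [0.3, 0.6]

# Weight the rare class by the benign-to-fraud ratio.
pos_weight = labels.count(0) / labels.count(1)

plain = weighted_bce(probs, labels, 1.0)
weighted = weighted_bce(probs, labels, pos_weight)
print(f"unweighted loss: {plain:.4f}, weighted loss: {weighted:.4f}")
```

With the weight applied, the two poorly scored fraudsters contribute more to the loss than they would in the unweighted version, which is the effect a class-imbalance correction is meant to have.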

Feature Mapping and Enhancers

Using a feature mapping layer, MLED projects input nodes into a unified representation space. The subsequent enhancers use LLM-generated semantic embeddings to enrich node representations on multiple levels:
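
As a rough illustration of projecting inputs into a unified representation space, the sketch below applies a separate linear map to a raw node feature vector and to a text embedding so that both end up with the same dimension. The helper names, the dimensions, and the choice to project the text embedding as well are assumptions made for the example; the summary only says that a feature mapping layer exists.

```python
import random

def make_linear(in_dim, out_dim, seed=0):
    """Create a randomly initialised weight matrix and a zero bias."""
    rng = random.Random(seed)
    bound = 1.0 / in_dim ** 0.5
    weight = [[rng.uniform(-bound, bound) for _ in range(in_dim)]
              for _ in range(out_dim)]
    return weight, [0.0] * out_dim

def feature_map(x, weight, bias):
    """Project one feature vector: y = W x + b."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]

HIDDEN_DIM = 4  # hypothetical size of the shared space

raw_features = [0.2, 1.0, 0.0, 3.5, 0.7, 0.1]  # toy node attributes
text_embedding = [0.3, -0.1, 0.8]              # toy LLM-derived embedding

w_feat, b_feat = make_linear(len(raw_features), HIDDEN_DIM, seed=1)
w_text, b_text = make_linear(len(text_embedding), HIDDEN_DIM, seed=2)

node_hidden = feature_map(raw_features, w_feat, b_feat)
text_hidden = feature_map(text_embedding, w_text, b_text)
print(len(node_hidden), len(text_hidden))
```

Once both vectors share a dimension they can be added or compared directly, which is what the enhancers need. In a trained model the weights would be learned rather than random.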

Type-Level Enhancer: Constructs representations by merging LLM-extracted node-type embeddings with GNN-derived embeddings. These are weighted through learnable parameters to reflect their significance in identifying fraudsters.
Relation-Level Enhancer: Utilizes embeddings sourced from LLM-processed relationship descriptions. These embeddings contribute to node scores through a relation-attentive mechanism, further refining fraud detection accuracy.
Figure 2: The overall framework of MLED.
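
The two enhancers can be sketched in a few lines under simplifying assumptions. Here the type-level step adds a node-type text embedding scaled by a weight `alpha` (standing in for a learnable parameter), and the relation-level step scores each relation by a dot product with its text embedding and takes a softmax-weighted sum. The exact formulas, the function names, and the relation names `rel_a` and `rel_b` are hypothetical; the paper's equations may differ.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def type_level_enhance(node_repr, type_emb, alpha):
    """Add the node-type text embedding, scaled by weight alpha."""
    return [h + alpha * t for h, t in zip(node_repr, type_emb)]

def relation_level_enhance(per_relation_repr, relation_embs):
    """Compress per-relation node representations into one vector.

    Each relation is scored against its text embedding; the softmax of
    those scores gives the attention weights used in the sum.
    """
    names = sorted(per_relation_repr)
    scores = [dot(per_relation_repr[r], relation_embs[r]) for r in names]
    weights = softmax(scores)
    dim = len(per_relation_repr[names[0]])
    fused = [sum(w * per_relation_repr[r][i] for w, r in zip(weights, names))
             for i in range(dim)]
    return fused, dict(zip(names, weights))

# Toy inputs (all values are made up for illustration).
enhanced = type_level_enhance([1.0, 0.0, 2.0], [0.5, 0.5, -1.0], alpha=0.2)

per_relation = {"rel_a": [1.0, 0.0], "rel_b": [0.0, 1.0]}
relation_embs = {"rel_a": [2.0, 0.0], "rel_b": [0.0, 0.0]}
fused, attn = relation_level_enhance(per_relation, relation_embs)

print("type-level:", enhanced)
print("relation weights:", attn)
```

In this toy run `rel_a` receives the larger attention weight because its node representation aligns with its text embedding, so the fused vector leans toward that relation.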

Fraud Detection Process

The node representations from the type level and the relation level are combined using a weighted sum. A classifier then uses this combined representation to predict whether a node is fraudulent or benign. Because the enhancers operate on node representations, MLED can be incorporated into existing GNN-based fraud detection pipelines.
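
A minimal sketch of this final step, assuming a single mixing weight `lam` and a logistic classifier on top of the combined vector. Both choices, along with the function names and toy values, are assumptions for illustration; the summary specifies only a weighted sum followed by classification.

```python
import math

def combine(type_repr, rel_repr, lam):
    """Weighted sum of type-level and relation-level representations."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(type_repr, rel_repr)]

def fraud_probability(z, weights, bias):
    """Logistic classifier: sigmoid(w . z + b)."""
    logit = sum(w * v for w, v in zip(weights, z)) + bias
    return 1.0 / (1.0 + math.exp(-logit))

# Toy values; in practice the representations come from the enhancers
# and the classifier parameters are learned.
type_repr = [1.0, 2.0]
rel_repr = [3.0, 0.0]

z = combine(type_repr, rel_repr, lam=0.5)
prob = fraud_probability(z, weights=[1.0, -1.0], bias=-1.0)
label = "fraud" if prob > 0.5 else "benign"
print(z, prob, label)
```

The decision threshold of 0.5 is also an assumption. On imbalanced data it is common to tune the threshold on a validation set instead.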

Experimental Results

MLED is evaluated on four real-world datasets, covering both multi-relation and single-relation graphs and including Amazon and YelpChi. The reported results improve on state-of-the-art baselines:

On multi-relation datasets, MLED shows an increase in metrics such as AUCROC and F1-score, enhancing the ability to separate fraudsters from benign entities.
Integrating the framework adds little time overhead: training time increases by less than 0.3 seconds per epoch on average.
Visualization analyses and parameter studies indicate that MLED's multi-level enhancements yield more clearly separated clusters and stable performance across hyperparameter settings.
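
For readers unfamiliar with the two metrics named above, the following sketch computes AUC-ROC (as the fraction of fraud/benign pairs ranked correctly, with ties counted as half) and the F1-score of the fraud class. It is a from-scratch illustration on made-up scores, not the paper's evaluation code, and the paper may report a different F1 variant such as macro-F1.

```python
def auc_roc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def f1_score(labels, preds):
    """F1 of the positive (fraud) class."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy example: two benign nodes, two fraudsters.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
preds = [1 if s >= 0.5 else 0 for s in scores]

print("AUC-ROC:", auc_roc(labels, scores))
print("F1:", round(f1_score(labels, preds), 4))
```

AUC-ROC depends only on the ranking of the scores, while F1 depends on the chosen threshold, which is why imbalanced-classification papers often report both.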
Figure 3: Ablation study results on various MLED variants with the SOTA method.

Figure 4: Efficiency analysis on the Amazon and YelpChi datasets.

Figure 5: The visualization analysis on the Amazon dataset, where red represents fraudsters and blue represents benign entities. Each pair of subfigures consists of baselines (left) and baselines enhanced by MLED (right).

Conclusion

MLED combines LLM-derived textual information with graph structure for fraud detection, and the reported experiments show improved detection performance across the evaluated datasets. The work illustrates the potential of multimodal learning for complex real-world tasks and suggests further exploration of integrating linguistic and structural data in other graph-based applications.