Application of AI-based Models for Online Fraud Detection and Analysis (2409.19022v2)

Published 25 Sep 2024 in cs.CL, cs.AI, and cs.LG

Abstract: Fraud is a prevalent offence that extends beyond financial loss, causing psychological and physical harm to victims. The advancements in online communication technologies allowed for online fraud to thrive in this vast network, with fraudsters increasingly using these channels for deception. With the progression of technologies like AI, there is a growing concern that fraud will scale up, using sophisticated methods, like deep-fakes in phishing campaigns, all generated by language generation models like ChatGPT. However, the application of AI in detecting and analyzing online fraud remains understudied. We conduct a Systematic Literature Review on AI and NLP techniques for online fraud detection. The review adhered to the PRISMA-ScR protocol, with eligibility criteria including relevance to online fraud, use of text data, and AI methodologies. We screened 2,457 academic records, of which 350 met our eligibility criteria, and included 223. We report the state-of-the-art NLP techniques for analysing various online fraud categories; the training data sources; the NLP algorithms and models built; and the performance metrics employed for model evaluation. We find that current research on online fraud is divided into various scam activities and identify 16 different frauds that researchers focus on. This SLR enhances the academic understanding of AI-based detection methods for online fraud and offers insights for policymakers, law enforcement, and businesses on safeguarding against such activities. We conclude that focusing on specific scams lacks generalization, as multiple models are required for different fraud types. The evolving nature of scams limits the effectiveness of models trained on outdated data. We also identify issues in data limitations, training bias reporting, and selective presentation of metrics in model performance reporting, which can lead to potential biases in model evaluation.

Summary

The paper systematically reviews AI-based techniques for online fraud detection, curating 223 studies via PRISMA-ScR to map current methodologies.
The paper reports that supervised models such as Random Forest, hybrid approaches that combine several techniques, and transformer-based models like BERT feature prominently in the detection of phishing and scams.
The paper identifies challenges such as data scarcity and model interpretability, recommending active learning and real-time data integration for enhanced performance.

Analysis of AI-based Models for Online Fraud Detection

The systematic review "Application of AI-based Models for Online Fraud Detection and Analysis" by Papasavva et al. critically evaluates the existing research landscape on AI for fraud detection. It concentrates on the role AI plays in analysing and detecting online fraud from textual data, and it offers granular insights into the datasets, methodologies, and limitations that researchers face in this domain.

The review begins by underscoring the pervasive impact of online fraud, noting its extensive emotional, psychological, and financial ramifications. Advances in AI, particularly Generative Artificial Intelligence (GenAI), raise concerns that fraud will grow more sophisticated, notably through deep-fakes in scams like phishing. Even so, the review finds that comprehensive studies of AI's role in fraud detection remain relatively scarce.

Research Methodology

In curating the systematic literature review, the researchers adhere to the PRISMA-ScR guidelines, enhancing their methodological rigor. The eligibility criteria were defined to focus on literature published between 2019 and 2024, capturing both formal publications and grey literature. The selection process yielded 223 studies from an initial pool of 2,457 academic records.

The research questions pinpointed four key areas:

The state-of-the-art AI techniques in fraud detection.
The data sources employed by researchers.
Evaluation processes for AI models.
Predominant fraud activities studied.

Key Findings

Types of Fraud and Data Sources

The review indicates that phishing attacks remain the most extensively studied area, particularly phishing URLs and emails, with a concentration on user-reported data and publicly available datasets (e.g., PhishTank, Kaggle). Recent literature also shows a shift towards dynamic data sources, such as social media, which can offer real-time updates on evolving phishing tactics.

Methodologies in AI Implementation

The AI and NLP methodologies most widely used for fraud detection are supervised machine learning techniques, with Random Forest (RF) notably effective in several domains such as URL phishing. Hybrid approaches that combine multiple techniques are emerging as robust options. The findings also point to a need for richer semantic understanding of text, which transformer-based models like BERT can provide.
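To make the URL phishing case concrete, the following is a minimal sketch of the kind of lexical features that could be extracted from a URL before being passed to a classifier such as Random Forest. The feature set and the function name `url_features` are illustrative assumptions, not taken from the review, and the classifier itself is left out.

```python
import re
from urllib.parse import urlparse

# Matches a dotted-quad host such as 192.168.0.1 (no range check on octets).
IPV4_PATTERN = re.compile(r"(\d{1,3}\.){3}\d{1,3}")


def url_features(url):
    """Return a dict of simple lexical features for one URL.

    The chosen features are hypothetical examples; a real study would
    select and validate its own feature set.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
        "host_is_ipv4": bool(IPV4_PATTERN.fullmatch(host)),
        "uses_https": parsed.scheme == "https",
    }


if __name__ == "__main__":
    for example in ("https://example.com/", "http://192.168.0.1/login"):
        print(example, url_features(example))
```

Each dict can be turned into a numeric row and stacked into a feature matrix for whichever supervised model is being evaluated.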

Challenges and Recommendations

A common challenge is data scarcity, made worse by the evolving nature of fraud: models trained on outdated datasets lose efficacy against newly devised scams. The review also calls for rigorous, standardized reporting of performance metrics, with accuracy, precision, F1 score, and AUC reported together rather than as a selective subset.
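The point about reporting metrics together can be illustrated with a short sketch. It computes accuracy, precision, recall, F1, and AUC from scratch for binary labels (1 = fraud); recall is included because F1 depends on it. The function names and the sample data are assumptions made for illustration only.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 means fraud."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def summarize(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 from hard predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


def roc_auc(y_true, scores):
    """AUC as the chance that a random fraud item outscores a random
    legitimate item, with ties counted as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one item of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical imbalanced sample: 5 fraudulent messages out of 100.
    y_true = [1] * 5 + [0] * 95
    y_pred = [0] * 100  # a degenerate model that never flags fraud
    print(summarize(y_true, y_pred))
```

On this hypothetical sample the degenerate model reaches an accuracy of 0.95 while its precision, recall, and F1 are all 0, which is why a report that presents accuracy alone can mislead.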

For future research, the review recommends leveraging active learning models that incorporate real-time data streams. Emphasis on ethical deployment and model interpretability is crucial, particularly when models are intended for real-world applications.
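One common active learning strategy is uncertainty sampling, in which the items the current model is least sure about are sent to a human for labelling and then added to the training data. The review does not specify a strategy, so the sketch below is only one possible reading; the function name, the 0.5 decision threshold, and the sample scores are assumptions.

```python
def select_for_labeling(fraud_probabilities, budget):
    """Return the indices of the `budget` items whose predicted fraud
    probability is closest to 0.5, i.e. where the model is least certain.

    `fraud_probabilities` is assumed to come from any probabilistic
    classifier scoring newly arrived, unlabelled messages.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    ranked = sorted(
        range(len(fraud_probabilities)),
        key=lambda i: abs(fraud_probabilities[i] - 0.5),
    )
    return ranked[:budget]


if __name__ == "__main__":
    # Hypothetical scores for five incoming messages.
    scores = [0.02, 0.48, 0.97, 0.55, 0.30]
    print(select_for_labeling(scores, budget=2))
```

In a streaming setting this selection step would run on each new batch, so labelling effort goes to the messages most likely to reveal a scam pattern the model has not yet seen.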

Implications and Future Developments

Beyond mapping the current state of AI models in fraud detection, the review also considers the future trajectory of AI applications in cybersecurity. It sees potential in models that learn adaptively from new data, drawing on unsupervised learning to surface fraud patterns, with advances in LLMs extending these capabilities. It further encourages integrating user feedback and usability assessments into model deployment to improve real-world effectiveness.

In conclusion, while substantial groundwork has been laid in elevating AI's role in detecting online fraud, this review underscores the continuous need for innovative, adaptable frameworks that can keep pace with the sophisticated evolution of digital deception.