A Comprehensive Survey of Data Mining-based Fraud Detection Research (1009.6119v1)

Published 30 Sep 2010 in cs.AI and cs.CE

Abstract: This survey paper categorises, compares, and summarises from almost all published technical and review articles in automated fraud detection within the last 10 years. It defines the professional fraudster, formalises the main types and subtypes of known fraud, and presents the nature of data evidence collected within affected industries. Within the business context of mining the data to achieve higher cost savings, this research presents methods and techniques together with their problems. Compared to all related reviews on fraud detection, this survey covers much more technical articles and is the only one, to the best of our knowledge, which proposes alternative data and solutions from related domains.

Summary

The paper defines and categorizes various data mining methods for fraud detection, highlighting challenges in processing large-scale data.
It evaluates performance metrics like ROC and AUC, uncovering the limitations of complex models and the potential of simpler approaches.
It identifies future directions by integrating insights from related fields and advocating unsupervised and semi-supervised techniques.

Overview of Data Mining-Based Fraud Detection Research

The paper surveys data mining techniques applied to automated fraud detection, covering literature from the decade preceding its 2010 publication. It categorizes and summarizes the state of research, with attention to open challenges and the methods proposed to address them. The focus is on the behavior of professional fraudsters, the types of fraud they commit, and the effectiveness of data mining techniques across various industries.

Key Objectives

The paper has two primary objectives:

Definition and Categorization: It aims to define the existing challenges in processing large datasets and streams for fraud detection. It categorizes and compares various data mining methods and techniques from scholarly and industrial research.
Exploration of Related Domains: Another objective is to highlight emerging directions from adjacent adversarial data mining fields. This includes areas like epidemic detection, insider trading, intrusion detection, and spam detection, proposing these as potential sources for enhancing fraud detection approaches.

Insights into Fraud Types and Industries

The paper explores the characteristics of fraudsters, categorizing them as internal perpetrators, external perpetrators, and organized criminals. It analyzes their impact across industries such as credit cards, insurance, telecommunications, and e-commerce. Internal fraud is typically detected through financial reporting data, while external fraud ranges from credit application fraud to transactional fraud, underscoring how much the nature and impact of fraud vary by industry.

Data and Performance Metrics

A significant challenge in fraud detection research is the lack of publicly available real data, a problem compounded by the heavily skewed class distribution of the data that is available, in which fraud cases form a small minority. The paper discusses the types of structured data typically employed in fraud detection studies and stresses the importance of specific attributes such as financial ratios and transaction history. It highlights Receiver Operating Characteristic (ROC) analysis and the Area Under the ROC Curve (AUC) as suitable metrics for assessing fraud detection systems.
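
To make the point about skewed data concrete, the following is a minimal sketch, not taken from the paper, of how AUC can be computed from ranked scores and why plain accuracy can mislead when fraud is rare. The function names, labels, and scores are hypothetical and chosen only for illustration; AUC is computed here with the standard rank-sum (Mann-Whitney U) formulation.

```python
def roc_auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney U) formulation.

    labels: 1 for fraud, 0 for legitimate. Tied scores share an average rank.
    """
    pairs = sorted(zip(scores, labels))
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one fraud and one legitimate example")

    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        # Find the run of tied scores starting at position i.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(1 for k in range(i, j) if pairs[k][1] == 1)
        i = j

    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def accuracy(labels, predictions):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(1 for y, p in zip(labels, predictions) if y == p) / len(labels)


# Hypothetical, heavily skewed sample: 8 legitimate cases, 2 fraud cases.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.15, 0.3, 0.05, 0.4, 0.25, 0.35, 0.9, 0.38]

# A model that always predicts "legitimate" looks accurate but finds no fraud.
always_legit = [0] * len(labels)
print(accuracy(labels, always_legit))  # 0.8
print(roc_auc(labels, scores))         # 0.9375
```

In this toy sample the do-nothing classifier reaches 80% accuracy without catching a single fraud case, whereas AUC reflects how well the scores rank fraud above legitimate activity regardless of the class ratio.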

Techniques and Methods

The survey identifies four main data mining approaches:

Supervised Learning: Methods such as neural networks and decision trees are used extensively but are often criticized for their complexity and resource demands. The research calls for simpler methods such as logistic regression, which may offer comparable results with reduced computational overhead.
Hybrid Approaches: Combining supervised and unsupervised methods, such as integrating neural networks with fuzzy logic, has shown potential in improving detection outcomes.
Semi-supervised Learning: These techniques train only on legal (non-fraudulent) data and flag departures from it as anomalies, offering a practical alternative when labeled fraud data is scarce.
Unsupervised Learning: Graph mining and link analysis, though underexplored, are suggested for their capacity to capture complex fraud patterns.
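
The semi-supervised idea of learning only from legal data can be sketched as follows. This is an illustrative assumption rather than a method described in the paper: it builds a per-feature mean and standard deviation profile from legitimate transactions and scores new transactions by their largest absolute z-score. The function names, the two features (amount and transactions per day), the sample values, and the threshold of 3.0 are all hypothetical.

```python
import statistics


def fit_legal_profile(legal_rows):
    """Per-feature (mean, standard deviation) learned from legitimate rows only."""
    columns = list(zip(*legal_rows))
    return [(statistics.mean(c), statistics.stdev(c)) for c in columns]


def z_score(value, mean, stdev):
    """Absolute z-score; a constant feature scores 0 on a match, infinity otherwise."""
    if stdev == 0:
        return 0.0 if value == mean else float("inf")
    return abs(value - mean) / stdev


def anomaly_score(profile, row):
    """Largest z-score across features; higher means less like the legal data."""
    return max(z_score(x, mu, sd) for x, (mu, sd) in zip(row, profile))


def is_suspicious(profile, row, threshold=3.0):
    """Flag a row whose anomaly score exceeds the (assumed) threshold."""
    return anomaly_score(profile, row) > threshold


# Hypothetical legitimate transactions: (amount, transactions per day).
legal = [(20.0, 2), (35.0, 3), (25.0, 1), (30.0, 2), (40.0, 2)]
profile = fit_legal_profile(legal)

print(is_suspicious(profile, (28.0, 2)))   # False: close to typical behaviour
print(is_suspicious(profile, (500.0, 9)))  # True: far outside the legal profile
```

No fraud labels are needed at training time, which is the appeal when confirmed fraud cases are scarce. The trade-off is that anything unusual is flagged, including rare but legitimate behaviour, so the threshold would need tuning against the cost of false alarms.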

Criticisms and Future Directions

The paper critiques the over-reliance on sophisticated models while neglecting simpler, practical solutions that could be more effective in real-time, resource-constrained environments. It encourages leveraging advancements in related fields such as bio-terrorism surveillance and insider trading detection, advocating for methods that incorporate spatio-temporal data and game-theoretic approaches.

Conclusion

This survey emphasizes the need for a nuanced understanding of fraud detection, combining insights from related adversarial domains to advance methodologies. Highlighting gaps in current research, it proposes a roadmap for future developments, particularly in refining unsupervised and semi-supervised methods, and broadening the data sources used in fraud detection initiatives.

This paper serves as a foundation for researchers aiming to refine the technological and strategic approaches in the field of automated fraud detection, urging a cross-disciplinary exchange of problem-solving techniques.