- The paper's main contribution is applying social network analytics to insurance fraud detection: it builds a bipartite network of claims and involved parties and uses the BiRank algorithm to assign a fraud score to each claim.
- Network features, including fraud score and neighborhood attributes, are combined with intrinsic claim data to significantly enhance model performance.
- Experimental results show improvements in AUROC, AUPR, and top decile lift, which helps investigators prioritize suspicious claims.
Social Network Analytics for Supervised Fraud Detection in Insurance
Introduction
The paper "Social Network Analytics for Supervised Fraud Detection in Insurance" proposes an innovative approach to detecting insurance fraud using social network analytics. The authors construct a bipartite network of claims and involved parties, employing the BiRank algorithm to calculate fraud scores for each claim within the network. These scores are integrated with traditional claim-specific features to enhance a supervised model's ability to predict fraudulent claims in motor insurance.
Network Construction and Fraud Detection Strategy
The researchers represent insurance claims and their associated parties, such as policyholders, brokers, experts, and garages, as a social network. The approach relies on homophily in fraudulent behavior: fraudulent claims tend to be linked to other fraudulent claims, so a claim's connections carry information about how likely it is to be fraudulent.
Bipartite Network and Fraud Scores
A key component is the construction of a bipartite network in which claims and involved parties form two distinct node sets, and an edge connects a claim to each party involved in it. Using the BiRank algorithm, a ranking method designed for bipartite networks, the study assigns fraud scores to nodes based on their connectivity to known fraudulent claims. These scores add relational information to the traditional claim features.
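The propagation idea can be illustrated with a minimal sketch. It assumes a simplified BiRank-style iteration with symmetric degree normalization, a single damping parameter `alpha`, and a query vector that puts equal mass on known fraudulent claims. The function name, parameter values, and toy identifiers are hypothetical, and the paper's exact formulation and tuned parameters are not reproduced here.

```python
from math import sqrt


def birank_scores(edges, known_fraud, alpha=0.85, max_iter=200, tol=1e-9):
    """Simplified BiRank-style fraud scores (illustrative assumption).

    edges: list of (claim_id, party_id) pairs.
    known_fraud: iterable of claim ids labeled fraudulent.
    Returns (claim_scores, party_scores) as dictionaries.
    """
    known_fraud = set(known_fraud)
    claims = sorted({c for c, _ in edges})
    parties = sorted({p for _, p in edges})

    # Node degrees on both sides of the bipartite network.
    claim_deg = {c: 0 for c in claims}
    party_deg = {p: 0 for p in parties}
    for c, p in edges:
        claim_deg[c] += 1
        party_deg[p] += 1

    # Symmetric normalization: each edge weight is 1 / sqrt(deg_c * deg_p).
    weights = [(c, p, 1.0 / sqrt(claim_deg[c] * party_deg[p])) for c, p in edges]

    # Query vector: uniform mass on known fraudulent claims, zero elsewhere.
    n_fraud = sum(1 for c in claims if c in known_fraud)
    if n_fraud == 0:
        raise ValueError("at least one known fraudulent claim is required")
    query = {c: (1.0 / n_fraud if c in known_fraud else 0.0) for c in claims}

    claim_score = dict(query)
    party_score = {p: 0.0 for p in parties}
    for _ in range(max_iter):
        # Claims pass their scores to the parties they involve.
        party_score = {p: 0.0 for p in parties}
        for c, p, w in weights:
            party_score[p] += w * claim_score[c]

        # Parties pass scores back; the query vector is mixed in with 1 - alpha.
        new_score = {c: (1.0 - alpha) * query[c] for c in claims}
        for c, p, w in weights:
            new_score[c] += alpha * w * party_score[p]

        delta = max(abs(new_score[c] - claim_score[c]) for c in claims)
        claim_score = new_score
        if delta < tol:
            break
    return claim_score, party_score


# Toy network: claims c1 and c2 share garage g1; c3 and c4 share garage g2.
toy_edges = [("c1", "g1"), ("c2", "g1"), ("c3", "g2"), ("c4", "g2")]
toy_claim_scores, toy_party_scores = birank_scores(toy_edges, {"c1"})
```

In this toy network, `c2` receives a positive score because it shares a garage with the known fraudulent claim `c1`, while `c3` and `c4` have no path to a fraudulent claim and stay at zero.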
Feature Engineering and Model Development
Network Features
The authors derive two types of features from the network: fraud score features, which indicate proximity to known fraudulent nodes, and neighborhood-based features, which capture the fraud exposure and structural attributes of a claim's neighborhood within the network. Converting network structure into per-claim features is what makes it usable by a standard supervised model.
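As a rough illustration of neighborhood-based features, the sketch below counts, for each claim, its involved parties and the other claims reachable through those parties, along with how many of those neighboring claims are known to be fraudulent. The feature names and the choice of neighborhood are assumptions made for this example, not the paper's exact feature set.

```python
from collections import defaultdict


def neighborhood_features(edges, known_fraud):
    """Per-claim neighborhood features from (claim_id, party_id) edges.

    Feature names are hypothetical and chosen for illustration only.
    """
    known_fraud = set(known_fraud)
    claims_of_party = defaultdict(set)
    parties_of_claim = defaultdict(set)
    for claim, party in edges:
        claims_of_party[party].add(claim)
        parties_of_claim[claim].add(party)

    features = {}
    for claim, parties in parties_of_claim.items():
        # Claims that share at least one party with this claim.
        neighbors = set()
        for party in parties:
            neighbors |= claims_of_party[party]
        neighbors.discard(claim)

        n_fraud = len(neighbors & known_fraud)
        features[claim] = {
            "n_parties": len(parties),
            "n_neighbor_claims": len(neighbors),
            "n_fraud_neighbors": n_fraud,
            "fraud_neighbor_ratio": n_fraud / len(neighbors) if neighbors else 0.0,
        }
    return features


# Toy network: c2 shares garage g1 with c1 and broker b1 with c3.
toy_feature_edges = [("c1", "g1"), ("c2", "g1"), ("c2", "b1"), ("c3", "b1")]
toy_features = neighborhood_features(toy_feature_edges, {"c1"})
```

Each resulting dictionary can be appended to the claim's intrinsic attributes as additional columns.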
Supervised Learning
A logistic regression model, supported by random forests for feature selection, forms the core of the supervised learning approach. Intrinsic features specific to claims, such as claim amount and policyholder details, are combined with the newly engineered network features. The model is trained for two prediction tasks: identifying claims that were investigated and identifying claims that were confirmed as fraudulent.
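The combination of intrinsic and network features can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it fits a logistic regression by plain gradient descent, omits the random forest feature selection step, and uses invented toy values for a scaled claim amount and a network fraud score.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit logistic regression with batch gradient descent on the log loss."""
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            grad_b += err
            for j, v in enumerate(x):
                grad_w[j] += err * v
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias


def predict_proba(weights, bias, x):
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


# Hypothetical toy data: one intrinsic column and one network column per claim.
intrinsic = [[0.2], [0.9], [0.4], [0.8], [0.3], [0.7]]       # scaled claim amount
network = [[0.05], [0.60], [0.10], [0.55], [0.02], [0.70]]   # fraud score
labels = [0, 1, 0, 1, 0, 1]                                  # 1 = fraudulent

# Concatenate the two feature groups into one row per claim.
rows = [a + b for a, b in zip(intrinsic, network)]
weights, bias = fit_logistic(rows, labels)
```

Calling `predict_proba(weights, bias, [amount, fraud_score])` then returns the estimated probability for a new claim. The same setup could be fitted once per prediction task by swapping the label column.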
Experimental Evaluation and Results
The experiments involved tuning the BiRank parameters and exhaustively testing feature combinations to optimize model performance. Models that incorporate network-derived features achieve better AUROC, AUPR, and top decile lift (TDL) than models that rely only on intrinsic claim data.
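Two of these metrics are simple enough to sketch directly. The code below assumes the common definitions: top decile lift as the fraud rate among the 10% highest-scored claims divided by the overall fraud rate, and AUROC as the probability that a randomly chosen fraudulent claim is scored above a randomly chosen legitimate one. AUPR is left out for brevity, and the toy scores and labels are invented.

```python
def top_decile_lift(scores, labels):
    """Fraud rate in the top 10% of scores divided by the overall fraud rate."""
    n = len(scores)
    base_rate = sum(labels) / n
    if base_rate == 0:
        raise ValueError("labels must contain at least one positive")
    k = max(1, n // 10)
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(label for _, label in ranked[:k]) / k
    return top_rate / base_rate


def auroc(scores, labels):
    """Pairwise AUROC: share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Quadratic in the number of claims,
    so suitable only for small illustrative inputs.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 20 claims with increasing scores; claims 10 and 19 are fraudulent.
toy_scores = [i / 20 for i in range(20)]
toy_labels = [1 if i in (10, 19) else 0 for i in range(20)]
toy_tdl = top_decile_lift(toy_scores, toy_labels)
toy_auc = auroc(toy_scores, toy_labels)
```

A lift above 1 means the top-scored claims contain proportionally more fraud than the portfolio as a whole, which is the property that matters when only a small share of claims can be investigated.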
Practical Implications
The integration of social network analytics into fraud detection not only enhances the detection capabilities of the model but also streamlines the investigation process by reducing false positives. By flagging claims with heightened fraud scores, the system suggests a prioritized list of suspicious claims, fostering a more efficient allocation of investigative resources.
Conclusion
This research underscores the value of social network analytics in enriching the data traditionally available for fraud detection. By treating fraud as a social phenomenon and modeling it with a bipartite network, the study lays the groundwork for further innovation in fraud detection frameworks. Future work could incorporate time-weighted fraud influence and explore semi-supervised learning to make fraud identification more robust.
The proposed model is not only computationally feasible for practical deployment but also highlights the expanding role of AI and data science in modernizing insurance fraud management and reducing the economic impact of fraudulent activities.