Social network analytics for supervised fraud detection in insurance

Published 15 Sep 2020 in cs.SI, cs.CR, and stat.ML | (2009.08313v1)

Abstract: Insurance fraud occurs when policyholders file claims that are exaggerated or based on intentional damages. This contribution develops a fraud detection strategy by extracting insightful information from the social network of a claim. First, we construct a network by linking claims with all their involved parties, including the policyholders, brokers, experts, and garages. Next, we establish fraud as a social phenomenon in the network and use the BiRank algorithm with a fraud-specific query vector to compute a fraud score for each claim. From the network, we extract features related to the fraud scores as well as the claims' neighborhood structure. Finally, we combine these network features with the claim-specific features and build a supervised model with fraud in motor insurance as the target variable. Although we build a model for only motor insurance, the network includes claims from all available lines of business. Our results show that models with features derived from the network perform well when detecting fraud and even outperform the models using only the classical claim-specific features. Combining network and claim-specific features further improves the performance of supervised learning models to detect fraud. The resulting model flags highly suspicious claims that need to be further investigated. Our approach provides a guided and intelligent selection of claims and contributes to a more effective fraud investigation process.

Summary

  • The paper's main contribution is integrating social network analytics with a bipartite network and the BiRank algorithm to assign fraud scores to insurance claims.
  • Network features, including fraud score and neighborhood attributes, are combined with intrinsic claim data to significantly enhance model performance.
  • Experimental results show improvements in AUROC, AUPR, and top decile lift, streamlining the investigation process for suspicious claims.

Social Network Analytics for Supervised Fraud Detection in Insurance

Introduction

The paper "Social Network Analytics for Supervised Fraud Detection in Insurance" proposes an innovative approach to detecting insurance fraud using social network analytics. The authors construct a bipartite network of claims and involved parties, employing the BiRank algorithm to calculate fraud scores for each claim within the network. These scores are integrated with traditional claim-specific features to enhance a supervised model's ability to predict fraudulent claims in motor insurance.

Network Construction and Fraud Detection Strategy

The researchers model insurance claims as a social network of claims and their associated parties, such as policyholders, brokers, experts, and garages. The approach relies on homophily in fraudulent behavior: fraudulent claims are more commonly linked to other fraudulent claims than to legitimate ones, so a claim's connections carry information about its own likelihood of being fraudulent.
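As a minimal sketch of this construction, the network can be held as two adjacency maps, one from claims to parties and one from parties to claims. The record layout, the role-prefixed identifiers, and the function name `build_bipartite` are assumptions made for illustration and do not come from the paper.

```python
from collections import defaultdict

# Hypothetical toy records: one (claim_id, party_id) pair per involvement.
# Party ids carry a role prefix purely for readability.
records = [
    ("c1", "policyholder:p1"),
    ("c1", "garage:g1"),
    ("c2", "policyholder:p2"),
    ("c2", "garage:g1"),
    ("c3", "policyholder:p2"),
    ("c3", "broker:b1"),
]


def build_bipartite(records):
    """Return two adjacency maps: claim -> parties and party -> claims."""
    claim_to_parties = defaultdict(set)
    party_to_claims = defaultdict(set)
    for claim, party in records:
        claim_to_parties[claim].add(party)
        party_to_claims[party].add(claim)
    return dict(claim_to_parties), dict(party_to_claims)


claim_to_parties, party_to_claims = build_bipartite(records)

# Claims never link to claims directly; they are connected through parties.
print(sorted(party_to_claims["garage:g1"]))
```

In this toy data, claims c1 and c2 share a garage, so any suspicion attached to one of them can reach the other through that shared party.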

Bipartite Network and Fraud Scores

A key component is the construction of a bipartite network, where claims and involved parties form two distinct node sets and an edge connects a claim to each party involved in it. BiRank, a ranking algorithm designed for bipartite networks, is run with a fraud-specific query vector to assign fraud scores to nodes based on their connectivity to known fraudulent claims. These scores add relational information that the traditional claim-specific features do not capture.
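The propagation idea can be sketched as follows. This is a simplified BiRank-style iteration on an unweighted graph, assuming symmetric degree normalization and a damping factor `alpha`. The function name `birank`, the parameter defaults, and the toy graph are illustrative assumptions, not the authors' implementation or tuned settings.

```python
from math import sqrt


def birank(claim_to_parties, query, alpha=0.85, tol=1e-10, max_iter=1000):
    """BiRank-style score propagation on an unweighted claim-party graph."""
    # Degrees of both node sets.
    claim_deg = {c: len(ps) for c, ps in claim_to_parties.items()}
    party_deg = {}
    for ps in claim_to_parties.values():
        for p in ps:
            party_deg[p] = party_deg.get(p, 0) + 1

    # Normalise the query vector (mass on known fraudulent claims) to sum to 1.
    total = sum(query.get(c, 0.0) for c in claim_to_parties)
    if total <= 0:
        raise ValueError("query must put positive mass on at least one claim")
    q = {c: query.get(c, 0.0) / total for c in claim_to_parties}

    def weight(c, p):
        # Symmetric normalisation: 1 / sqrt(deg(claim) * deg(party)).
        return 1.0 / sqrt(claim_deg[c] * party_deg[p])

    claim_score = dict(q)
    party_score = {}
    for _ in range(max_iter):
        # Step 1: parties collect score from the claims they are involved in.
        party_score = {p: 0.0 for p in party_deg}
        for c, ps in claim_to_parties.items():
            for p in ps:
                party_score[p] += weight(c, p) * claim_score[c]
        # Step 2: claims collect score from their parties, mixed with the query.
        new_score = {}
        for c, ps in claim_to_parties.items():
            spread = sum(weight(c, p) * party_score[p] for p in ps)
            new_score[c] = alpha * spread + (1 - alpha) * q[c]
        change = sum(abs(new_score[c] - claim_score[c]) for c in new_score)
        claim_score = new_score
        if change < tol:
            break
    return claim_score, party_score


# Toy graph: c1 and c2 share garage g1, c2 and c3 share policyholder p2.
toy_graph = {"c1": {"p1", "g1"}, "c2": {"g1", "p2"}, "c3": {"p2", "b1"}}
claim_score, party_score = birank(toy_graph, query={"c1": 1.0})
print(sorted(claim_score, key=claim_score.get, reverse=True))
```

With c1 as the only known fraudulent claim, the scores decay with network distance: c2, which shares a garage with c1, ends up with a higher score than c3, which is only reachable through c2.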

Feature Engineering and Model Development

Network Features

The authors derive two types of features from the network: fraud score features, indicating proximity to known fraudulent nodes, and neighborhood-based features, which capture the fraud exposure and structural attributes of a claim’s locality within the network. This transformation of network interactions into predictive features is critical for model performance.
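To make the neighborhood-based features concrete, the sketch below computes a few of them for one claim from its second-order neighborhood, that is, the other claims that share at least one party with it. The feature names and the function `neighborhood_features` are hypothetical; the paper's actual feature set may be defined differently.

```python
def neighborhood_features(claim, claim_to_parties, party_to_claims, known_fraud):
    """Summarise the parties of a claim and the claims reachable through them."""
    parties = claim_to_parties.get(claim, set())

    # Second-order neighbourhood: claims that share at least one party.
    neighbours = set()
    for party in parties:
        neighbours |= party_to_claims.get(party, set())
    neighbours.discard(claim)

    n_fraud = len(neighbours & known_fraud)
    return {
        "n_parties": len(parties),
        "n_neighbour_claims": len(neighbours),
        "n_fraud_neighbours": n_fraud,
        "fraud_ratio": n_fraud / len(neighbours) if neighbours else 0.0,
    }


# Toy graph (assumed data): c1-g1-c2 and c2-p2-c3 are the shared-party links.
claim_to_parties = {"c1": {"p1", "g1"}, "c2": {"g1", "p2"}, "c3": {"p2", "b1"}}
party_to_claims = {"p1": {"c1"}, "g1": {"c1", "c2"}, "p2": {"c2", "c3"}, "b1": {"c3"}}
known_fraud = {"c1"}

features_c2 = neighborhood_features("c2", claim_to_parties, party_to_claims, known_fraud)
features_c3 = neighborhood_features("c3", claim_to_parties, party_to_claims, known_fraud)
print(features_c2)
```

Here c2 has two neighboring claims, one of which is known fraud, while c3 has a single neighbor with no known fraud. Each resulting dictionary would become one row of network features alongside the claim's fraud score.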

Supervised Learning

A logistic regression model, supported by random forests for feature selection, forms the core of the supervised learning approach. Intrinsic claim-specific features, such as claim amount and policyholder details, are combined with the newly engineered network features. The model is trained for two target variables: whether a claim was investigated and whether it was confirmed as fraudulent.
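The combination of intrinsic and network features can be illustrated with a small logistic regression fitted by plain gradient descent. This is a toy sketch only: the two features, the data values, the learning rate, and the helper names `fit_logistic` and `predict_proba` are all assumptions, and the random forest feature selection step is omitted.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and intercept by full-batch gradient descent on log loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(x, w, b):
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Each row: [scaled claim amount (intrinsic), fraud score (network)].
# Values are invented for illustration.
X = [[0.2, 0.0], [0.8, 0.2], [0.2, 0.8], [0.8, 1.0]]
y = [0, 0, 1, 1]

w, b = fit_logistic(X, y)
p_high = predict_proba([0.5, 0.9], w, b)  # same amount, high fraud score
p_low = predict_proba([0.5, 0.1], w, b)   # same amount, low fraud score
print(p_high > p_low)
```

In this toy data only the network feature separates the classes, so two claims with the same amount receive different predicted probabilities depending on their fraud score. In practice an established library implementation would replace the hand-written optimizer.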

Experimental Evaluation and Results

The experimentation process involved tuning the BiRank parameters and exhaustively testing feature combinations to optimize model performance. The results demonstrate the superiority of models incorporating network-derived features, achieving improved AUROC, AUPR, and top decile lift (TDL) metrics compared to models relying purely on intrinsic claim data.
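Of the three metrics, top decile lift is the least standard, so a short sketch may help. It assumes the common definition: the fraud rate among the 10% of claims with the highest predicted scores, divided by the fraud rate over all claims. The function name and the toy scores and labels are illustrative assumptions.

```python
def top_decile_lift(scores, labels):
    """Fraud rate in the top 10% of scores divided by the overall fraud rate."""
    n = len(scores)
    if n == 0 or sum(labels) == 0:
        raise ValueError("need at least one claim and one positive label")
    k = max(1, n // 10)  # size of the top decile
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    top_rate = sum(labels[i] for i in order[:k]) / k
    base_rate = sum(labels) / n
    return top_rate / base_rate


# Toy example: 20 claims with scores 0.00, 0.05, ..., 0.95 and 4 fraud cases.
scores = [i / 20 for i in range(20)]
labels = [0] * 20
for fraud_index in (3, 10, 18, 19):
    labels[fraud_index] = 1

tdl = top_decile_lift(scores, labels)
print(tdl)
```

The two highest-scored claims are both fraudulent, so the top decile has a fraud rate of 1.0 against a base rate of 0.2, a lift of 5. A lift of 1 would mean the model's ranking is no better than picking claims at random.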

Practical Implications

The integration of social network analytics into fraud detection not only enhances the detection capabilities of the model but also streamlines the investigation process by reducing false positives. By flagging claims with heightened fraud scores, the system suggests a prioritized list of suspicious claims, fostering a more efficient allocation of investigative resources.

Conclusion

This research underscores the value of social network analytics in enriching the dataset traditionally available for fraud detection tasks. By mapping fraud as a social phenomenon and leveraging bipartite network science, the study sets the groundwork for further innovation in fraud detection frameworks. Moving forward, incorporating time-weighted fraud influence and exploring semi-supervised learning avenues can further bolster the robustness of fraud identification systems.

The proposed model is not only computationally feasible for practical deployment but also highlights the expanding role of AI and data science in modernizing insurance fraud management and reducing the economic impact of fraudulent activities.
