
Improving Business Insurance Loss Models by Leveraging InsurTech Innovation (2401.16723v1)

Published 30 Jan 2024 in q-fin.RM

Abstract: Recent transformative and disruptive advancements in the insurance industry have embraced various InsurTech innovations. In particular, with the rapid progress in data science and computational capabilities, InsurTech is able to integrate a multitude of emerging data sources, shedding light on opportunities to enhance risk classification and claims management. This paper presents a groundbreaking effort as we combine real-life proprietary insurance claims information together with InsurTech data to enhance the loss model, a fundamental component of insurance companies' risk management. Our study further utilizes various machine learning techniques to quantify the predictive improvement of the InsurTech-enhanced loss model over that of the insurance in-house. The quantification process provides a deeper understanding of the value of the InsurTech innovation and advocates potential risk factors that are unexplored in traditional insurance loss modeling. This study represents a successful undertaking of an academic-industry collaboration, suggesting an inspiring path for future partnerships between industry and academic institutions.


Summary

  • The paper shows that InsurTech-enhanced models consistently outperform traditional in-house models in predicting business insurance loss costs.
  • The paper employs advanced methods such as LightGBM with Bayesian optimization and Tweedie GLM with elastic net, validated via 10-fold cross-validation.
  • The paper provides interpretability using feature importance measures, ALE plots, and SHAP values to explain key risk factors in business insurance.

This paper studies how InsurTech innovations can improve business insurance loss models. It presents a three-party research collaboration between an InsurTech company (Carpe Data), an insurance company, and a university (the IRisk Lab at the University of Illinois Urbana-Champaign). The authors combined proprietary insurance claims information from the insurance company with InsurTech data from Carpe Data.

The goal of the paper is two-fold:

  1. To mine predictive risk characteristics from InsurTech data and show the improvement compared to an insurance in-house loss model.
  2. To explain these risk characteristics using interpretable machine learning techniques and to propose potential rating factors for business insurance.

The paper focuses on Business Owner's Policy (BOP) insurance. BOP bundles multiple insurance coverages, such as property insurance and liability insurance, to protect small- and medium-sized business owners from risks.

The dataset has three coverage types:

  1. Business Building (BG): covers losses related to the business's buildings.
  2. Business Personal Property (BP): covers the risks of potential loss, damage, and liability issues for business-use property.
  3. Liability (LIAB): covers the risks of potential legal liability for losses caused by policyholders to a third party.

The response variable is the BOP loss cost during the observation period (2010 to 2020).

The InsurTech data from Carpe Data includes the following categories of features:

  • Business Information: basic information about business operations (e.g., coordinates, address, operating hours).
  • Firmographics: business segmentation characteristics (e.g., business size, company type).
  • Classification: categorization of a business (e.g., category, segment, NAICS code).
  • Risk Characteristics: features identifying potential risks (e.g., presence of alcohol, chemicals, or outdoor heaters).
  • Index: a suite of indexes on a 1-5 scale targeting dimensions of risk (e.g., customer rating, visibility, reputation).
  • Proximity Score: risks associated with surrounding businesses (e.g., proximity to combustibles, entertainment, traffic).
  • Territory Risk: density scores of risks within a zip code area.
  • Text Data: webpage content and customer reviews.

The combined dataset used for modeling has 825,622 observations and 596 features. The authors performed data cleaning and feature engineering and set up a relational database.
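Combining in-house claims with InsurTech features can be pictured as a relational join. The sketch below uses Python's built-in sqlite3; all table and column names are hypothetical, since the paper does not disclose its schema.

```python
import sqlite3

# In-memory database standing in for the paper's relational setup.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE claims (business_id TEXT, coverage TEXT, loss REAL)")
cur.execute("CREATE TABLE insurtech (business_id TEXT, visibility_index INTEGER, proximity_traffic REAL)")

# Toy rows: one in-house claims record and one InsurTech record per business.
cur.executemany("INSERT INTO claims VALUES (?, ?, ?)",
                [("b1", "BG", 1200.0), ("b2", "LIAB", 0.0)])
cur.executemany("INSERT INTO insurtech VALUES (?, ?, ?)",
                [("b1", 4, 0.7), ("b2", 2, 0.1)])

# Join the two sources on the business identifier to form the modeling table.
rows = cur.execute("""
    SELECT c.business_id, c.coverage, c.loss, i.visibility_index, i.proximity_traffic
    FROM claims c JOIN insurtech i USING (business_id)
    ORDER BY c.business_id
""").fetchall()
print(rows)
```

In practice the join key, deduplication rules, and handling of businesses missing from either source are the substantive data-engineering decisions; the SQL itself is the easy part.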

The paper illustrates how InsurTech data provides additional information using the law of total variance:

\operatorname{Var}(Y) = \operatorname{E}\left[\operatorname{Var}\left(Y \mid X^{IH}, X^{IT}\right)\right] + \operatorname{E}\left[\operatorname{Var}\left(\operatorname{E}\left[Y \mid X^{IH}, X^{IT}\right] \mid X^{IH}\right)\right] + \operatorname{Var}\left(\operatorname{E}\left[Y \mid X^{IH}\right]\right)

Where:

  • $Y$ is the claim amount.
  • $X^{IH}$ represents in-house rating factors.
  • $X^{IT}$ represents risk factors from InsurTech.
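The decomposition can be checked numerically on synthetic data. In the toy simulation below (the factor distributions and coefficients are purely illustrative), the middle term is strictly positive whenever the InsurTech factor carries signal beyond the in-house factor:

```python
import random
from collections import defaultdict

random.seed(42)
n = 100_000
rows = []
for _ in range(n):
    x_ih = random.choice([0, 1])       # in-house rating factor
    x_it = random.choice([0, 1, 2])    # InsurTech risk factor
    y = 2.0 * x_ih + 1.5 * x_it + random.gauss(0.0, 1.0)
    rows.append((x_ih, x_it, y))

def mean(xs):
    return sum(xs) / len(xs)

# Empirical conditional means E[Y | X_IH, X_IT] and E[Y | X_IH].
by_both, by_ih = defaultdict(list), defaultdict(list)
for x_ih, x_it, y in rows:
    by_both[(x_ih, x_it)].append(y)
    by_ih[x_ih].append(y)
m_both = {k: mean(v) for k, v in by_both.items()}
m_ih = {k: mean(v) for k, v in by_ih.items()}
grand = mean([y for _, _, y in rows])

# The three terms of the law-of-total-variance decomposition.
term1 = mean([(y - m_both[(a, b)]) ** 2 for a, b, y in rows])       # irreducible noise
term2 = mean([(m_both[(a, b)] - m_ih[a]) ** 2 for a, b, y in rows]) # extra InsurTech signal
term3 = mean([(m_ih[a] - grand) ** 2 for a, b, y in rows])          # in-house signal
total = mean([(y - grand) ** 2 for _, _, y in rows])

# The three terms sum exactly to Var(Y); term2 > 0 means InsurTech
# data explains variation the in-house factors alone cannot.
print(term1, term2, term3, total)
```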

The authors modeled the three coverage groups (BG, BP, LIAB) separately. They calibrated a Light Gradient-Boosting Machine (LightGBM) and a Tweedie Generalized Linear Model (GLM) with elastic net feature selection. LightGBM hyperparameters were tuned via Bayesian optimization with Mean Absolute Error (MAE) as the objective loss, while the Tweedie GLM was tuned via grid search; 10-fold cross-validation identified the best models. Double lift charts visually compare predictive performance, and the Gini index, Percentage Error (PE), Root Mean Squared Error (RMSE), and MAE quantify predictive accuracy.
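The accuracy metrics named above are standard. Below is a minimal sketch assuming the ordered-Lorenz-curve variant of the Gini index common in insurance ratemaking; the paper's exact normalization may differ.

```python
import math

def percentage_error(actual, pred):
    # Portfolio-level bias: (total predicted - total actual) / total actual.
    return (sum(pred) - sum(actual)) / sum(actual)

def rmse(actual, pred):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))

def mae(actual, pred):
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)

def gini(actual, pred):
    # Sort policies by predicted loss cost, accumulate actual losses,
    # and take twice the area between the diagonal and the Lorenz curve.
    order = sorted(range(len(actual)), key=lambda i: pred[i])
    total, cum, lorenz = sum(actual), 0.0, []
    for i in order:
        cum += actual[i]
        lorenz.append(cum / total)
    n = len(actual)
    diag = [(k + 1) / n for k in range(n)]
    return 2.0 * sum(d - l for d, l in zip(diag, lorenz)) / n

# Tiny illustrative portfolio: mostly zero losses, as in real claims data.
actual = [0.0, 0.0, 100.0, 50.0, 0.0, 300.0]
pred   = [10.0, 5.0, 120.0, 40.0, 8.0, 250.0]
print(percentage_error(actual, pred), mae(actual, pred), gini(actual, pred))
```

A higher Gini index means the model ranks risks better; a PE near zero means the portfolio-level total is well calibrated, which is the property the paper highlights.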

The InsurTech-enhanced models (LightGBM and Tweedie GLM with elastic net) consistently outperformed the insurance in-house model for each coverage group, and this improvement holds irrespective of the chosen loss model. The InsurTech-enhanced models also substantially reduce the absolute value of PE, indicating better predictive performance at the portfolio level.

The authors use several techniques for model interpretation:

  • Feature Importance:
    • Mean Decrease in Impurity (MDI).
    • Mean Decrease in Accuracy (MDA).
    • SHapley Additive exPlanations (SHAP).
  • Accumulated Local Effects (ALE) Plots: To visualize the average impact of a feature on the predictions.
  • Illustrative Individual Cases: Using SHAP values to explain individual predictions.
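Mean Decrease in Accuracy is typically computed by permutation: shuffle one feature column and measure how much the model's error grows. The sketch below uses a stand-in toy model, not the paper's fitted LightGBM, purely to show the mechanic.

```python
import random

def model(row):
    # Toy model that depends only on the first feature.
    return 3.0 * row[0] + 0.0 * row[1]

def mae(rows, targets, predict):
    return sum(abs(predict(r) - t) for r, t in zip(rows, targets)) / len(rows)

random.seed(1)
rows = [[random.random(), random.random()] for _ in range(1000)]
targets = [3.0 * x0 for x0, _ in rows]  # feature 0 fully determines the target

baseline = mae(rows, targets, model)
importances = {}
for j in range(2):
    col = [r[j] for r in rows]
    random.shuffle(col)  # break the link between feature j and the target
    permuted = [r[:j] + [col[i]] + r[j + 1:] for i, r in enumerate(rows)]
    importances[j] = mae(permuted, targets, model) - baseline

# Permuting feature 0 hurts accuracy; feature 1 is unused, so its
# importance is zero.
print(importances)
```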

The model interpretability revealed several risk factors derived from InsurTech data, including:

  • Coordinates of addresses
  • Proximity scores
  • Territory risks (especially version 2, e.g., TERRITORY.j2, related to fire risk)
  • Visibility indexes
  • Review scores
  • Business classification and segment proportions

ALE plots for proximity traffic scores show how these features influence the model's predicted claim loss, and illustrative individual cases with SHAP values show how the features shaped individual predictions.