An application of Zero-One Inflated Beta regression models for predicting health insurance reimbursement (2011.09248v1)

Published 18 Nov 2020 in stat.ME and q-fin.RM

Abstract: In actuarial practice the dependency between contract limitations (deductibles, copayments) and health care expenditures is measured by the application of the Monte Carlo simulation technique. We propose, for the same goal, an alternative approach based on Generalized Additive Models for Location, Scale and Shape (GAMLSS). We focus on the estimate of the ratio between the one-year reimbursement amount (after the effect of limitations) and the one-year expenditure (before the effect of limitations). We suggest a regression model to investigate the relation between this response variable and a set of covariates, such as limitations and other rating factors related to health risk. In this way a dependency structure between reimbursement and limitations is provided. The density function of the ratio is a mixture distribution: the ratio can take the values 0 and 1 with positive probability mass, in addition to having a probability density within (0, 1). This random variable does not belong to the exponential family, so an ordinary Generalized Linear Model is not suitable. GAMLSS introduces a probability structure compliant with the density of the response variable; in particular, a zero-one inflated beta density is assumed. The latter is a mixture between a Bernoulli distribution and a Beta distribution.

Summary

  • The paper introduces a Zero-One Inflated Beta regression model within a GAMLSS framework to predict health insurance reimbursement proportions.
  • This model effectively addresses the common issue in health insurance data where reimbursement ratios exhibit significant concentrations at the boundary values of 0 and 1.
  • Applied to real insurance data, the proposed method accurately estimates both the continuous reimbursement proportions and the discrete probabilities of zero or full reimbursement.

The paper presents a technical framework for estimating the proportion of health insurance expenditures reimbursed after deductibles and other limitations using a mixture regression model. The authors propose an alternative to simulation‐based actuarial methods by modeling the indicated deductible relativity (IDR) using a Zero-One Inflated Beta (BEINF) distribution within a GAMLSS (Generalized Additive Models for Location, Scale, and Shape) framework.

The paper is motivated by the observation that, in health insurance datasets, the response variable representing the ratio of one-year reimbursement to one-year expenditure can assume values continuously in the interval (0,1) while also exhibiting non‐negligible point masses at 0 and 1. In other words, the IDR is a mixture of a continuous component (modeled via a Beta density) and discrete masses at the boundaries. The formulation is as follows:

  • For a generic value $r$, the probability density is defined by
    • $P(R = 0) = p_0$,
    • $P(R = 1) = p_1$, and
    • for $0 < r < 1$,
    •    $$\mathrm{beinf}(r; p_0, p_1, a, b) = (1 - p_0 - p_1) \cdot f(r; a, b)$$
    • where
    •    $$f(r; a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} \cdot r^{a-1}(1 - r)^{b-1}$$
    • with $a, b > 0$; here, $\Gamma(\cdot)$ is the Gamma function.
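
As a minimal sketch of the mixture just defined, the following Python code evaluates the density and draws samples using only the standard library. The function names (`beta_pdf`, `beinf`, `sample_beinf`) and the parameter values in the usage lines are illustrative assumptions, not taken from the paper.

```python
import math
import random


def beta_pdf(r, a, b):
    """Beta(a, b) density f(r; a, b), valid for 0 < r < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(r) + (b - 1) * math.log(1 - r))


def beinf(r, p0, p1, a, b):
    """Zero-one inflated beta: point masses at 0 and 1, scaled Beta density in (0, 1)."""
    if not (p0 > 0 and p1 > 0 and p0 + p1 < 1 and a > 0 and b > 0):
        raise ValueError("need p0, p1 > 0, p0 + p1 < 1 and a, b > 0")
    if r == 0:
        return p0  # probability mass at zero
    if r == 1:
        return p1  # probability mass at one
    if 0 < r < 1:
        return (1 - p0 - p1) * beta_pdf(r, a, b)
    return 0.0  # outside [0, 1]


def sample_beinf(p0, p1, a, b, rng):
    """Draw one value: 0 with prob. p0, 1 with prob. p1, otherwise a Beta(a, b) draw."""
    u = rng.random()
    if u < p0:
        return 0.0
    if u < p0 + p1:
        return 1.0
    return rng.betavariate(a, b)


# Illustrative (hypothetical) parameter values.
rng = random.Random(0)
draws = [sample_beinf(0.2, 0.1, 2.0, 2.0, rng) for _ in range(10_000)]
print(beinf(0.5, 0.2, 0.1, 2.0, 2.0))                # (1 - 0.3) * 1.5, about 1.05
print(sum(d == 0.0 for d in draws) / len(draws))     # empirical share of zeros
```

The sampler makes the mixture structure explicit: a single uniform draw decides between the two boundary masses and the continuous Beta component.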

This model is attractive because the zero-one inflated response does not belong to the exponential family, so an ordinary GLM is inadequate when the response exhibits boundary inflation.

Key aspects of the methodology include:

  • Actuarial Modeling Framework:
    • Let $Y_i$ denote the expenditure for a given claim before deductibles and $L_i$ the corresponding reimbursement after applying deductibles and caps.
    • The IDR is defined as the ratio of reimbursement to expenditure,
    •    $$R_i = \frac{L_i}{Y_i}$$
    • so that $R_i$ lies in $[0,1]$, with $R_i = 0$ when claims do not exceed the deductible and $R_i = 1$ when the out-of-pocket cap is reached.
    • The structure naturally induces a mixture density where both $P(R=0)$ and $P(R=1)$ are strictly positive.
  • Regression via GAMLSS:
    • The model parametrizes the response distribution through four parameters: $\mu$, $\sigma$, $\nu$, and $\tau$, defined by
    •    $$\mu = \frac{a}{a+b}, \quad \sigma = \frac{1}{a+b+1}, \quad \nu = \frac{p_0}{1-p_0-p_1}, \quad \tau = \frac{p_1}{1-p_0-p_1}.$$
    • This reparameterization allows describing both the location and dispersion of the Beta component, with $\nu$ and $\tau$ capturing the mass at the boundaries.
    • Each parameter is modeled as a function of covariates via link functions: a logit link for $\mu$ and $\sigma$ to ensure outputs in $(0,1)$, and a logarithmic link for $\nu$ and $\tau$. In particular, with deductible levels as covariates, the linear predictors are specified as:
    •    $$\mathrm{logit}(\mu) = \beta_{1,0} + \beta_{1,1}\cdot \mathrm{deductible}_2 + \beta_{1,2}\cdot \mathrm{deductible}_3,$$
    •    $$\mathrm{logit}(\sigma) = \beta_{2,0},$$
    •    $$\log(\nu) = \beta_{3,0} + \beta_{3,1}\cdot \mathrm{deductible}_2 + \beta_{3,2}\cdot \mathrm{deductible}_3,$$
    •    $$\log(\tau) = \beta_{4,0} + \beta_{4,1}\cdot \mathrm{deductible}_2 + \beta_{4,2}\cdot \mathrm{deductible}_3.$$
  • Empirical Application and Numerical Results:
    • The model is fitted to data from an Italian health insurance company covering two product lines: surgery and diagnostic. The respective samples include approximately 63,790 and 58,994 policyholders.
    • Deductible levels (with three distinct categories) are incorporated as covariates, affecting both the continuous Beta component and the probabilities $p_0$ and $p_1$.
    • The fitted estimates closely track the observed frequencies. For example, for the surgery cover at Level 1, the continuous component (Beta) is estimated at 88.08% compared to an observed 88.72%, while the estimated mass at 0 ($p_0$) is exactly 97.88%. Similar levels of accuracy and consistency are observed for other deductible levels and for the diagnostic branch.
    • Graphical diagnostics further confirm that the model adequately captures both the boundary mass (at 0 and 1) and the distribution within the interval, with discrepancies in the continuous region being less than 1%.
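
To illustrate how the linear predictors in the regression specification translate back into distribution parameters, the sketch below inverts the logit and log links for a given deductible level. Solving the definitions of $\nu$ and $\tau$ for the boundary masses gives $p_0 = \nu/(1+\nu+\tau)$ and $p_1 = \tau/(1+\nu+\tau)$, and the mean of the mixture is $E[R] = p_1 + (1-p_0-p_1)\,\mu$. The coefficient values, the function name `predict_parameters`, and the reading of the deductible covariates as 0/1 indicators with level 1 as baseline are assumptions made for illustration; they are not the paper's fitted estimates.

```python
import math


def inv_logit(x):
    """Inverse of the logit link; maps the real line to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_parameters(coefs, level):
    """Invert the links for one deductible level (1, 2 or 3; level 1 is the baseline)."""
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3")
    d2 = 1.0 if level == 2 else 0.0  # assumed indicator for deductible level 2
    d3 = 1.0 if level == 3 else 0.0  # assumed indicator for deductible level 3

    def eta(beta):
        # Linear predictor: intercept plus the two level effects.
        return beta[0] + beta[1] * d2 + beta[2] * d3

    mu = inv_logit(eta(coefs["mu"]))
    sigma = inv_logit(coefs["sigma"][0])  # intercept-only predictor
    nu = math.exp(eta(coefs["nu"]))
    tau = math.exp(eta(coefs["tau"]))

    # Recover the boundary masses from nu and tau.
    denom = 1.0 + nu + tau
    p0, p1 = nu / denom, tau / denom

    # Mean of the mixture: mass at 1 plus the weighted Beta mean.
    expected_ratio = p1 + (1.0 - p0 - p1) * mu
    return {"mu": mu, "sigma": sigma, "nu": nu, "tau": tau,
            "p0": p0, "p1": p1, "expected_ratio": expected_ratio}


# Hypothetical coefficients, for illustration only.
HYPOTHETICAL_COEFS = {
    "mu": (0.8, -0.3, -0.6),
    "sigma": (-1.0,),
    "nu": (-1.5, 0.4, 0.9),
    "tau": (-0.5, -0.2, -0.7),
}

for level in (1, 2, 3):
    fit = predict_parameters(HYPOTHETICAL_COEFS, level)
    print(level, round(fit["p0"], 4), round(fit["p1"], 4), round(fit["expected_ratio"], 4))
```

Because $\nu$ and $\tau$ are odds relative to the continuous component, any pair of positive values yields valid probabilities with $p_0 + p_1 < 1$, which is why an unconstrained log link suffices for both.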

In summary, the paper develops and implements a regression approach that jointly models the continuous and discrete aspects of the reimbursement proportion in health insurance claims. By employing a GAMLSS (Generalized Additive Models for Location, Scale, and Shape) framework with a Zero-One Inflated Beta distribution, the approach successfully addresses the inherent mixture nature of the data. The strong empirical performance, as evidenced by near-perfect matching of the boundary probabilities and minimal error in the continuous part, highlights the potential efficacy of such models in actuarial contexts where contract limitations significantly affect claim outcomes.