
Proportional Hazards Mixture Cure Model

Updated 11 December 2025
  • The Proportional Hazards Mixture Cure Model is a semiparametric framework that identifies cured subjects through a logistic incidence model and assesses event risk among the uncured using a Cox proportional hazards model.
  • It integrates two components to separately capture covariate effects on the cure incidence and on the latency (time-to-event) distribution of the uncured, aiding interpretability and accurate inference.
  • Estimation via the EM algorithm and profile likelihood delivers consistent parameter estimates with well-established asymptotic properties.

A proportional hazards mixture cure model is a statistical framework for analyzing time-to-event data where a non-negligible fraction of subjects is assumed to be “cured”—that is, they will never experience the event of interest, no matter how long they are followed. This model extends traditional survival models by explicitly accounting for the cured proportion and allowing separate assessment of covariates’ effects on both the cure incidence (probability of being uncured) and the subsequent event-risk (latency) for the susceptible individuals. The standard approach specifies logistic regression for the incidence component and a Cox proportional hazards regression for the latency, integrating them into a semiparametric mixture structure. The proportional hazards mixture cure model provides a robust framework for both inference and prediction in the presence of cured subpopulations and is underpinned by a rigorously developed asymptotic theory (Mohammad et al., 2019).

1. Model Specification and Structure

Let $T_i$ denote the observed survival or censoring time for individual $i$, with censoring indicator $\delta_i = \mathbb{I}\{\text{event observed}\}$. Two sets of covariates are considered: $X_i$ for the cure incidence and $Z_i$ for the latency model. A latent indicator $U_i$ records whether subject $i$ is "susceptible" ($U_i = 1$) or "cured" ($U_i = 0$).

  1. Cure Incidence (mixture component):

\Pr(U_i = 1 \mid X_i) = p_i = \frac{\exp(X_i^\top \gamma)}{1 + \exp(X_i^\top \gamma)}

This logistic model yields the probability of subject $i$ being susceptible (uncured) given covariates $X_i$.

  2. Latency (proportional hazards component): Among susceptibles, the conditional hazard function is

h(t \mid U_i = 1, Z_i; \beta) = h_0(t) \exp(\beta^\top Z_i)

and the corresponding survival function is

S(t \mid U_i = 1, Z_i) = \exp\!\big[-\Lambda_0(t) \exp(\beta^\top Z_i)\big]

where $\Lambda_0(t) = \int_0^t h_0(s)\,ds$ is the baseline cumulative hazard.

  3. Marginal (population) survival:

S_{\text{pop}}(t \mid X_i, Z_i; \gamma, \beta, \Lambda_0) = 1 - p_i + p_i\, S(t \mid U_i = 1, Z_i)

This combines the cure probability and susceptible subpopulation survival in a two-component mixture (Mohammad et al., 2019).
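The mixture above is straightforward to evaluate once the three ingredients are in hand. As a minimal sketch (function and variable names are illustrative, not from any particular package), assuming the baseline cumulative hazard is supplied as a callable:

```python
import numpy as np

def population_survival(t, x, z, gamma, beta, cum_hazard):
    """Two-component mixture survival S_pop(t) = 1 - p + p * S(t | U=1).

    cum_hazard: callable returning the baseline cumulative hazard Lambda_0(t).
    Illustrative sketch, not a production implementation.
    """
    p = 1.0 / (1.0 + np.exp(-(x @ gamma)))                 # logistic incidence
    s_latency = np.exp(-cum_hazard(t) * np.exp(z @ beta))  # PH latency survival
    return 1.0 - p + p * s_latency

# Toy example with an exponential baseline h0 = 0.5, so Lambda_0(t) = 0.5 t.
x = np.array([1.0, 0.3])       # incidence covariates (with intercept)
z = np.array([0.3])            # latency covariate
gamma = np.array([0.2, -0.5])
beta = np.array([0.8])
s = population_survival(2.0, x, z, gamma, beta, lambda t: 0.5 * t)
```

As $t \to \infty$ the latency survival vanishes, so $S_{\text{pop}}(t) \to 1 - p_i$: the population survival curve plateaus at the covariate-specific cure fraction, which is the model's signature feature.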

2. Likelihood Construction and Profile Likelihood

The observed-data likelihood, treating the latent $U_i$ as missing for censored cases, is constructed as:

L(\gamma, \beta, \Lambda_0) = \prod_{i=1}^n \big[p_i f(T_i \mid U_i = 1, Z_i)\big]^{\delta_i}\, \big[1 - p_i + p_i S(T_i \mid U_i = 1, Z_i)\big]^{1 - \delta_i}

with $f(t \mid U = 1, Z) = h(t \mid U = 1, Z)\, S(t \mid U = 1, Z)$. The baseline hazard $\Lambda_0$ is eliminated by profiling, using a weighted Breslow-type estimator:

\hat\Lambda_0(t; \beta) = \sum_{i: T_i \le t} \delta_i \,\Big/\, \sum_{j=1}^n y_j\, Y_j(T_i)\, e^{\beta^\top Z_j}

where the weights

y_i = \begin{cases} 1 & \text{if } \delta_i = 1 \\ \dfrac{p_i S(T_i \mid U = 1, Z_i)}{1 - p_i + p_i S(T_i \mid U = 1, Z_i)} & \text{if } \delta_i = 0 \end{cases}

are the posterior probabilities of $U_i = 1$ given the current parameters and observed data.

Plugging $\hat\Lambda_0$ into the likelihood yields the profile likelihood $L_{PL}(\gamma, \beta)$, which forms the basis for efficient estimation and inference (Mohammad et al., 2019).
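The posterior weights and the weighted Breslow estimator can be sketched in a few lines of NumPy. This is an illustrative sketch (ties are handled naively, names are not from any package):

```python
import numpy as np

def posterior_weights(delta, p, s_latency):
    """E-step weights y_i = P(U_i = 1 | data): 1 for observed events,
    the posterior ratio for censored subjects."""
    y = np.ones_like(p)
    cens = delta == 0
    y[cens] = (p[cens] * s_latency[cens]
               / (1.0 - p[cens] + p[cens] * s_latency[cens]))
    return y

def breslow_cum_hazard(times, delta, z, beta, y):
    """Weighted Breslow estimator of Lambda_0, with y_j weighting the risk set.

    Returns the sorted times and the estimated cumulative hazard at each.
    """
    risk = y * np.exp(z @ beta)                   # y_j * exp(beta' Z_j)
    order = np.argsort(times)
    d_sorted, r_sorted = delta[order], risk[order]
    denom = np.cumsum(r_sorted[::-1])[::-1]       # sum over risk set {j: T_j >= T_i}
    jumps = np.where(d_sorted == 1, 1.0 / denom, 0.0)  # jumps only at event times
    return times[order], np.cumsum(jumps)
```

With all weights equal to one and $\beta = 0$ this reduces to the Nelson–Aalen estimator, which is a useful sanity check.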

3. Asymptotic Theory, Efficiency, and Variance Estimation

The model’s estimation theory is grounded in semiparametric M-estimation and tangent space projections. Key features:

  • The profile likelihood score for $(\gamma, \beta)$ at the profiled baseline hazard is:
    • Incidence: $U_\gamma = \sum_{i=1}^n (y_i - p_i)\, X_i$
    • Latency: $U_\beta = \sum_{i=1}^n \delta_i \big[Z_i - E_\beta(T_i)\big]$
    • where $E_\beta(t) = \sum_{j: T_j \ge t} y_j e^{\beta^\top Z_j} Z_j \,\big/\, \sum_{j: T_j \ge t} y_j e^{\beta^\top Z_j}$ is the $y$-weighted average of covariates over the risk set at $t$.
  • Under standard regularity conditions (bounded covariates, positivity, identifiability), $(\hat\gamma, \hat\beta)$ are asymptotically normal:

\sqrt{n}\big((\hat\gamma, \hat\beta) - (\gamma_0, \beta_0)\big) \xrightarrow{d} N(0, I^{-1})

where $I$ is the efficient information matrix, computed as the variance of the score vector.

  • Mohammad et al. demonstrate that the profile-likelihood score equals the efficient score obtained via projection theory, and the efficient information is given by the sample variance of the score function (Mohammad et al., 2019).

Empirical information and standard errors can be consistently estimated by:

\hat I_e = n^{-1} \sum_{i=1}^n s_i(\hat\theta)\, s_i(\hat\theta)^\top

where $s_i(\cdot)$ denotes the individual score contributions evaluated at the estimated parameters.
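The outer-product estimator $\hat I_e$ and the resulting standard errors are a one-liner given per-subject scores. A minimal sketch, assuming the score contributions are already stacked in an $n \times d$ array:

```python
import numpy as np

def empirical_information(scores):
    """Outer-product information estimate: mean of s_i s_i^T over subjects.

    scores: (n, d) array of per-subject score contributions at theta-hat.
    """
    n = scores.shape[0]
    return scores.T @ scores / n

def standard_errors(scores):
    """Standard errors from the inverse empirical information, scaled by 1/n."""
    n = scores.shape[0]
    cov = np.linalg.inv(empirical_information(scores)) / n
    return np.sqrt(np.diag(cov))
```

These analytic standard errors avoid the per-replicate refitting cost of a bootstrap, which is the practical advantage emphasized for the profile-score approach.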

4. Estimation Algorithms and Practical Implementation

Estimation is typically performed via a combination of the EM algorithm and profile likelihood maximization:

  • E-step: Compute posterior weights $y_i$ for each subject, reflecting the probability of being uncured given the observed time/censoring and current parameters.
  • M-step: Update the incidence parameters by weighted logistic regression, and the latency (proportional hazards) parameters by a weighted partial likelihood approach.

Computation of the nonparametric baseline cumulative hazard uses a recursive Breslow estimator with the $y_i$ as weights. This approach is implemented in the smcure R package and provides consistent, efficient inference for both model components (Mohammad et al., 2019).
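A compact EM loop illustrating the scheme is sketched below. For brevity it substitutes an exponential baseline hazard for the nonparametric Breslow step and takes single Newton updates in each M-step; it is a didactic sketch under those stated simplifications, not the full semiparametric procedure or the smcure implementation:

```python
import numpy as np

def em_cure_fit(times, delta, x, z, n_iter=30):
    """EM sketch for a mixture cure model (simplified, illustrative).

    Simplifications relative to the semiparametric model:
    - latency uses an exponential baseline h0(t) = lam instead of the
      nonparametric Breslow estimator;
    - each M-step takes a single Newton update.
    """
    gamma = np.zeros(x.shape[1])
    beta = np.zeros(z.shape[1])
    lam = delta.sum() / times.sum()           # crude initial event rate
    for _ in range(n_iter):
        # E-step: posterior probability of being uncured.
        p = 1.0 / (1.0 + np.exp(-(x @ gamma)))
        s_lat = np.exp(-lam * times * np.exp(z @ beta))
        y = np.where(delta == 1, 1.0,
                     p * s_lat / (1.0 - p + p * s_lat))
        # M-step (incidence): Newton step of weighted logistic regression.
        w = p * (1.0 - p)
        gamma = gamma + np.linalg.solve(x.T @ (w[:, None] * x), x.T @ (y - p))
        # M-step (latency): weighted exponential-hazard updates.
        eta = np.exp(z @ beta)
        lam = delta.sum() / (y * times * eta).sum()
        g_b = z.T @ (delta - y * lam * times * eta)
        h_b = z.T @ ((y * lam * times * eta)[:, None] * z)
        beta = beta + np.linalg.solve(h_b, g_b)
    return gamma, beta, lam
```

The structure mirrors the algorithm described above: the E-step produces the weights $y_i$, and the two M-steps update the incidence and latency parameters separately using those weights.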

5. Simulation Studies and Data Applications

Simulations by Mohammad et al. across a range of cure-rate scenarios (11–75%) and covariate settings demonstrate:

  • Estimators for both incidence and latency coefficients exhibit negligible bias ($<0.06$).
  • Standard errors computed analytically via the profile score closely match those obtained by nonparametric bootstrap, with consistent 95% coverage.
  • Application to ECOG E1684 melanoma data confirms that both statistical significance and coefficient magnitudes are stable across profile likelihood and SMCURE-bootstrap approaches, with interpretational clarity provided by modeling cure and latency separately (Mohammad et al., 2019).

6. Extensions, Generalizations, and Comparative Results

The proportional hazards mixture cure model is flexible and can be extended or embedded in richer frameworks. Such extensions maintain the core structure: logistic (or, more generally, flexible) incidence modeling coupled with a proportional hazards latency, under a two-component mixture for population survival.

7. Theoretical and Practical Implications

Adoption of the proportional hazards mixture cure model addresses a major limitation of standard survival analyses in the presence of cure, namely, the overestimation of long-term risk when cured patients are not separated. Efficient estimation approaches, rooted in profile likelihood and projection-theoretic efficiency, provide both theoretical rigor and practical accuracy for inference about both cure incidence and latency parameters. Empirical variance estimation via profile scores offers robust alternatives to computationally intensive bootstrap procedures, facilitating large-scale applications (Mohammad et al., 2019).

In conclusion, the proportional hazards mixture cure model is a foundational semiparametric cure modeling framework, allowing distinct and interpretable modeling of both the cure process and post-cure event dynamics, with a well-developed theory of efficient estimation, variance, and practical implementation.
