
Proportional Hazards Mixture Cure Model

Updated 11 December 2025
  • The Proportional Hazards Mixture Cure Model is a semiparametric framework that identifies cured subjects through a logistic incidence model and assesses event risk among the uncured using a Cox proportional hazards model.
  • It integrates two components to separately capture covariate effects on the cure incidence and on the latency (time-to-event) distribution of the uncured, aiding interpretability and accurate inference.
  • Estimation via the EM algorithm and profile likelihood delivers consistent parameter estimates with well-established asymptotic properties.

A proportional hazards mixture cure model is a statistical framework for analyzing time-to-event data where a non-negligible fraction of subjects is assumed to be “cured”—that is, they will never experience the event of interest, no matter how long they are followed. This model extends traditional survival models by explicitly accounting for the cured proportion and allowing separate assessment of covariates’ effects on both the cure incidence (probability of being uncured) and the subsequent event-risk (latency) for the susceptible individuals. The standard approach specifies logistic regression for the incidence component and a Cox proportional hazards regression for the latency, integrating them into a semiparametric mixture structure. The proportional hazards mixture cure model provides a robust framework for both inference and prediction in the presence of cured subpopulations and is underpinned by a rigorously developed asymptotic theory (Mohammad et al., 2019).

1. Model Specification and Structure

Let $T_i$ denote the observed survival or censoring time for individual $i$, with censoring indicator $\delta_i = \mathbb{I}\{\text{event observed}\}$. Two sets of covariates are considered: $X_i$ for the cure incidence and $Z_i$ for the latency model. A latent indicator $U_i$ records whether subject $i$ is "susceptible" ($U_i = 1$) or "cured" ($U_i = 0$).

  1. Cure Incidence (mixture component):

\Pr(U_i = 1 \mid X_i) = p_i = \frac{\exp(X_i^\top \gamma)}{1 + \exp(X_i^\top \gamma)}

This logistic model yields the probability of subject $i$ being susceptible (uncured) given covariates $X_i$.

  2. Latency (proportional hazards component): Among susceptibles, the conditional hazard function is

h(t \mid U_i = 1, Z_i; \beta) = h_0(t) \exp(\beta^\top Z_i)

and the corresponding survival function is

S(t \mid U_i = 1, Z_i) = \exp\!\big[-\Lambda_0(t) \exp(\beta^\top Z_i)\big]

where $\Lambda_0(t) = \int_0^t h_0(s)\,ds$ is the baseline cumulative hazard.

  3. Marginal (population) survival:

S_{\text{pop}}(t \mid X_i, Z_i; \gamma, \beta, \Lambda_0) = 1 - p_i + p_i\, S(t \mid U_i = 1, Z_i)

This combines the cure probability and susceptible subpopulation survival in a two-component mixture (Mohammad et al., 2019).
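The mixture above is straightforward to evaluate once the three ingredients are in hand. As a minimal sketch (function and variable names are illustrative, not from any particular package), assuming the baseline cumulative hazard is supplied as a callable:

```python
import numpy as np

def population_survival(t, x, z, gamma, beta, cum_hazard):
    """Two-component mixture survival S_pop(t) = 1 - p + p * S(t | U=1).

    cum_hazard: callable returning the baseline cumulative hazard Lambda_0(t).
    Illustrative sketch, not a production implementation.
    """
    p = 1.0 / (1.0 + np.exp(-(x @ gamma)))                 # logistic incidence
    s_latency = np.exp(-cum_hazard(t) * np.exp(z @ beta))  # PH latency survival
    return 1.0 - p + p * s_latency

# Toy example with an exponential baseline h0 = 0.5, so Lambda_0(t) = 0.5 t.
x = np.array([1.0, 0.3])       # incidence covariates (with intercept)
z = np.array([0.3])            # latency covariate
gamma = np.array([0.2, -0.5])
beta = np.array([0.8])
s = population_survival(2.0, x, z, gamma, beta, lambda t: 0.5 * t)
```

As $t \to \infty$ the latency survival vanishes, so $S_{\text{pop}}(t) \to 1 - p_i$: the population survival curve plateaus at the covariate-specific cure fraction, which is the model's signature feature.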

2. Likelihood Construction and Profile Likelihood

The observed-data likelihood, treating the latent $U_i$ as missing for censored cases, is constructed as:

L(\gamma, \beta, \Lambda_0) = \prod_{i=1}^n \big[p_i f(T_i \mid U_i = 1, Z_i)\big]^{\delta_i}\, \big[1 - p_i + p_i S(T_i \mid U_i = 1, Z_i)\big]^{1 - \delta_i}

with $f(t \mid U = 1, Z) = h(t \mid U = 1, Z)\, S(t \mid U = 1, Z)$. The baseline hazard $\Lambda_0$ is eliminated by profiling, using a weighted Breslow-type estimator:

\hat\Lambda_0(t; \beta) = \sum_{i: T_i \le t} \delta_i \,\Big/\, \sum_{j=1}^n y_j\, Y_j(T_i)\, e^{\beta^\top Z_j}

where the weights

y_i = \begin{cases} 1 & \text{if } \delta_i = 1 \\ \dfrac{p_i S(T_i \mid U = 1, Z_i)}{1 - p_i + p_i S(T_i \mid U = 1, Z_i)} & \text{if } \delta_i = 0 \end{cases}

are the posterior probabilities of $U_i = 1$ given the current parameters and observed data.

Plugging $\hat\Lambda_0$ into the likelihood yields the profile likelihood $L_{PL}(\gamma, \beta)$, which forms the basis for efficient estimation and inference (Mohammad et al., 2019).
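The posterior weights and the weighted Breslow estimator can be sketched in a few lines of NumPy. This is an illustrative sketch (ties are handled naively, names are not from any package):

```python
import numpy as np

def posterior_weights(delta, p, s_latency):
    """E-step weights y_i = P(U_i = 1 | data): 1 for observed events,
    the posterior ratio for censored subjects."""
    y = np.ones_like(p)
    cens = delta == 0
    y[cens] = (p[cens] * s_latency[cens]
               / (1.0 - p[cens] + p[cens] * s_latency[cens]))
    return y

def breslow_cum_hazard(times, delta, z, beta, y):
    """Weighted Breslow estimator of Lambda_0, with y_j weighting the risk set.

    Returns the sorted times and the estimated cumulative hazard at each.
    """
    risk = y * np.exp(z @ beta)                   # y_j * exp(beta' Z_j)
    order = np.argsort(times)
    d_sorted, r_sorted = delta[order], risk[order]
    denom = np.cumsum(r_sorted[::-1])[::-1]       # sum over risk set {j: T_j >= T_i}
    jumps = np.where(d_sorted == 1, 1.0 / denom, 0.0)  # jumps only at event times
    return times[order], np.cumsum(jumps)
```

With all weights equal to one and $\beta = 0$ this reduces to the Nelson–Aalen estimator, which is a useful sanity check.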

3. Asymptotic Theory, Efficiency, and Variance Estimation

The model’s estimation theory is grounded in semiparametric M-estimation and tangent space projections. Key features:

  • The profile likelihood score for $(\gamma, \beta)$ at the profiled baseline hazard is:
    • Incidence: $U_\gamma = \sum_{i=1}^n (y_i - p_i)\, X_i$
    • Latency: $U_\beta = \sum_{i=1}^n \delta_i \big[Z_i - E_\beta(T_i)\big]$
    • where $E_\beta(t) = \sum_{j: T_j \ge t} y_j e^{\beta^\top Z_j} Z_j \,\big/\, \sum_{j: T_j \ge t} y_j e^{\beta^\top Z_j}$ is the $y$-weighted average of covariates over the risk set at $t$.
  • Under standard regularity conditions (bounded covariates, positivity, identifiability), $(\hat\gamma, \hat\beta)$ are asymptotically normal:

\sqrt{n}\big((\hat\gamma, \hat\beta) - (\gamma_0, \beta_0)\big) \xrightarrow{d} N(0, I^{-1})

where $I$ is the efficient information matrix, computed as the variance of the score vector.

  • Mohammad et al. demonstrate that the profile-likelihood score equals the efficient score obtained via projection theory, and the efficient information is given by the sample variance of the score function (Mohammad et al., 2019).

Empirical information and standard errors can be consistently estimated by:

\hat I_e = n^{-1} \sum_{i=1}^n s_i(\hat\theta)\, s_i(\hat\theta)^\top

where $s_i(\cdot)$ denotes the individual score contributions evaluated at the estimated parameters.
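The outer-product estimator $\hat I_e$ and the resulting standard errors are a one-liner given per-subject scores. A minimal sketch, assuming the score contributions are already stacked in an $n \times d$ array:

```python
import numpy as np

def empirical_information(scores):
    """Outer-product information estimate: mean of s_i s_i^T over subjects.

    scores: (n, d) array of per-subject score contributions at theta-hat.
    """
    n = scores.shape[0]
    return scores.T @ scores / n

def standard_errors(scores):
    """Standard errors from the inverse empirical information, scaled by 1/n."""
    n = scores.shape[0]
    cov = np.linalg.inv(empirical_information(scores)) / n
    return np.sqrt(np.diag(cov))
```

These analytic standard errors avoid the per-replicate refitting cost of a bootstrap, which is the practical advantage emphasized for the profile-score approach.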

4. Estimation Algorithms and Practical Implementation

Estimation is typically performed via a combination of the EM algorithm and profile likelihood maximization:

  • E-step: Compute posterior weights $y_i$ for each subject, reflecting the probability of being uncured given the observed time/censoring and current parameters.
  • M-step: Update the incidence parameters by weighted logistic regression, and the latency (proportional hazards) parameters by a weighted partial likelihood approach.

Computation of the nonparametric baseline cumulative hazard uses a recursive Breslow estimator with the $y_i$ as weights. This approach is implemented in the smcure R package and provides consistent, efficient inference for both model components (Mohammad et al., 2019).
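A compact EM loop illustrating the scheme is sketched below. For brevity it substitutes an exponential baseline hazard for the nonparametric Breslow step and takes single Newton updates in each M-step; it is a didactic sketch under those stated simplifications, not the full semiparametric procedure or the smcure implementation:

```python
import numpy as np

def em_cure_fit(times, delta, x, z, n_iter=30):
    """EM sketch for a mixture cure model (simplified, illustrative).

    Simplifications relative to the semiparametric model:
    - latency uses an exponential baseline h0(t) = lam instead of the
      nonparametric Breslow estimator;
    - each M-step takes a single Newton update.
    """
    gamma = np.zeros(x.shape[1])
    beta = np.zeros(z.shape[1])
    lam = delta.sum() / times.sum()           # crude initial event rate
    for _ in range(n_iter):
        # E-step: posterior probability of being uncured.
        p = 1.0 / (1.0 + np.exp(-(x @ gamma)))
        s_lat = np.exp(-lam * times * np.exp(z @ beta))
        y = np.where(delta == 1, 1.0,
                     p * s_lat / (1.0 - p + p * s_lat))
        # M-step (incidence): Newton step of weighted logistic regression.
        w = p * (1.0 - p)
        gamma = gamma + np.linalg.solve(x.T @ (w[:, None] * x), x.T @ (y - p))
        # M-step (latency): weighted exponential-hazard updates.
        eta = np.exp(z @ beta)
        lam = delta.sum() / (y * times * eta).sum()
        g_b = z.T @ (delta - y * lam * times * eta)
        h_b = z.T @ ((y * lam * times * eta)[:, None] * z)
        beta = beta + np.linalg.solve(h_b, g_b)
    return gamma, beta, lam
```

The structure mirrors the algorithm described above: the E-step produces the weights $y_i$, and the two M-steps update the incidence and latency parameters separately using those weights.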

5. Simulation Studies and Data Applications

Simulations by Mohammad et al. across a range of cure-rate scenarios (11–75%) and covariate settings demonstrate:

  • Estimators for both incidence and latency coefficients exhibit negligible bias ($<0.06$).
  • Standard errors computed analytically via the profile score closely match those obtained by nonparametric bootstrap, with consistent 95% coverage.
  • Application to ECOG E1684 melanoma data confirms that both statistical significance and coefficient magnitudes are stable across profile likelihood and SMCURE-bootstrap approaches, with interpretational clarity provided by modeling cure and latency separately (Mohammad et al., 2019).

6. Extensions, Generalizations, and Comparative Results

The proportional hazards mixture cure model is flexible and can be extended or embedded in richer frameworks. Such extensions maintain the core structure: logistic (or, more generally, flexible) incidence modeling coupled with a proportional hazards latency, under a two-component mixture for population survival.

7. Theoretical and Practical Implications

Adoption of the proportional hazards mixture cure model addresses a major limitation of standard survival analyses in the presence of cure, namely, the overestimation of long-term risk when cured patients are not separated. Efficient estimation approaches, rooted in profile likelihood and projection-theoretic efficiency, provide both theoretical rigor and practical accuracy for inference about both cure incidence and latency parameters. Empirical variance estimation via profile scores offers robust alternatives to computationally intensive bootstrap procedures, facilitating large-scale applications (Mohammad et al., 2019).

In conclusion, the proportional hazards mixture cure model is a foundational semiparametric cure modeling framework, allowing distinct and interpretable modeling of both the cure process and post-cure event dynamics, with a well-developed theory of efficient estimation, variance, and practical implementation.
