
Survival-CRPS: Scoring for Censored Data

Updated 15 December 2025
  • Survival-CRPS is a scoring rule designed for probabilistic survival prediction that extends CRPS to handle right- and interval-censored data.
  • It optimizes both calibration and sharpness, ensuring that predicted distributions remain tightly aligned with observed survival times.
  • Despite its empirical success in yielding sharper predictions, Survival-CRPS can be non-proper under random right-censoring, warranting careful model evaluation.

Survival-CRPS is a generalized scoring rule designed for probabilistic survival prediction in the presence of censoring. It extends the continuous ranked probability score (CRPS) to right-censored and interval-censored data, aiming to optimize both calibration and sharpness of survival distributions. Calibration refers to predicted probabilities matching real-world event frequencies, while sharpness quantifies the concentration of predictive distributions around true outcomes. Survival-CRPS is constructed to moderate the hypersensitivity of likelihood-based approaches, providing stable training objectives for neural survival models, and has been empirically demonstrated to yield sharper and well-calibrated survival predictions in electronic health record datasets (Avati et al., 2018). However, its theoretical propriety under random right-censoring is nuanced: while classical CRPS is proper for continuous outcomes, Survival-CRPS can be non-proper under right-censoring, meaning the true distribution may score worse than certain incorrect predictive distributions (Rindt et al., 2021).

1. Mathematical Formulation of Survival-CRPS

The Survival-CRPS for a single right-censored observation $(Z, D)$, where $Z = \min\{T,C\}$ and $D = 1\{T \leq C\}$, and a candidate CDF $\hat F(t \mid x)$, is defined as:

$$\mathcal S_{\mathrm{CRPS}}\bigl(\hat F, (Z,D)\bigr) = \int_0^Z \bigl[\hat F(t\mid x)\bigr]^2\,dt + D \int_Z^{\infty}\bigl[1-\hat F(t\mid x)\bigr]^2\,dt$$

Equivalently, with $\hat S(t\mid x) = 1 - \hat F(t\mid x)$,

$$\mathcal S_{\mathrm{CRPS}} = \int_0^Z \bigl[1 - \hat S(t\mid x)\bigr]^2\,dt + D \int_Z^{\infty} \bigl[\hat S(t\mid x)\bigr]^2\,dt$$
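
The right-censored score can be approximated numerically. The following is a minimal sketch, not the reference implementation of Avati et al.: the helper names (`trapezoid`, `survival_crps_right`, `exponential_cdf`), the trapezoid rule, and the finite cutoff `t_max` that stands in for the infinite upper limit are all assumptions made for illustration.

```python
import numpy as np


def trapezoid(values, grid):
    """Trapezoid-rule integral of sampled values over an increasing grid."""
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(grid)))


def survival_crps_right(cdf, z, d, t_max, n_grid=20001):
    """Survival-CRPS for one right-censored observation (z, d).

    cdf   : callable mapping an array of times to predicted CDF values
    z     : observed time min(T, C)
    d     : event indicator (1 = event observed, 0 = censored)
    t_max : finite cutoff standing in for the infinite upper limit (assumption)
    """
    below = np.linspace(0.0, z, n_grid)
    above = np.linspace(z, t_max, n_grid)
    term_below = trapezoid(cdf(below) ** 2, below)           # integral of F^2 on [0, z]
    term_above = trapezoid((1.0 - cdf(above)) ** 2, above)   # integral of (1 - F)^2 on [z, t_max]
    return term_below + d * term_above


def exponential_cdf(rate):
    """CDF of an exponential distribution with the given rate."""
    return lambda t: 1.0 - np.exp(-rate * t)


# Same predicted distribution, scored once as an observed event and once as censored.
score_event = survival_crps_right(exponential_cdf(0.04), z=10.0, d=1, t_max=2000.0)
score_censored = survival_crps_right(exponential_cdf(0.04), z=10.0, d=0, t_max=2000.0)
print(score_event, score_censored)
```

For a censored observation only the first integral contributes, so predicted mass placed after the censoring time is never penalized. That is the mechanism behind the mass-shifting issue discussed in Section 2.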

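For interval-censored data, where the true event time is only known to lie in $[y, \mathcal T]$, the score generalizes to the following, with $c \in \{0,1\}$ the censoring indicator ($c = 1$ if censored, so that $1 - c$ plays the role of $D$ above):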

$$\mathcal S_{\mathrm{CRPS\text{-}INTVL}}\bigl(\hat F,(y,c,\mathcal T)\bigr) = \int_0^y \hat F(z)^2\,dz + (1-c)\int_y^{\mathcal T} \bigl[1 - \hat F(z)\bigr]^2\,dz + \int_{\mathcal T}^{\infty} \bigl[1 - \hat F(z)\bigr]^2\,dz$$
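
A numerical sketch of the interval-censored score follows. It assumes that `c = 1` marks a censored observation, that `t_cap` (a hypothetical name) holds the interval's upper end $\mathcal T$, and that a finite `t_max` replaces the infinite upper limit. The quadrature scheme is an illustrative choice and is not taken from the source.

```python
import numpy as np


def trapezoid(values, grid):
    """Trapezoid-rule integral of sampled values over an increasing grid."""
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(grid)))


def survival_crps_interval(cdf, y, c, t_cap, t_max, n_grid=20001):
    """Interval-censored Survival-CRPS for one observation (y, c, t_cap).

    y     : observed time
    c     : censoring indicator (1 = censored, 0 = event observed)
    t_cap : upper end of the interval known to contain the event time
    t_max : finite cutoff standing in for infinity (assumption)
    """
    seg_below = np.linspace(0.0, y, n_grid)
    seg_mid = np.linspace(y, t_cap, n_grid)
    seg_above = np.linspace(t_cap, t_max, n_grid)
    term_below = trapezoid(cdf(seg_below) ** 2, seg_below)
    term_mid = trapezoid((1.0 - cdf(seg_mid)) ** 2, seg_mid)
    term_above = trapezoid((1.0 - cdf(seg_above)) ** 2, seg_above)
    # The middle term is switched off for censored observations (c = 1).
    return term_below + (1 - c) * term_mid + term_above


def cdf(t):
    """Example prediction: exponential CDF with rate 0.04."""
    return 1.0 - np.exp(-0.04 * t)


score_uncensored = survival_crps_interval(cdf, y=10.0, c=0, t_cap=60.0, t_max=2000.0)
score_interval = survival_crps_interval(cdf, y=10.0, c=1, t_cap=60.0, t_max=2000.0)
print(score_uncensored, score_interval)
```

Unlike the right-censored score, the censored case here still penalizes survival mass beyond `t_cap`, so a prediction cannot lower its score by pushing mass arbitrarily far out.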

These formulations directly connect CRPS to the structure of survival data, preserving its penalization of predictive mass misallocated above/below observed or censored times (Avati et al., 2018).

2. Properness and Theoretical Limitations

A scoring rule $\mathcal S(P, y)$ is proper if its expected value under the true distribution is minimized by predicting that true distribution $P$. Survival-CRPS inherits propriety when applied to uncensored or interval-censored data under certain model classes. For continuous outcomes without censoring, CRPS is strictly proper. For censored data, Survival-CRPS remains proper if $\hat F$ ranges over all continuous survival functions and censoring is independent and known (Avati et al., 2018).

However, under practical right-censoring, Survival-CRPS can be non-proper. Theoretical analysis shows that Survival-CRPS can be gamed because of the random weighting inside the integrals: a "fake" CDF can concentrate probability mass just beyond typical censoring times and thereby score better than the true distribution. For example, simulating $T \sim \mathrm{Exp}(1/100)$ and $C \sim \mathrm{Exp}(1/10)$, a fake model $\hat F = \mathrm{Exp}(1/25)$ achieves a lower Survival-CRPS than the true $\mathrm{Exp}(1/100)$, violating propriety (Rindt et al., 2021).
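
The exponential example can be checked without simulation. The sketch below makes two assumptions that the text does not state: the argument of $\mathrm{Exp}(\cdot)$ is a rate (so the means are 100, 10 and 25), and $T$ and $C$ are independent. Under those assumptions $Z = \min\{T, C\}$ is exponential with rate $r = \mu + \nu$ and is independent of $D$, with $P(D = 1) = \mu / r$. For an exponential model with rate $\lambda$, integrating the right-censored score term by term gives an expected score of $1/r - 2/(r+\lambda) + 1/(r+2\lambda) + \mu / \bigl(2\lambda(r+2\lambda)\bigr)$. The function name is hypothetical.

```python
def expected_survival_crps(model_rate, true_rate, censor_rate):
    """Expected right-censored Survival-CRPS of an exponential model.

    Assumes T ~ Exp(true_rate) and C ~ Exp(censor_rate) are independent and
    that all three arguments are rates, not means.
    """
    r = true_rate + censor_rate  # rate of Z = min(T, C)
    lam = model_rate
    return (
        1.0 / r
        - 2.0 / (r + lam)
        + 1.0 / (r + 2.0 * lam)
        + true_rate / (2.0 * lam * (r + 2.0 * lam))
    )


# Heavy censoring: the fake model scores lower (better) than the truth.
true_score = expected_survival_crps(model_rate=1 / 100, true_rate=1 / 100, censor_rate=1 / 10)
fake_score = expected_survival_crps(model_rate=1 / 25, true_rate=1 / 100, censor_rate=1 / 10)

# No censoring: ordinary CRPS, where the true model wins.
true_uncensored = expected_survival_crps(1 / 100, 1 / 100, 0.0)
fake_uncensored = expected_survival_crps(1 / 25, 1 / 100, 0.0)

print(true_score, fake_score, true_uncensored, fake_uncensored)
```

Under these assumptions the expected score is about 3.96 for the true model and about 1.68 for the fake one, so the fake model is preferred. Setting the censoring rate to zero reverses the ordering, consistent with CRPS being strictly proper for uncensored outcomes.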

3. Comparison to Alternative Scoring Rules

Several alternative survival scoring rules are widely used, with distinct handling of censoring:

| Scoring rule | Proper under right-censoring? | Typical pitfall |
|---|---|---|
| Time-dependent concordance | No | Can be "anti-correlated" |
| Integrated Brier score | Not in general | Gamed by weight estimation |
| Integrated binomial log-likelihood | No | Sensitive to censoring |
| Survival-CRPS | No (general random censoring) | Gamed by mass-shifting |
| Right-censored log-likelihood | Yes | Often computationally intractable |

Time-dependent concordance, integrated Brier score (IBS), and integrated binomial log-likelihood (IBLL) also fail to be proper under generic right-censoring, often ranking incorrect models above the true generative process in empirical tests (Rindt et al., 2021). The only consistently proper rule under standard survival assumptions is the right-censored log-likelihood, which rewards exactly the true survival function $S$:

$$\log L_n(\theta) = \sum_{i=1}^n \bigl[ d_i \log f_\theta(z_i\mid x_i) + (1-d_i)\log S_\theta(z_i\mid x_i) \bigr]$$
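
The right-censored log-likelihood can be written directly as code. This sketch drops the covariates $x_i$ for brevity and uses an exponential model purely as an illustration; the function names and toy data are hypothetical.

```python
import math


def right_censored_loglik(z, d, log_density, log_survival):
    """Sum of d_i * log f(z_i) + (1 - d_i) * log S(z_i) over all observations."""
    total = 0.0
    for z_i, d_i in zip(z, d):
        total += d_i * log_density(z_i) + (1 - d_i) * log_survival(z_i)
    return total


def exponential_loglik(rate, z, d):
    """Right-censored log-likelihood of an exponential model with the given rate."""
    return right_censored_loglik(
        z,
        d,
        log_density=lambda t: math.log(rate) - rate * t,  # log f(t)
        log_survival=lambda t: -rate * t,                 # log S(t)
    )


# Toy data: observed times and event indicators (1 = event, 0 = censored).
z_obs = [2.0, 5.0, 1.0, 8.0]
d_obs = [1, 0, 1, 0]

# For the exponential model the maximizer is available in closed form:
# number of observed events divided by total observed time.
mle_rate = sum(d_obs) / sum(z_obs)
print(mle_rate, exponential_loglik(mle_rate, z_obs, d_obs))
```

Events contribute through the density and censored observations through the survival function, so any model used with this objective must expose both.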

4. Training and Implementation of Survival-CRPS

Survival-CRPS can be implemented efficiently by numerical quadrature over predictive CDFs, combined with standard gradient-based optimization. For right-censored data the procedure is as follows (Avati et al., 2018):

Given:
  - Dataset {(xᵢ, yᵢ, cᵢ)}, yᵢ ≥ 0, cᵢ ∈ {0,1}
  - Parametric model Fθ: x → (parameters of a CDF Fθ(z))
  - Grid {z_k}, k=0...K, with weights Δz_k
Repeat until convergence:
  1. Sample minibatch
  2. For i in batch:
       θᵢ = model parameters
       Fᵢ(k) = F_{θᵢ}(z_k)
       term_below = sum_k(z_k < yᵢ) [Fᵢ(k)]² · Δz_k
       term_above = sum_k(z_k ≥ yᵢ) [1 − Fᵢ(k)]² · Δz_k
       lossᵢ = term_below + (1−cᵢ)·term_above
  3. L = (1/|batch|) Σᵢ lossᵢ
  4. Backpropagate gradients ∂L/∂θ

This approach is compatible with parametric CDF families like log-normal, allowing rapid backpropagation and flexibility in network architecture.
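
Steps 2 and 3 of the procedure can be sketched in NumPy for a log-normal family. Several things here are assumptions: the per-example parameters `mu` and `sigma` stand in for network outputs, the quadrature is a left Riemann sum on a fixed grid, and `c = 1` marks a censored example. The gradient step is omitted; in practice an automatic-differentiation framework would supply it.

```python
import math
import numpy as np

# Elementwise error function built from the standard library.
_erf = np.vectorize(math.erf, otypes=[float])


def lognormal_cdf(z, mu, sigma):
    """Log-normal CDF, broadcasting over its arguments; F(0) evaluates to 0."""
    u = (np.log(np.maximum(z, 1e-300)) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + _erf(u))


def survival_crps_loss(mu, sigma, y, c, grid):
    """Mean right-censored Survival-CRPS over a batch.

    mu, sigma : per-example log-normal parameters, shape (B,)
    y         : observed times, shape (B,)
    c         : censoring indicators (1 = censored), shape (B,)
    grid      : increasing time grid, shape (K + 1,)
    """
    mu, sigma, y, c = (np.asarray(a, dtype=float) for a in (mu, sigma, y, c))
    z = grid[:-1][None, :]        # left endpoints z_k, shape (1, K)
    dz = np.diff(grid)[None, :]   # weights Δz_k, shape (1, K)
    F = lognormal_cdf(z, mu[:, None], sigma[:, None])  # shape (B, K)
    below = z < y[:, None]
    term_below = np.sum(np.where(below, F ** 2, 0.0) * dz, axis=1)
    term_above = np.sum(np.where(~below, (1.0 - F) ** 2, 0.0) * dz, axis=1)
    loss = term_below + (1.0 - c) * term_above
    return float(loss.mean())


grid = np.linspace(0.0, 500.0, 5001)
batch_loss = survival_crps_loss(
    mu=[math.log(5.0), 3.0],
    sigma=[0.5, 0.5],
    y=[5.0, 5.0],
    c=[0.0, 1.0],
    grid=grid,
)
print(batch_loss)
```

The grid must extend far enough that the predicted survival function is close to zero at its upper end; otherwise `term_above` is truncated.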

5. Empirical Behavior: Calibration and Sharpness

Empirical investigations on EHR datasets (STARR, MIMIC-III) demonstrate the following:

  • Calibration slope (ideal=1.0):
    • MLE-right: ≈1.13, CRPS-right: ≈1.00
    • MLE-intvl: ≈1.14, CRPS-intvl: ≈0.96
  • Sharpness (coefficient of variation, $\mathrm{CoV}$):
    • MLE-right: very high (e.g., 18.4 on STARR)
    • CRPS-intvl: much lower (≈0.30)
  • Survival-AUPRC:
    • Dead (uncensored): MLE-right 0.233 vs. CRPS-intvl 0.366
    • Alive (interval): MLE-intvl 0.963 vs. CRPS-right/intvl ~0.976
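
To give a feel for the sharpness figures, the coefficient of variation of a single log-normal prediction has the closed form $\mathrm{CoV} = \sqrt{e^{\sigma^2} - 1}$, which depends only on the log-scale parameter $\sigma$. The sketch below converts between the two. It assumes a log-normal predictive distribution and treats a reported CoV as if it came from one such distribution, which is an illustrative simplification and not a description of how the study aggregated its results; the function names are hypothetical.

```python
import math


def lognormal_cov(sigma):
    """Coefficient of variation (sd / mean) of a log-normal with log-scale sigma."""
    return math.sqrt(math.exp(sigma ** 2) - 1.0)


def sigma_from_cov(cov):
    """Inverse of lognormal_cov: the log-scale sigma that yields a given CoV."""
    return math.sqrt(math.log(1.0 + cov ** 2))


# Log-scale spread implied by a sharp versus a very diffuse prediction.
sigma_sharp = sigma_from_cov(0.30)
sigma_diffuse = sigma_from_cov(18.4)
print(sigma_sharp, sigma_diffuse)
```

Because the CoV grows exponentially in $\sigma^2$, moderate increases in the log-scale spread translate into very large CoV values.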

Qualitative analysis shows that MLE-trained densities are over-diffuse under heavy censoring, whereas Survival-CRPS models produce sharper, more concentrated predictive distributions while retaining good calibration (Avati et al., 2018). This suggests Survival-CRPS is well-suited to clinical settings requiring individualized survival predictions.

6. Survival-CRPS and Neural Survival Models

Deep neural architectures, including recurrent neural networks (RNNs) and fully-connected networks (FCNs), have used Survival-CRPS as a direct optimization objective in survival regression tasks (Avati et al., 2018). Despite this empirical success, the fact that Survival-CRPS is not proper under generic random censoring has prompted methodological advances such as monotonic neural networks (SuMo-net), which enforce monotonicity and allow direct optimization of the right-censored log-likelihood with auto-differentiation (Rindt et al., 2021). This results in state-of-the-art log-likelihood scores, accurate calibration, and substantial inference speedups over proxy-based models.

7. Current Perspectives and Practical Considerations

While Survival-CRPS provides immediate implementation advantages, supports interval and right-censoring, and yields empirically strong results in real-world tasks, its theoretical limitations under random right-censoring are established. The use of proper scoring rules is essential for consistency in model selection and estimation. Architectures such as SuMo-net, which directly optimize proper scores, offer scalable solutions that overcome tractability constraints found in classical approaches.

A plausible implication is that Survival-CRPS, when used thoughtfully in conjunction with model selection and assessment strategies sensitive to its non-proper scenarios, remains highly relevant for modern survival analysis, especially where neural models and massive datasets are employed. However, for rigorous inferential settings and unbiased model ranking, proper scoring rules such as the right-censored log-likelihood should be preferred whenever feasible (Rindt et al., 2021).

