Continuous Ranked Probability Score (CRPS)

Updated 5 August 2025
  • CRPS is a strictly proper scoring rule that quantifies the accuracy of probabilistic forecasts by measuring the squared distance between the forecast CDF and the indicator (step) function of the observed outcome.
  • It encourages both calibration and sharpness in probabilistic models, playing a vital role in meteorology, economics, and machine learning.
  • CRPS is used as a loss function for distributional regression and forecast aggregation, offering computational efficiency and robust inferential guarantees.

The Continuous Ranked Probability Score (CRPS) is a strictly proper scoring rule that quantifies the accuracy of probabilistic forecasts for real-valued outcomes by measuring the squared distance between the forecast cumulative distribution function (CDF) and the indicator function of the observed outcome. As a loss function, CRPS is central to the evaluation and training of distributional regression models, ensemble forecasting systems, and probabilistic machine learning workflows across a variety of domains, particularly meteorology, economics, and machine learning. Its utility derives from its ability to jointly reward calibration and sharpness in forecast distributions and its amenability to robust, interpretable, and computationally efficient implementations.

1. Definition and Fundamental Properties

The CRPS for a predictive distribution $F$ and observation $y$ is defined as

$$\operatorname{CRPS}(F, y) = \int_{-\infty}^{\infty} \left[ F(x) - \mathbb{1}\{y \le x\} \right]^2 \, dx,$$

which can alternatively be represented as

$$\operatorname{CRPS}(F, y) = \mathbb{E}_F|Y - y| - \tfrac{1}{2}\,\mathbb{E}_F|Y - Y'|,$$

where $Y$ and $Y'$ are independent random variables distributed according to $F$. This formulation interprets CRPS as the difference between the expected absolute error and half the expected absolute difference within the predictive distribution. Because the CRPS is a strictly proper scoring rule, its unique minimizer is the true data-generating distribution, as formalized by

$$\overline{\operatorname{CRPS}}(F, G) - \overline{\operatorname{CRPS}}(G, G) = \int_{\mathbb{R}} |F(z) - G(z)|^2 \, dz,$$

where $G$ is the true distribution (Pic et al., 2022). This $L^2$-distance property is fundamental to its theoretical and practical appeal.
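The energy form above lends itself to a direct sample-based estimator for ensemble forecasts. The sketch below is a minimal illustration (not any of the cited implementations): it scores an ensemble of members against a scalar observation.

```python
import numpy as np

def crps_ensemble(samples, y):
    """Plug-in CRPS estimate from ensemble members, via the energy form
    CRPS(F, y) = E|Y - y| - 0.5 * E|Y - Y'| with Y, Y' ~ F independent."""
    x = np.asarray(samples, dtype=float)
    term1 = np.mean(np.abs(x - y))                    # E|Y - y|
    term2 = np.mean(np.abs(x[:, None] - x[None, :]))  # E|Y - Y'| over all pairs
    return term1 - 0.5 * term2

# A degenerate (single-member) forecast reduces to the absolute error:
crps_ensemble([1.0], 0.0)  # = 1.0
```

Note that this plug-in form averages the pairwise term over all ordered pairs, including self-pairs, and is therefore slightly biased for small ensembles.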

2. Proper Scoring, Calibration, and Sharpness

Being strictly proper, the CRPS incentivizes honest (well-calibrated) probabilistic forecasts. In distributional regression, minimizing the empirical average of the CRPS (the "risk") is equivalent to minimizing the $L^2$-distance between the model's predictive CDF and the true conditional CDF (Pic et al., 2022). This ensures that the CRPS-based model does not merely capture the mean but aligns the full predictive distribution with empirical observations.

A key interpretive aspect is the decomposition of CRPS into calibration and sharpness (refinement) components (Taillardat et al., 2019, Arnold et al., 2023). The CRPS captures both the spread of the forecast distribution (sharpness) and its statistical compatibility with the observed outcomes (calibration). Recent work has introduced isotonicity-based decompositions, splitting the mean CRPS into miscalibration, discrimination ability, and intrinsic uncertainty components, with the recalibration step handled by isotonic distributional regression (IDR). This decomposition, denoted $S = \mathrm{MSC} - \mathrm{DSC} + \mathrm{UNC}$, provides interpretable diagnostics for forecast evaluation (Arnold et al., 2023).

3. CRPS in Estimation and Inference

CRPS is frequently employed as a direct loss in training and estimation rather than merely as an evaluation criterion. In contexts where likelihoods are intractable but CDFs are available—such as max-stable models for extremes—CRPS forms the basis for M-estimation procedures (Yuen et al., 2013). For independent observations $X^{(1)}, \ldots, X^{(n)}$ and a model with parameter $\theta$, the CRPS-based estimator is

$$\hat{\theta}_n = \arg\min_{\theta \in \Theta} \sum_{i=1}^n \mathcal{E}_\theta(X^{(i)}), \qquad \mathcal{E}_\theta(x) = \int_{\mathbb{R}^d} \left[ F_\theta(y) - \mathbb{1}\{x \le y\} \right]^2 \mu(dy).$$

Sufficient conditions for consistency (identifiability, integrability, continuity) and asymptotic normality are available, with detailed expressions for the asymptotic covariance (the so-called "bread" and "meat" matrices) derived for max-stable models (Yuen et al., 2013). When likelihoods are available, the CRPS-based estimator is typically unbiased but less efficient than the maximum likelihood estimator; however, intractable likelihoods or nonidentifiable composite likelihoods make CRPS-based estimation a practical alternative with robust inferential guarantees (Yuen et al., 2013).
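As a toy illustration of the M-estimation idea (deliberately far simpler than the max-stable setting of Yuen et al.), one can fit a Gaussian model by minimizing the empirical CRPS risk over a parameter grid, using the well-known closed-form CRPS of a normal forecast:

```python
import numpy as np
from math import erf, exp, pi, sqrt

def crps_gaussian(mu, sigma, y):
    """Closed-form CRPS for a N(mu, sigma^2) forecast (Gneiting & Raftery, 2007)."""
    z = (y - mu) / sigma
    Phi = 0.5 * (1.0 + erf(z / sqrt(2.0)))    # standard normal CDF at z
    phi = exp(-0.5 * z * z) / sqrt(2.0 * pi)  # standard normal density at z
    return sigma * (z * (2.0 * Phi - 1.0) + 2.0 * phi - 1.0 / sqrt(pi))

def crps_m_estimate(xs, mus, sigmas):
    """Grid-search M-estimator: minimize the summed CRPS over (mu, sigma)."""
    best = None
    for mu in mus:
        for s in sigmas:
            risk = sum(crps_gaussian(mu, s, x) for x in xs)
            if best is None or risk < best[0]:
                best = (risk, mu, s)
    return best[1], best[2]

rng = np.random.default_rng(1)
data = rng.normal(2.0, 1.0, 400)
mu_hat, sigma_hat = crps_m_estimate(
    data, mus=np.linspace(0.0, 4.0, 41), sigmas=np.linspace(0.2, 3.0, 29)
)
# mu_hat and sigma_hat land close to the true parameters (2.0, 1.0)
```

Because the score is strictly proper, the risk minimizer recovers the data-generating parameters up to grid resolution and sampling noise; in practice one would use a gradient-based optimizer rather than a grid.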

4. Extensions and Generalizations

CRPS has been extended to handle censored, truncated, or multivariate response structures:

  • Censored Data (Survival-CRPS): For survival analysis, CRPS is generalized to "Survival-CRPS," addressing right- and interval-censored data by adapting the integral to penalize probability mass only where observations are not censored. Survival-CRPS is less sensitive to outliers than the negative log-likelihood loss, yielding sharper predictive distributions while maintaining calibration (Avati et al., 2018).
  • Multivariate Settings: The univariate CRPS is not directly sensitive to correlation across dimensions. Recent innovations address this through "Conditional CRPS" (CCRPS), which sums univariate CRPS terms for the marginals and conditionals, explicitly penalizing errors in dependency structure and admitting closed-form expressions for common distributions such as the multivariate normal (Roordink et al., 22 Sep 2024). "MVG-CRPS" extends the CRPS to the multivariate Gaussian by whitening the covariance and applying the univariate CRPS in the transformed domain, improving robustness and accuracy compared to the log-score and Energy Score (Zheng et al., 11 Oct 2024).
  • Weighted and Targeted CRPS: Weighted CRPS variants focus on specific regions of the outcome space (e.g., heaviness of tails, excursions above thresholds). These are used to target regions of interest in sequential design and model evaluation, with threshold-weighted CRPS acquisition functions enabling targeted learning in goal-oriented Gaussian process (GP) experimentation (Friedli et al., 14 Mar 2025), and weighted CRPS providing limited discrimination for extremes unless tail equivalence is assured (Taillardat et al., 2019).
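For intuition on the threshold-weighted variant, the sample-based sketch below uses the standard chaining construction for the weight $w(z) = \mathbb{1}\{z \ge t\}$ (an illustrative sketch under that assumption, not any of the cited implementations): all forecast behaviour below the threshold $t$ is ignored.

```python
import numpy as np

def tw_crps_ensemble(samples, y, t):
    """Threshold-weighted CRPS with w(z) = 1{z >= t}: apply the chaining
    function v(z) = max(z, t) and evaluate the usual energy form."""
    v = np.maximum(np.asarray(samples, dtype=float), t)
    vy = max(y, t)
    return np.mean(np.abs(v - vy)) - 0.5 * np.mean(np.abs(v[:, None] - v[None, :]))

# With t below every member and the observation, the score equals plain CRPS;
# with t above everything, all mass collapses onto t and the score is 0:
tw_crps_ensemble([1.0, 2.0, 3.0], 2.0, t=10.0)  # = 0.0
```

This makes concrete why weighted CRPS can have limited discrimination for extremes: forecasts that differ only below the threshold receive identical scores.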

5. CRPS in Forecast Combination and Online Learning

CRPS's mixability property enables powerful and adaptive forecast aggregation algorithms, especially within the "prediction with expert advice" (PEA) framework:

  • Mixability and Regret Bounds: CRPS is mixable with parameter $\eta = 2/(b - a)$ (where $[a,b]$ is the prediction interval), allowing Vovk's Aggregating Algorithm (AA) to achieve time-independent regret bounds: after $T$ rounds, the learner's cumulative CRPS is at most that of the best expert plus a term logarithmic in the number of experts and independent of $T$ (V'yugin et al., 2019, V'yugin et al., 2021). The explicit aggregation rule combines CDF forecasts at each point via

$$F(u) = \frac{1}{2} - \frac{1}{4} \log\left( \frac{\sum_i q_i \exp(-2 F_i(u)^2)}{\sum_i q_i \exp(-2 (1 - F_i(u))^2)} \right).$$

  • CRPS Learning and Quantile Aggregation: Modern ensemble forecasting techniques apply CRPS minimization by adapting weights not only across time but also across quantile levels ("horizontal" aggregation), often using spline-based smoothing or penalization to regularize the weight structure (Berrisch et al., 2021, Berrisch et al., 2023). This allows the ensemble to adaptively "trust" different models in different regions (e.g., center vs. tails) of the distribution.
  • Empirical Results and Practicality: Empirical studies on day-ahead electricity price forecasting show significant CRPS improvement from learning quantile- and margin-specific weights (Berrisch et al., 2023). Nevertheless, gains in CRPS may not always translate directly to better operational decision-making, as improved sharpness or overall CRPS can be decoupled from performance at extreme quantiles relevant to utility or profit (Nitka et al., 2023).
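The aggregation rule above is applied pointwise on a grid of evaluation points; a minimal sketch of that substitution step (array names and shapes are illustrative, not taken from the cited papers):

```python
import numpy as np

def aggregate_cdfs(F, q):
    """Pointwise CRPS aggregation of expert CDF forecasts.
    F: array of shape (n_experts, n_points), F[i, j] = F_i(u_j).
    q: expert weights, shape (n_experts,)."""
    F = np.asarray(F, dtype=float)
    q = np.asarray(q, dtype=float)[:, None]
    num = np.sum(q * np.exp(-2.0 * F**2), axis=0)
    den = np.sum(q * np.exp(-2.0 * (1.0 - F)**2), axis=0)
    return 0.5 - 0.25 * np.log(num / den)

# If all experts agree, the rule returns their common forecast unchanged:
aggregate_cdfs([[0.3, 0.7], [0.3, 0.7]], [0.5, 0.5])  # ≈ [0.3, 0.7]
```

The idempotence check in the comment follows directly from the formula: with identical $F_i(u) = p$, the log-ratio reduces to $2 - 4p$, so the rule returns $p$.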

6. Computational Aspects and Robustness

Accurate and efficient CRPS estimation is essential. Common estimators include quantile-based and probability-weighted moment (PWM) approaches, but both suffer from irreducible bias in the presence of nonlinear functionals of the empirical CDF. Recent work proposes an unbiased estimator leveraging symmetric bivariate forms and interprets CRPS estimation as a kernel quadrature problem in RKHS, using cubature to achieve compression and computational scalability while eliminating estimation bias (Adachi et al., 8 Mar 2025). This estimator preserves correct model rankings even when CRPS values are very close, addressing a notable practical limitation of prior approaches.
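To make the bias issue concrete: the standard plug-in energy-form estimator averages the pairwise term over all $m^2$ ordered pairs, including self-pairs, whereas the widely used "fair" correction averages over the $m(m-1)$ distinct pairs. The sketch below illustrates this (it is not the kernel-quadrature estimator of Adachi et al.):

```python
import numpy as np

def crps_plugin(samples, y):
    """Standard plug-in estimator: pairwise term over all m^2 pairs."""
    x = np.asarray(samples, dtype=float)
    return np.mean(np.abs(x - y)) - 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))

def crps_fair(samples, y):
    """'Fair' estimator: pairwise term over the m(m-1) distinct pairs,
    removing the systematic bias at small ensemble sizes."""
    x = np.asarray(samples, dtype=float)
    m = x.size
    pair_sum = np.abs(x[:, None] - x[None, :]).sum()
    return np.mean(np.abs(x - y)) - 0.5 * pair_sum / (m * (m - 1))

# For m = 2 members at 0 and 1 with y = 0.5, the two estimators differ:
crps_plugin([0.0, 1.0], 0.5), crps_fair([0.0, 1.0], 0.5)  # (0.25, 0.0)
```

The gap shrinks as $m$ grows, but for the small ensembles common in operational weather forecasting the correction materially changes scores and can change model rankings.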

Robustness considerations have led to new variants such as the scaled CRPS (SCRPS), which normalizes the score by the predictive spread, achieving local scale invariance—a property not present in the standard CRPS—thereby avoiding unintuitive rankings when predictive uncertainty varies substantially across instances (Bolin et al., 2019). Truncated or robust CRPS modifications have also appeared, mitigating hypersensitivity to outliers.

7. Applications and Contemporary Developments

CRPS is now the default evaluation metric in meteorological postprocessing, distributional machine learning, and ensemble forecasting. It underpins the calibration and validation of deep learning weather forecasting systems, making possible fair “potential skill” comparisons between deterministic AI-based and NWP weather models through isotonic distributional regression and the "potential CRPS" (PC) (Gneiting et al., 4 Jun 2025). In advanced scientific applications, CRPS-guided targeted sequential design outperforms conventional acquisition strategies, lending increased efficiency to data collection in areas such as computational chemistry (Friedli et al., 14 Mar 2025).

Recent advances incorporate ensemble size corrections (e.g., fair and almost fair CRPS) for training neural ensemble weather systems (Lang et al., 20 Dec 2024), novel multivariate generalizations that directly optimize correlation structure (Roordink et al., 22 Sep 2024, Zheng et al., 11 Oct 2024), and more interpretable decompositions revealing which forecasting systems are miscalibrated or lacking in discrimination (Arnold et al., 2023).

CRPS thus remains central to both theoretic understanding and the practical deployment of probabilistic forecasting models, catalyzing model innovation, evaluation, and selection in a wide array of research and operational contexts.
