Censored and Shifted Gamma (CSG) Distribution
- The CSG distribution is a parametric family used to model non-negative mixed outcomes, particularly 24-hour precipitation accumulations.
- It applies a censored and shifted gamma framework to represent both the probability of zero rainfall and the distribution of positive amounts.
- Parameters are estimated by minimizing a closed-form CRPS, and semi-local clustering of grid points stabilizes the fit, which improves ensemble forecast calibration.
The censored and shifted gamma (CSG) distribution is a parametric family designed to model non-negative, mixed discrete–continuous outcomes, most notably 24-hour precipitation accumulations. In operational ensemble forecast post-processing, the CSG distribution forms the core of a widely adopted ensemble model output statistics (EMOS) approach that directly calibrates both the probability of zero precipitation and the distribution of positive amounts. Relative to the raw ensemble, CSG EMOS improves forecast skill and calibration, including under operational constraints such as limited training data or the coexistence of dual-resolution ensemble sources (Szabó et al., 2022, Baran et al., 2015).
1. Mathematical Definition and Properties
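A CSG random variable $X$ is defined as the maximum of zero and a shifted gamma random variable: $X = \max(0,\, Y - \delta)$, where $Y \sim \Gamma(\alpha, \beta)$ with shape $\alpha > 0$ and scale $\beta > 0$, and $\delta > 0$ is the shift.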
Probability Functions
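Let $g_{\alpha,\beta}$ denote the gamma density with shape $\alpha$ and scale $\beta$, and $G_{\alpha,\beta}$ its cumulative distribution function (CDF): $g_{\alpha,\beta}(y) = \dfrac{y^{\alpha-1} e^{-y/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)}$ for $y > 0$, and $G_{\alpha,\beta}(y) = \int_0^{y} g_{\alpha,\beta}(u)\,du$.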
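The CDF and the generalized PDF of $X$ are: $F_X(x) = \begin{cases} 0 & x < 0 \\[6pt] G_{\alpha,\beta}(x + \delta) & x \geq 0 \end{cases}$
$f_X(x) = G_{\alpha,\beta}(\delta)\,\mathbb{1}_{\{x = 0\}} + g_{\alpha,\beta}(x + \delta)\,\mathbb{1}_{\{x > 0\}}$
Here, $G_{\alpha,\beta}(\delta)$ is the point mass at zero precipitation, while for $x > 0$ the density is a shifted gamma, left-censored at zero.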
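As a rough illustration of the definition above, the following Python sketch evaluates the CSG CDF and the point mass at zero, and checks the latter against simulated draws. It assumes the shape-scale gamma parametrization; the function names and the parameter values are hypothetical and chosen only for the example.

```python
import numpy as np
from scipy.stats import gamma


def csg_cdf(x, alpha, beta, delta):
    """CDF of the CSG law: 0 for x < 0, G_{alpha,beta}(x + delta) for x >= 0."""
    x = np.asarray(x, dtype=float)
    return np.where(x < 0.0, 0.0, gamma.cdf(x + delta, a=alpha, scale=beta))


def csg_prob_zero(alpha, beta, delta):
    """Point mass at zero, P(X = 0) = G_{alpha,beta}(delta)."""
    return gamma.cdf(delta, a=alpha, scale=beta)


def csg_sample(alpha, beta, delta, size, rng):
    """Draw X = max(0, Y - delta) with Y ~ Gamma(shape=alpha, scale=beta)."""
    y = rng.gamma(shape=alpha, scale=beta, size=size)
    return np.maximum(0.0, y - delta)


rng = np.random.default_rng(0)
alpha, beta, delta = 1.5, 4.0, 2.0  # illustrative values, not fitted parameters
draws = csg_sample(alpha, beta, delta, size=200_000, rng=rng)
p0 = csg_prob_zero(alpha, beta, delta)
print("P(X = 0), analytic :", p0)
print("P(X = 0), simulated:", np.mean(draws == 0.0))
```

The simulated share of exact zeros should agree with $G_{\alpha,\beta}(\delta)$ up to Monte Carlo error, which is the sense in which the shift controls the probability of a dry day.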
Mean and Variance
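Letting $\mu = \alpha\beta$ denote the mean of the underlying gamma variable $Y$: $\mathbb{E}[X] = \mu\,\bigl(1 - G_{\alpha+1,\beta}(\delta)\bigr) - \delta\,\bigl(1 - G_{\alpha,\beta}(\delta)\bigr)$. The second moment has an analogous structure and follows from the same incomplete-gamma identities, so the variance is also available in closed form (Baran et al., 2015).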
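The moment formulas can be checked numerically. The sketch below assumes the shape-scale parametrization and uses the identity $y\,g_{\alpha,\beta}(y) = \alpha\beta\, g_{\alpha+1,\beta}(y)$; the second-moment expression is derived here from that identity rather than quoted from the cited papers, and all function names are hypothetical.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import gamma


def csg_mean(alpha, beta, delta):
    """E[X] = alpha*beta*(1 - G_{alpha+1}(delta)) - delta*(1 - G_alpha(delta))."""
    s0 = gamma.sf(delta, a=alpha, scale=beta)        # 1 - G_{alpha,beta}(delta)
    s1 = gamma.sf(delta, a=alpha + 1.0, scale=beta)  # 1 - G_{alpha+1,beta}(delta)
    return alpha * beta * s1 - delta * s0


def csg_variance(alpha, beta, delta):
    """Var[X] from E[X^2], obtained by applying the same identity twice."""
    s0 = gamma.sf(delta, a=alpha, scale=beta)
    s1 = gamma.sf(delta, a=alpha + 1.0, scale=beta)
    s2 = gamma.sf(delta, a=alpha + 2.0, scale=beta)
    second = (alpha * (alpha + 1.0) * beta**2 * s2
              - 2.0 * delta * alpha * beta * s1
              + delta**2 * s0)
    return second - csg_mean(alpha, beta, delta) ** 2


def csg_moment_numeric(alpha, beta, delta, power):
    """Numerical E[X^power] = integral of (y - delta)^power * g(y) over y > delta."""
    integrand = lambda y: (y - delta) ** power * gamma.pdf(y, a=alpha, scale=beta)
    value, _ = quad(integrand, delta, np.inf)
    return value


alpha, beta, delta = 1.5, 4.0, 2.0  # illustrative values only
print("mean, closed form vs quadrature:",
      csg_mean(alpha, beta, delta), csg_moment_numeric(alpha, beta, delta, 1))
```

Setting `delta` to zero recovers the plain gamma mean `alpha * beta`, which is a quick sanity check on the closed form.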
2. Linking CSG Parameters to Ensemble Forecasts
Within the EMOS framework, the CSG's underlying gamma mean ($\mu$) and variance ($\sigma^2$) are modeled as functions of the ensemble members. The canonical regression links are: $\mu = a_0 + a_1 f_1 + \cdots + a_m f_m, \qquad \sigma^2 = b_0 + b_1 \bar{f}$, where $f_1, \ldots, f_m$ are ensemble member forecasts, $\bar{f}$ is the ensemble mean, and all coefficients are non-negative. The CSG parameters are recovered as: $\alpha = \mu^2 / \sigma^2, \qquad \beta = \sigma^2 / \mu$ (shape and scale, respectively). The shift $\delta$ is a further non-negative parameter. In the presence of exchangeable groups (e.g., high- and low-resolution ensemble subsets), group means replace individual member forecasts in the regression linkage.
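In dual-resolution settings, the mean link becomes: $\mu = a_0 + a_H \bar{f}_H + a_L \bar{f}_L$, where $\bar{f}_H$ and $\bar{f}_L$ are the means of the high- and low-resolution groups, indexed by “H” and “L” (Szabó et al., 2022).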
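The link functions translate into a few lines of Python. This is a minimal sketch assuming the shape-scale convention ($\alpha = \mu^2/\sigma^2$, $\beta = \sigma^2/\mu$); the function names, coefficient values, and forecasts are hypothetical.

```python
import numpy as np


def csg_params_from_ensemble(members, a0, a, b0, b1):
    """Map ensemble forecasts to gamma shape and scale via the EMOS links.

    members: array (..., m) of member forecasts; a: array (m,) of mean-link weights.
    """
    members = np.asarray(members, dtype=float)
    mu = a0 + members @ np.asarray(a, dtype=float)  # mean link on the members
    sigma2 = b0 + b1 * members.mean(axis=-1)        # variance link on the ensemble mean
    return mu**2 / sigma2, sigma2 / mu              # (alpha, beta)


def dual_resolution_mean(mean_high, mean_low, a0, a_high, a_low):
    """Mean link when exchangeable groups are represented by their group means."""
    return a0 + a_high * mean_high + a_low * mean_low


# Hypothetical 4-member forecast (mm) and non-negative coefficients
members = np.array([0.0, 1.2, 3.4, 0.8])
alpha, beta = csg_params_from_ensemble(
    members, a0=0.1, a=[0.3, 0.3, 0.2, 0.2], b0=0.5, b1=1.0
)
print("shape:", alpha, "scale:", beta)
print("dual-resolution mean:", dual_resolution_mean(2.0, 1.0, 0.1, 0.6, 0.3))
```

By construction `alpha * beta` returns the modeled mean and `alpha * beta**2` the modeled variance, so the links control the first two moments of the uncensored gamma directly.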
3. Parameter Estimation via Proper Scoring Rules
CSG EMOS parameters $\theta = (a_0, a_1, \ldots, a_m, b_0, b_1, \delta)$ are estimated by minimizing the mean Continuous Ranked Probability Score (CRPS) over a rolling training set: $\hat{\theta} = \arg\min_{\theta} \frac{1}{n} \sum_{i=1}^{n} \mathrm{CRPS}(F_i, x_i)$, where $F_i$ is the CSG predictive CDF and $x_i$ the verifying observation of the $i$-th training case. The CRPS of the CSG distribution has a closed-form expression in terms of incomplete-gamma functions, which makes direct numerical optimization (e.g., L-BFGS-B) efficient. CRPS minimization is preferred to maximum likelihood estimation in typical operational settings because it yields better probabilistic performance. All parameters are box-constrained to maintain physical admissibility (non-negativity) (Baran et al., 2015, Szabó et al., 2022).
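A minimal sketch of the estimation step is given below. To keep the example verifiable, it does not use the closed-form CRPS: instead it approximates $\mathrm{CRPS}(F, x) = \int_0^\infty \bigl(F(t) - \mathbb{1}\{t \geq x\}\bigr)^2 dt$ with the trapezoid rule on a fixed grid. The single-group links (ensemble mean only), the grid range, the starting values, and the synthetic training data are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammainc  # regularized lower incomplete gamma function

GRID = np.linspace(0.0, 80.0, 401)  # integration grid in mm (assumed range)


def mean_crps(theta, fbar, obs):
    """Mean CRPS of CSG forecasts, integrated numerically on GRID."""
    a0, a1, b0, b1, delta = theta
    mu = a0 + a1 * fbar       # mean link (exchangeable members, ensemble mean only)
    sigma2 = b0 + b1 * fbar   # variance link
    alpha, beta = mu**2 / sigma2, sigma2 / mu
    # CSG CDF on the grid: G_{alpha,beta}(t + delta) for t >= 0
    cdf = gammainc(alpha[:, None], (GRID[None, :] + delta) / beta[:, None])
    step = (GRID[None, :] >= obs[:, None]).astype(float)
    sq = (cdf - step) ** 2
    h = GRID[1] - GRID[0]
    crps = h * (0.5 * sq[:, 0] + sq[:, 1:-1].sum(axis=1) + 0.5 * sq[:, -1])
    return crps.mean()


# Synthetic training set (illustrative only, not real forecast data)
rng = np.random.default_rng(1)
fbar = rng.gamma(shape=2.0, scale=1.5, size=200)
true = np.array([0.5, 0.9, 1.0, 1.5, 1.0])
mu_t, s2_t = true[0] + true[1] * fbar, true[2] + true[3] * fbar
obs = np.maximum(0.0, rng.gamma(shape=mu_t**2 / s2_t, scale=s2_t / mu_t) - true[4])

theta0 = np.array([1.0, 1.0, 1.0, 1.0, 0.5])
bounds = [(1e-3, None)] * 5  # box constraints keep every parameter positive
fit = minimize(mean_crps, theta0, args=(fbar, obs), method="L-BFGS-B", bounds=bounds)
print("estimated (a0, a1, b0, b1, delta):", np.round(fit.x, 3))
print("mean CRPS at start / optimum:", mean_crps(theta0, fbar, obs), fit.fun)
```

In practice the closed-form CRPS would replace the grid integration, which removes the discretization error and makes each objective evaluation much cheaper; the optimizer call and the box constraints stay the same.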
4. Semi-local Training and Clustering
To achieve a balance between spatial localization and statistical robustness, CSG EMOS commonly adopts a semi-local training strategy:
- For each forecast initialization, a rolling 30-day training window is used.
- Each land grid point is characterized by a 24-dimensional feature vector combining quantiles of both the climatological precipitation CDF and the recent raw-ensemble-mean error distribution.
- K-means clustering (with a pre-specified number of clusters) is applied to group grid points with similar climatology and error characteristics.
- Parameter estimation for each cluster aggregates data from all its points (typically 1000–1500 cases), enabling more stable and regionally adaptive calibration.
This approach maintains parameter locality while pooling data to mitigate the sample size limitations inherent to short rolling training windows (Szabó et al., 2022).
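The clustering step can be sketched as follows. The source fixes only the total feature dimension (24), so the split into 12 climatological quantiles and 12 forecast-error quantiles, the quantile levels, the number of clusters, and the synthetic data are assumptions. For brevity the climatological quantiles are computed from the training window itself rather than from a long climatological record.

```python
import numpy as np
from scipy.cluster.vq import kmeans2


def feature_vectors(obs, ens_mean, n_quantiles=12):
    """Return an (n_points, 2 * n_quantiles) feature matrix.

    obs, ens_mean: arrays of shape (n_points, n_days).
    """
    levels = np.linspace(0.0, 1.0, n_quantiles + 2)[1:-1]  # interior equidistant levels
    clim_q = np.quantile(obs, levels, axis=1).T             # observation quantiles
    err_q = np.quantile(ens_mean - obs, levels, axis=1).T   # ensemble-mean error quantiles
    return np.hstack([clim_q, err_q])


# Synthetic grid points with differing wetness (illustrative only)
rng = np.random.default_rng(2)
n_points, n_days, n_clusters = 400, 30, 4  # n_clusters is a hypothetical choice
wetness = rng.uniform(1.0, 6.0, size=(n_points, 1))
obs = np.maximum(0.0, rng.gamma(shape=1.5, scale=wetness, size=(n_points, n_days)) - 2.0)
ens_mean = np.maximum(0.0, obs + rng.normal(0.0, 1.0, size=obs.shape))

features = feature_vectors(obs, ens_mean)
# Initialize centroids from randomly chosen grid points for reproducibility
init = features[rng.choice(n_points, size=n_clusters, replace=False)]
centroids, labels = kmeans2(features, init, minit="matrix")

for c in range(n_clusters):
    members = np.flatnonzero(labels == c)
    print(f"cluster {c}: {members.size} grid points, {members.size * n_days} training cases")
```

Each cluster then gets its own CSG EMOS fit on the pooled forecast-observation pairs of its member grid points, and every grid point inherits the parameters of its cluster.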
5. Verification and Empirical Findings
CSG EMOS has been rigorously evaluated in operational and research contexts, notably on European dual-resolution ECMWF ensembles and regional ensemble systems.
Verification Metrics
- Mean CRPS and CRPSS (skill score relative to raw ensemble)
- Brier scores (BS) and skill scores (BSS) for preset thresholds (e.g., 0.1, 5, 10 mm)
- Reliability diagrams
- Statistical significance assessed via block-bootstrap and Diebold–Mariano tests with false-discovery-rate correction
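The threshold-based metrics above can be illustrated with a short Python sketch. It assumes the shape-scale CSG parametrization, defines the event as the observation strictly exceeding the threshold (a convention chosen here), and uses made-up observations, ensemble members, and CSG parameters.

```python
import numpy as np
from scipy.stats import gamma


def csg_exceedance_prob(threshold, alpha, beta, delta):
    """P(X > threshold) = 1 - G_{alpha,beta}(threshold + delta), threshold >= 0."""
    return gamma.sf(threshold + delta, a=alpha, scale=beta)


def ensemble_exceedance_prob(members, threshold):
    """Raw-ensemble probability: share of members above the threshold."""
    return np.mean(np.asarray(members) > threshold, axis=-1)


def brier_score(prob, obs, threshold):
    """Mean squared difference between forecast probability and event indicator."""
    event = (np.asarray(obs) > threshold).astype(float)
    return float(np.mean((np.asarray(prob) - event) ** 2))


def skill_score(score, score_ref):
    """Generic skill score 1 - S / S_ref, used for both CRPSS and BSS."""
    return 1.0 - score / score_ref


# Toy verification set (mm); all numbers are invented for the example
obs = np.array([0.0, 0.0, 2.3, 7.1, 0.4, 12.0])
members = np.array([
    [0.0, 0.0, 0.2, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.0, 1.1],
    [1.0, 3.2, 2.8, 0.0, 4.5],
    [5.5, 9.0, 3.1, 6.2, 4.0],
    [0.0, 0.0, 0.3, 1.2, 0.0],
    [8.0, 15.0, 6.5, 11.2, 9.9],
])
fbar = members.mean(axis=1)
mu, sigma2, delta = 0.3 + 1.0 * fbar, 0.5 + 1.5 * fbar, 0.5  # hypothetical fitted links
alpha, beta = mu**2 / sigma2, sigma2 / mu

for thr in (0.1, 5.0, 10.0):
    bs_raw = brier_score(ensemble_exceedance_prob(members, thr), obs, thr)
    bs_csg = brier_score(csg_exceedance_prob(thr, alpha, beta, delta), obs, thr)
    print(f"threshold {thr} mm: BS raw={bs_raw:.3f}, BS CSG={bs_csg:.3f}, "
          f"BSS={skill_score(bs_csg, bs_raw):.3f}")
```

A positive skill score means the post-processed forecast beats the reference; with a sample this small the numbers carry no statistical meaning and serve only to show the computation.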
Main Results
- Raw dual-resolution ensembles exhibit under-dispersion and bias, with skill disparities apparent up to day 5.
- CSG EMOS post-processing yields statistically significant CRPS reduction across all lead times (e.g., CRPSS ≈ 0.15 at day 1, ≈ 0.05 at day 5).
- After CSG EMOS calibration, inter-configuration skill differences between dual-resolution mixtures are statistically insignificant.
- Compared with quantile mapping (QM) and weighted QM—both requiring extensive historical reforecast data—CSG EMOS, trained on only 30 days of data, matches or slightly outperforms these alternatives in mean CRPS and Brier score at all time horizons.
- Reliability is restored: at the 0.1 mm threshold, calibration is near-perfect even at a lead time of 10 days; improvement is also observed at heavier rainfall thresholds despite data sparsity.
6. Distinctive Attributes and Operational Impact
Key advantages of the CSG EMOS framework in operational post-processing include:
- Explicit probability mass at zero precipitation without recourse to mixture models.
- Calibration and adjustment of ensemble forecast bias and dispersion through linear regression on ensemble statistics, not relying on discrete–continuous mixtures or additional covariates.
- Unified parametric structure, reducing implementation complexity and fit instability.
- Closed-form CRPS, which makes parameter estimation fast and numerically robust and eases adoption by practitioners.
- In comparative field tests, CSG EMOS produces sharper, better-calibrated forecasts and more accurate point predictions than both censored GEV EMOS and gamma BMA, particularly when forecast reliability and sharpness are considered jointly.
These attributes render CSG EMOS a practical, computationally efficient, and statistically rigorous post-processing solution in contemporary ensemble forecast calibration, particularly advantageous in environments with limited historical reforecast archives or under dual-resolution computational constraints (Baran et al., 2015, Szabó et al., 2022).