Sequential Predictive Scoring Rules

Updated 23 October 2025

The Predictive-Sequential Scoring Rule is a framework for evaluating sequential probabilistic forecasts with proper scoring rules that depend only on the predictive density and its derivatives at the observed outcome.
It uses a tangent construction, combined with variational calculus, to characterize the local proper scoring rules of order 2.
An application to weather forecast verification illustrates the added diagnostic value of local scores and the outlier resistance of robust variants.

A predictive-sequential scoring rule is a functional framework designed to evaluate, incentivize, and sometimes train probabilistic forecasts when predictions are made sequentially, particularly in settings where model assessments must be updated efficiently as new observations arrive. The distinctive aspect is that each sequential forecast is scored individually, typically using a proper scoring rule, and then the aggregate over time forms the evaluation criterion. Local proper scoring rules—specifically those of order two—offer advantageous features for these applications, especially when the forecast distribution is known only through its density and derivatives in the neighborhood of the observed outcome (Ehm et al., 2011).

1. Formal Definition and Characterization

A scoring rule $S$ assigns a loss $S(x, q)$ to the forecaster, depending on the announced predictive density $q$ and the realized outcome $x$. $S$ is proper if its expected value is minimized when the announced predictive density coincides with the true density $p$, that is,

$\mathbb{E}_{X \sim p}[ S(X, q) ] \geq \mathbb{E}_{X \sim p}[ S(X, p) ]$

for all predictive densities $q$ in the relevant function class. It is strictly proper if equality holds only when $q = p$.

A scoring rule is local of order $k$ if $S(x, q)$ depends on $q$ only through its value and its derivatives up to order $k$ at $x$:

$S(x, q) = s \left(x, \ln q(x), (\ln q(x))', \ldots, (\ln q(x))^{(k)} \right).$

For order $k=2$, this includes information on local curvature, supplying sensitivity to features such as peakedness that order-zero scores (e.g., the logarithmic score) cannot see. The Hyvärinen score is the canonical example:

$S(x, q) = ((\ln q(x))')^2 + 2 (\ln q(x))''.$

These local rules contrast with nonlocal rules, which depend on integrals or the full global structure of $q$.
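To make the definitions concrete, the following minimal sketch evaluates the Hyvärinen score for a normal predictive density and checks properness on a small grid of candidate forecasts. The function names, the choice of a normal family, and the candidate values are illustrative assumptions, not taken from the source.

```python
import math

def hyvarinen_score_normal(x: float, mu: float, sigma: float) -> float:
    """Hyvärinen score ((ln q)')^2 + 2 (ln q)'' for q = N(mu, sigma^2)."""
    dlog = -(x - mu) / sigma**2   # (ln q)'(x)
    d2log = -1.0 / sigma**2       # (ln q)''(x), constant for a normal density
    return dlog**2 + 2.0 * d2log

def expected_hyvarinen_normal(mu0: float, sigma0: float,
                              mu: float, sigma: float) -> float:
    """Closed-form E_p[S(X, q)] for p = N(mu0, sigma0^2), q = N(mu, sigma^2).

    Uses E_p[(X - mu)^2] = sigma0^2 + (mu0 - mu)^2.
    """
    return (sigma0**2 + (mu0 - mu)**2) / sigma**4 - 2.0 / sigma**2

# Hypothetical truth and candidate forecasts (mu, sigma).
truth = (0.0, 1.0)
candidates = [(0.0, 1.0), (0.5, 1.0), (0.0, 0.7), (0.0, 1.5)]

expected = {c: expected_hyvarinen_normal(*truth, *c) for c in candidates}
best = min(expected, key=expected.get)  # lower expected score is better

for c, value in expected.items():
    print(c, round(value, 4))
print("best candidate:", best)
```

Among these candidates, the expected score is smallest for the forecast that matches the truth, as properness requires.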

2. Methodological Foundations: Tangent Construction and Variational Calculus

Ehm et al. (2011) establish that all local proper scoring rules of order 2 can be constructed via tangent functionals of concave functionals defined on the space of densities:

$S(x, q) = G(x, q) - \int G(\cdot, q) q + \Phi(q),$

where $G(\cdot, q)$ is a (super)gradient of the concave functional $\Phi$ at $q$. Under the order-2 locality constraint, the construction can always be expressed through a kernel of the form

$K(x, z_0, z_1) = c z_0 + K_0(x, z_1),$

with $z_0 = \ln q(x)$ and $z_1 = (\ln q(x))'$. The necessary and sufficient structure is derived via an Euler–Lagrange equation:

$S(x, q) = c \ln q(x) + \big[1 - z_1 \partial_{z_1} - \partial_x \partial_{z_1}\big] K_0(x, z_1)$

for a suitable constant $c$ and function $K_0$, where $\partial_x$ acts as the total derivative along $x \mapsto (x, z_1(x))$, so that the last term also brings in $(\ln q(x))''$. This “tangent construction” is both necessary and sufficient for order-2 locality and properness. Such rules are fixed points of the construction, in the sense that applying it to the expected-score functional of a rule in this class returns the same rule.
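As a quick consistency check on the formula above (a worked example added here, not taken from the source), take $c = 0$ and $K_0(x, z_1) = -z_1^2$, and read $\partial_x \partial_{z_1} K_0$ as the derivative in $x$ of $x \mapsto \partial_{z_1} K_0(x, z_1(x))$. Then

$\partial_{z_1} K_0 = -2 z_1, \qquad \big[1 - z_1 \partial_{z_1}\big] K_0 = -z_1^2 + 2 z_1^2 = z_1^2, \qquad \frac{d}{dx}\big(-2 z_1(x)\big) = -2 (\ln q(x))'',$

so that

$S(x, q) = z_1^2 + 2 (\ln q(x))'' = ((\ln q(x))')^2 + 2 (\ln q(x))'',$

which is the Hyvärinen score of Section 1.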

3. Sequential Evaluation and Aggregation

In the predictive-sequential setting, a new forecast $q_t$ is issued at each time $t$, and the realized outcome $x_t$ is then observed. The sequential score is computed as the sum or average over $T$ observations; in averaged form,

$S_\text{seq} = \frac{1}{T} \sum_{t=1}^T S(x_t, q_t).$

Local proper scoring rules are especially suited to this setup, because evaluation at each time point requires only $q_t(x_t)$ and the first two derivatives of $q_t$ at $x_t$. This keeps the per-step cost low and supports rapid recalibration in online or real-time applications.

Moreover, locality can confer robustness: by selecting kernels $K_0$ that “ignore” the tails or far-from-realization behavior of $q_t$, the score can be made less sensitive to model misspecification outside the region of interest. Whatever functional form is chosen within the allowable class, the tangent construction preserves properness, so truthful reporting remains optimal at every sequential step.
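The sequential average can be maintained online, one observation at a time. The sketch below is illustrative only: it assumes each forecast is supplied as a callable log-density, approximates the derivatives of $\ln q_t$ by central finite differences (a stand-in for symbolic or automatic differentiation), and uses the Hyvärinen score as the local rule. The names `hyvarinen_score_fd`, `normal_log_density`, and `sequential_mean_score`, and the three-step stream, are hypothetical.

```python
import math
from typing import Callable, Iterable, Iterator, Tuple

LogDensity = Callable[[float], float]

def hyvarinen_score_fd(log_q: LogDensity, x: float, h: float = 1e-3) -> float:
    """Hyvärinen score at x, using only values of ln q at x - h, x, x + h."""
    f_minus, f_0, f_plus = log_q(x - h), log_q(x), log_q(x + h)
    d1 = (f_plus - f_minus) / (2.0 * h)            # approximates (ln q)'(x)
    d2 = (f_plus - 2.0 * f_0 + f_minus) / (h * h)  # approximates (ln q)''(x)
    return d1 * d1 + 2.0 * d2

def normal_log_density(mu: float, sigma: float) -> LogDensity:
    """Log-density of N(mu, sigma^2), used here as an example forecast."""
    def log_q(x: float) -> float:
        return (-0.5 * math.log(2.0 * math.pi * sigma**2)
                - (x - mu)**2 / (2.0 * sigma**2))
    return log_q

def sequential_mean_score(
    stream: Iterable[Tuple[LogDensity, float]]
) -> Iterator[float]:
    """Yield the running mean of S(x_t, q_t) after each (q_t, x_t) pair."""
    mean, t = 0.0, 0
    for log_q_t, x_t in stream:
        t += 1
        s_t = hyvarinen_score_fd(log_q_t, x_t)
        mean += (s_t - mean) / t  # incremental update of the average
        yield mean

# Made-up forecast/observation pairs for illustration.
stream = [
    (normal_log_density(0.0, 1.0), 0.3),
    (normal_log_density(0.2, 1.5), -0.4),
    (normal_log_density(-0.1, 0.8), 1.1),
]
running = list(sequential_mean_score(stream))
print(running)
```

Each step touches the forecast only near $x_t$, so the update cost does not depend on how $q_t$ behaves elsewhere.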

4. Practical Applications: Weather Forecast Verification

The data example in Ehm et al. (2011) uses both local and nonlocal scoring rules to assess postprocessed ensemble weather forecasts, specifically those calibrated using Bayesian model averaging (BMA) and ensemble model output statistics (EMOS). The scores evaluated include the logarithmic (LS), Hyvärinen (HS), and log cosh (LCS) scores, in addition to the traditional quadratic and spherical scores.

Findings are as follows.

Mean scores computed over large test periods consistently show that postprocessed forecasts (BMA/EMOS) outperform the raw ensemble.
Local rules (HS, LCS) reach qualitatively similar conclusions to nonlocal rules, but provide additional diagnostic value, e.g., by differentiating forecast distributions according to their local curvature.
Robust local scores such as LCS can add protection against outliers in the observations.

This demonstrates that local proper scoring rules—especially those of order 2—are effective for operational, sequential forecast verification and model comparison in meteorological settings.

5. Comparative Properties and Implementation Considerations

Order-2 local proper scoring rules, notably the Hyvärinen score, are particularly advantageous when the predictive density is known only up to a normalizing constant: the Hyvärinen score depends on $q$ only through derivatives of $\ln q$, so any constant factor in $q$ cancels. This property is especially useful in probabilistic models with computationally intractable normalizations, since forecast quality can be evaluated without computing the normalizing constant.
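A small numerical illustration of this invariance, under stated assumptions: the unnormalized log-density $-x^4/4$ and the additive constant $-7.0$ (standing in for an unknown $-\ln Z$) are hypothetical, and derivatives are approximated by central finite differences rather than computed exactly.

```python
def hyvarinen_from_log_density(log_q, x: float, h: float = 1e-3) -> float:
    """Hyvärinen score at x via central differences of ln q."""
    f_minus, f_0, f_plus = log_q(x - h), log_q(x), log_q(x + h)
    d1 = (f_plus - f_minus) / (2.0 * h)
    d2 = (f_plus - 2.0 * f_0 + f_minus) / (h * h)
    return d1 * d1 + 2.0 * d2

def neg_log_score(log_q, x: float) -> float:
    """Logarithmic score as a loss: -ln q(x)."""
    return -log_q(x)

def unnormalized_log_q(x: float) -> float:
    """ln q up to an additive constant (illustrative quartic potential)."""
    return -0.25 * x**4

def shifted_log_q(x: float) -> float:
    """Same density with a hypothetical log-normalizer of -7.0 included."""
    return unnormalized_log_q(x) - 7.0

x_obs = 0.8
hs_unnormalized = hyvarinen_from_log_density(unnormalized_log_q, x_obs)
hs_shifted = hyvarinen_from_log_density(shifted_log_q, x_obs)
ls_gap = neg_log_score(shifted_log_q, x_obs) - neg_log_score(unnormalized_log_q, x_obs)

print("Hyvärinen, unnormalized:", hs_unnormalized)
print("Hyvärinen, shifted:     ", hs_shifted)
print("log-score difference:   ", ls_gap)
```

The two Hyvärinen values agree up to floating-point error, while the logarithmic score shifts by exactly the constant, which is why the latter cannot be used without the normalizer.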

The implementation considerations are as follows.

Only positivity of $q_t$ and differentiability up to the required order are needed, and only in a neighborhood of the observed $x_t$. For many forecast systems (e.g., normal or exponential families in BMA/EMOS), these conditions hold globally.
The computational cost is modest: given a closed-form $q_t(x)$, both the first and second derivatives can be evaluated symbolically or via automatic differentiation at each $x_t$.
For robust sequential operation, the kernel $K_0$ can be tailored to make the score insensitive to, for instance, outlying observations.
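As an illustration of the closed-form route, the sketch below evaluates the Hyvärinen score for a finite mixture of normal densities, using $S = 2 q''/q - (q'/q)^2$, which is the same score rewritten in terms of $q$ and its derivatives. Treating the predictive density as a normal mixture is an assumption made for this example; the function names and the weights, means, and spreads are hypothetical.

```python
import math
from typing import Sequence

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_hyvarinen(x: float,
                      weights: Sequence[float],
                      means: Sequence[float],
                      sigmas: Sequence[float]) -> float:
    """Hyvärinen score of a normal mixture, from q, q', q'' in closed form."""
    q = dq = d2q = 0.0
    for w, mu, s in zip(weights, means, sigmas):
        phi = w * normal_pdf(x, mu, s)
        u = (x - mu) / s**2
        q += phi
        dq += -u * phi                      # first derivative of the component
        d2q += (u * u - 1.0 / s**2) * phi   # second derivative of the component
    if q <= 0.0:
        # Positivity at x is required; underflow far in the tails breaks it.
        raise ValueError("predictive density is not positive at x")
    dlog = dq / q                  # (ln q)'
    d2log = d2q / q - dlog**2      # (ln q)''
    return dlog**2 + 2.0 * d2log

# Hypothetical three-member mixture forecast and observation.
score = mixture_hyvarinen(1.2,
                          weights=[0.5, 0.3, 0.2],
                          means=[0.8, 1.5, 2.1],
                          sigmas=[0.6, 0.6, 0.9])
print(score)
```

The explicit positivity check reflects the condition stated above: the score is undefined where the density vanishes numerically.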

A potential limitation is that the theoretical guarantees rely on the differentiability and positivity of $q(x)$ in the neighborhood of each new realized $x$; for degenerate, discrete, or heavily censored predictive distributions, the methodology may require adaptation.

6. Extensions and Implications for Predictive–Sequential Scoring

Although the technical contribution is the characterization for real-line densities, the methods and results extend naturally to the sequential scoring of probabilistic systems. The tangent construction, the focus on locality, and the aggregation of local scores form a rigorous basis for designing predictive-sequential scoring rules with inherent advantages:

Fast, online recalibration capabilities, as each score is evaluated independently with only local information.
Flexibility to encode robustness properties via the kernel $K_0$ in the tangent construction.
The ability to aggregate scores over time windows, yielding cumulative evaluations necessary for sequential model assessment, change-point detection, or adaptive forecast improvement in time series.
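Aggregation over time windows can be sketched with a fixed-length buffer of per-step scores. The class name `WindowedMeanScore`, the window length, and the score values below are made up for illustration; the source does not prescribe a particular windowing scheme.

```python
from collections import deque

class WindowedMeanScore:
    """Mean of the most recent `window` per-step scores."""

    def __init__(self, window: int) -> None:
        if window <= 0:
            raise ValueError("window must be positive")
        self._scores = deque(maxlen=window)
        self._total = 0.0

    def update(self, score: float) -> float:
        # When the buffer is full, the oldest score is about to be evicted.
        if len(self._scores) == self._scores.maxlen:
            self._total -= self._scores[0]
        self._scores.append(score)
        self._total += score
        return self._total / len(self._scores)

# Made-up per-step scores: the forecast degrades after the third step.
scores = [-1.0, -0.9, -1.1, 0.8, 1.2, 0.9]
tracker = WindowedMeanScore(window=3)
window_means = [tracker.update(s) for s in scores]
print(window_means)
```

A sustained rise in the windowed mean, as in the second half of this toy sequence, is the kind of signal that could feed change-point detection or trigger recalibration. For very long streams, the running total may accumulate rounding error and could be recomputed periodically.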

Because the tangent construction is both necessary and sufficient for order-2 local proper scores, practitioners can design new proper scoring rules for specific sequential applications while retaining essential properties such as properness and computational efficiency.

7. Summary Table: Properties of Local and Nonlocal Proper Scoring Rules

Score Type	Locality Order	Information Used	Usable Without Normalizing Constant	Diagnostic Utility
Logarithmic	0	$q(x)$	No	Penalizes misestimation only at $x$
Hyvärinen	2	$q(x), q'(x), q''(x)$	Yes	Captures curvature, peakedness
Quadratic/Spherical	Nonlocal	$q(x)$ and $\int q(y)^2 \, dy$	No	Sensitive to global shape

Local proper scoring rules of order two provide a unifying and tractable approach for predictive-sequential scoring, with broad utility for both forecast verification and model calibration in sequential predictive settings (Ehm et al., 2011).

References

Ehm et al. (2011). Local proper scoring rules of order two.
