
Sequential Predictive Scoring Rules

Updated 23 October 2025
  • The Predictive-Sequential Scoring Rule is a framework that evaluates sequential probabilistic forecasts using proper scoring rules based on density values and derivatives.
  • It employs a tangent construction with variational calculus to derive order-2 local scoring rules, ensuring precise and robust model comparison.
  • Applications in weather forecasting demonstrate enhanced diagnostic ability and online recalibration, offering resilience to outlier effects.

A predictive-sequential scoring rule is a functional framework designed to evaluate, incentivize, and sometimes train probabilistic forecasts when predictions are made sequentially, particularly in settings where model assessments must be updated efficiently as new observations arrive. The distinctive aspect is that each sequential forecast is scored individually, typically using a proper scoring rule, and then the aggregate over time forms the evaluation criterion. Local proper scoring rules—specifically those of order two—offer advantageous features for these applications, especially when the forecast distribution is known only through its density and derivatives in the neighborhood of the observed outcome (Ehm et al., 2011).

1. Formal Definition and Characterization

A scoring rule $S$ assigns a loss to the forecaster depending on the announced predictive density $q$ and the realized outcome $x$. $S$ is proper if its expected value is minimized when the announced predictive density coincides with the true density $p$, that is,

$$\mathbb{E}_p[ S(\cdot, q) ] \geq \mathbb{E}_p[ S(\cdot, p) ]$$

for all predictive densities $q$ in the relevant function class.
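As a quick numerical illustration of the inequality, the sketch below uses the logarithmic score $S(x, q) = -\ln q(x)$ with a standard normal truth $p$ and Gaussian forecasts $q = N(\mu, \sigma^2)$. The choice of score, the Gaussian family, and the function name `expected_log_score` are assumptions made for this example only; the expected loss is available in closed form for this pair.

```python
import math

def expected_log_score(mu, sigma):
    """E_p[-ln q(X)] for p = N(0, 1) and q = N(mu, sigma^2), in closed form."""
    return 0.5 * math.log(2 * math.pi * sigma**2) + (1 + mu**2) / (2 * sigma**2)

# Expected loss when the forecaster announces the truth, q = p.
truth = expected_log_score(0.0, 1.0)

# A few misspecified forecasts (illustrative values only).
candidates = [(0.5, 1.0), (0.0, 0.5), (0.0, 2.0), (-1.0, 1.5)]
gaps = [expected_log_score(mu, sigma) - truth for mu, sigma in candidates]

for (mu, sigma), gap in zip(candidates, gaps):
    print(f"q = N({mu}, {sigma}^2): excess expected loss = {gap:.4f}")
```

Every excess is positive, which is the properness inequality in this special case; the excess equals the Kullback-Leibler divergence from $p$ to $q$.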

A scoring rule is local of order $k$ if $S(x, q)$ depends on $q$ only through its value and its derivatives up to order $k$ at $x$:

$$S(x, q) = s \left(x, \ln q(x), (\ln q(x))', \ldots, (\ln q(x))^{(k)} \right).$$

For order $k=2$, this includes information on local curvature, supplying sensitivity to features such as peakedness, which is not accessed by order-zero (e.g., logarithmic) scores. The Hyvärinen score is the canonical example:

$$S(x, q) = ((\ln q(x))')^2 + 2 (\ln q(x))''.$$
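For a Gaussian forecast density the two log-derivatives are available in closed form, so the Hyvärinen score can be written down directly. The following is a minimal sketch under that Gaussian assumption; the function name `hyvarinen_score_gaussian` is hypothetical.

```python
def hyvarinen_score_gaussian(x, mu, sigma):
    """Hyvarinen score ((ln q)')^2 + 2 (ln q)'' for q = N(mu, sigma^2) at x."""
    d1 = -(x - mu) / sigma**2   # first derivative of ln q at x
    d2 = -1.0 / sigma**2        # second derivative of ln q (constant in x)
    return d1**2 + 2.0 * d2

# Outcome at the forecast mean versus one standard deviation away.
print(hyvarinen_score_gaussian(0.0, 0.0, 1.0))  # -2.0
print(hyvarinen_score_gaussian(1.0, 0.0, 1.0))  # -1.0
```

Lower values are better under the loss orientation used here: the outcome at the mode of the forecast receives the smaller score.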

These local rules are contrasted with nonlocal rules, which depend on integrals or the full global structure of $q$.

2. Methodological Foundations: Tangent Construction and Variational Calculus

The paper establishes that all local proper scoring rules of order 2 can be constructed via tangent functionals of concave functionals defined on the space of densities:

$$S(x, q) = G(x, q) - \int G(y, q)\, q(y)\, dy + \Phi(q),$$

where $G$ is a (super)gradient of the concave functional $\Phi$. By fixing the order-2 locality constraint, the scoring rule can always be written in terms of a kernel representation:

$$K(x, z_0, z_1) = c z_0 + K_0(x, z_1),$$

with $z_0 = \ln q(x)$ and $z_1 = (\ln q(x))'$. The necessary and sufficient structure is derived via an Euler–Lagrange equation:

$$S(x, q) = c \ln q(x) + \Big[1 - z_1 \partial_{z_1} - \frac{d}{dx} \partial_{z_1}\Big] K_0(x, z_1)$$

for a suitable constant $c$ and function $K_0$, where $\frac{d}{dx}$ is the total derivative in $x$: it also differentiates $z_1 = (\ln q(x))'$ and is what brings the second derivative $(\ln q(x))''$ into the score. This “tangent construction” is both necessary and sufficient for order-2 locality and properness, and rules of this form are fixed points of the construction.
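As a consistency check, the Hyvärinen score can be recovered from the representation above. The choice $c = 0$, $K_0(x, z_1) = -z_1^2$ is an assumption made here for illustration, and the $x$-derivative in the operator is read as a total derivative, so that it acts on $z_1 = (\ln q(x))'$ as well:

```latex
\begin{aligned}
K_0(x, z_1) &= -z_1^2, \qquad c = 0, \\
\bigl[1 - z_1 \partial_{z_1}\bigr] K_0 &= -z_1^2 + 2 z_1^2 = z_1^2, \\
-\frac{d}{dx}\, \partial_{z_1} K_0 &= -\frac{d}{dx}\bigl(-2 z_1\bigr) = 2 z_1' = 2 (\ln q(x))'', \\
S(x, q) &= ((\ln q(x))')^2 + 2 (\ln q(x))''.
\end{aligned}
```

The last line agrees with the Hyvärinen score given in Section 1.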

3. Sequential Evaluation and Aggregation

In the predictive-sequential setting, a new forecast $q_t$ is issued at each time $t$, and the realized $x_t$ is observed. The sequential score is computed as the sum or average over $T$ observations:

$$S_\text{seq} = \frac{1}{T} \sum_{t=1}^T S(x_t, q_t).$$

Local proper scoring rules are especially suited for this setup, as the evaluation at each time point requires only the knowledge of $q_t(x_t)$ and its first two derivatives at $x_t$. This supports rapid recalibration and computational efficiency for online or real-time applications.
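The averaging step can be maintained online with constant memory. The sketch below assumes Gaussian forecasts and the Hyvärinen score; the class `RunningMeanScore`, the helper names, and the forecast and observation values are all hypothetical and serve only to show the update.

```python
def hyvarinen_from_derivatives(dlog, d2log):
    """Order-2 local score built from the first two derivatives of ln q at x."""
    return dlog**2 + 2.0 * d2log

def gaussian_log_derivatives(x, mu, sigma):
    """First and second derivative of ln q at x for q = N(mu, sigma^2)."""
    return -(x - mu) / sigma**2, -1.0 / sigma**2

class RunningMeanScore:
    """Incrementally updated average of the per-step scores."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def update(self, score):
        self.n += 1
        self.mean += (score - self.mean) / self.n
        return self.mean

# Hypothetical forecasts (mu_t, sigma_t) and realized outcomes x_t.
forecasts = [(0.0, 1.0), (0.5, 1.0), (1.0, 2.0)]
observations = [0.0, 1.5, 1.0]

tracker = RunningMeanScore()
history = []
for (mu, sigma), x in zip(forecasts, observations):
    score = hyvarinen_from_derivatives(*gaussian_log_derivatives(x, mu, sigma))
    history.append(tracker.update(score))

print(history)
```

Each update touches only the current forecast and outcome, so the running value of the sequential score is available after every observation without storing past forecasts.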

Moreover, such locality confers robustness: by selecting kernels $K_0$ that “ignore” the tails or far-from-realization behavior of $q_t$, the score can exhibit reduced sensitivity to model misspecification outside the region of interest. The tangent construction ensures that regardless of the functional form chosen (within the allowable class), properness is maintained, ensuring truthful reporting at every sequential step.

4. Practical Applications: Weather Forecast Verification

Applied experiments in the paper test both local and nonlocal scoring rules to assess postprocessed ensemble weather forecasts, specifically those calibrated using Bayesian model averaging (BMA) and ensemble model output statistics (EMOS). Evaluated metrics include the logarithmic (LS), Hyvärinen (HS), and log cosh (LCS) scores, in addition to traditional quadratic and spherical scores.

Findings are as follows.

  • Mean scores computed over large test periods consistently show that postprocessed forecasts (BMA/EMOS) outperform the raw ensemble.
  • Local rules (HS, LCS) reach similar qualitative conclusions as nonlocal rules, but provide additional diagnostic value, e.g., differentiating forecast distributions by their local curvature features.
  • Robust local scores such as LCS can add protection against outliers in the observations.

This demonstrates that local proper scoring rules—especially those of order 2—are effective for operational, sequential forecast verification and model comparison in meteorological settings.

5. Comparative Properties and Implementation Considerations

Order-2 local proper scoring rules that depend on $q$ only through derivatives of $\ln q$, notably the Hyvärinen score, are well suited to situations in which the predictive density is known only up to a normalizing constant: the constant shifts $\ln q$ by an additive term, which vanishes under differentiation. This property is especially useful for probabilistic models with computationally intractable normalizing constants, since forecast quality can be evaluated without computing them.
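The invariance to the normalizing constant can be checked numerically. The sketch below approximates the two log-derivatives with central finite differences (a simple stand-in for symbolic or automatic differentiation) and applies the Hyvärinen score to an unnormalized log-density. The density $q(x) \propto \exp(-x^4/4)$, the scaling factor 7, the step size, and all function names are assumptions chosen for illustration.

```python
import math

def hyvarinen_fd(log_q, x, h=1e-3):
    """Hyvarinen score from central finite differences of a log-density."""
    d1 = (log_q(x + h) - log_q(x - h)) / (2 * h)
    d2 = (log_q(x + h) - 2 * log_q(x) + log_q(x - h)) / h**2
    return d1**2 + 2 * d2

def log_q_unnormalized(x):
    """ln q up to an unknown additive constant, for q(x) proportional to exp(-x^4 / 4)."""
    return -x**4 / 4

def log_q_scaled(x):
    """Same density multiplied by an arbitrary constant (here 7)."""
    return log_q_unnormalized(x) + math.log(7.0)

x = 0.8
score_a = hyvarinen_fd(log_q_unnormalized, x)
score_b = hyvarinen_fd(log_q_scaled, x)

# Exact value for comparison: (ln q)' = -x^3 and (ln q)'' = -3 x^2.
exact = x**6 - 6 * x**2

print(score_a, score_b, exact)
```

The two finite-difference scores agree up to rounding error and both approximate the exact value, even though neither log-density is normalized.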

The implementation considerations are as follows.

  • Only smoothness and positivity of $q_t$ (and differentiability for derivatives up to the required order) are required at the observed $x_t$. For many forecast systems (e.g., normal or exponential families in BMA/EMOS), these conditions hold globally.
  • The computational cost is modest: given a closed-form $q_t(x)$, both the first and second derivatives can be evaluated symbolically or via automatic differentiation at each $x_t$.
  • For robust sequential operations, the kernel $K_0$ can be tailored to yield desired insensitivity to, for instance, outlier values, enabling robust scoring in the sequential context.

A potential limitation is that the theoretical guarantees rely on the differentiability and positivity of $q(x)$ in the neighborhood of each new realized $x$; for degenerate, discrete, or heavily censored predictive distributions, the methodology may require adaptation.

6. Extensions and Implications for Predictive–Sequential Scoring

Although the technical contribution is the characterization for real-line densities, the methods and results extend naturally to the sequential scoring of probabilistic systems. The tangent construction, the focus on locality, and the aggregation of local scores form a rigorous basis for designing predictive-sequential scoring rules with inherent advantages:

  • Fast, online recalibration capabilities, as each score is evaluated independently with only local information.
  • Flexibility to encode robustness properties via the kernel $K_0$ in the tangent construction.
  • The ability to aggregate scores over time windows, yielding cumulative evaluations necessary for sequential model assessment, change-point detection, or adaptive forecast improvement in time series.
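Windowed aggregation can be sketched as a rolling mean over the most recent per-step scores. In the example below, the function `rolling_mean`, the window length, and both score sequences are hypothetical; the rolling mean of the score difference between two forecasters is one simple way to see when their relative performance changes.

```python
from collections import deque

def rolling_mean(values, window):
    """Mean of the last `window` values after each new value arrives."""
    buf = deque(maxlen=window)
    total = 0.0
    out = []
    for v in values:
        if len(buf) == window:
            total -= buf[0]   # value about to be evicted from the window
        buf.append(v)
        total += v
        out.append(total / len(buf))
    return out

# Hypothetical per-step scores (losses) for two forecasters A and B.
scores_a = [-1.0, -1.2, -0.9, -0.2, -0.1, 0.0]
scores_b = [-0.8, -0.9, -0.8, -0.9, -1.0, -0.9]

# Negative difference: A is better over the window; positive: B is better.
diffs = [a - b for a, b in zip(scores_a, scores_b)]
rolled = rolling_mean(diffs, window=3)

print(rolled)
```

With these made-up numbers the rolling difference changes sign at the fourth step, indicating that forecaster A loses its advantage from that point on.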

The clarity and mathematical sufficiency of the tangent construction for all order-2 local proper scores mean that practitioners can design new proper scoring rules for specific sequential applications while retaining essential properties such as strict propriety and computational efficiency.

7. Summary Table: Properties of Local Proper Scoring Rules Compared

| Score Type | Locality Order | Information Used | Robustness to Normalization | Diagnostic Utility |
|---|---|---|---|---|
| Logarithmic | 0 | $q(x)$ | No | Penalizes misestimation only at $x$ |
| Hyvärinen | 2 | $q(x), q'(x), q''(x)$ | Yes | Captures curvature, peakedness |
| Quadratic/Spherical | Nonlocal | $q(x)$ and $\int q(y)^2\, dy$ | No | Sensitive to global shape |

Local proper scoring rules of order two provide a unifying and tractable approach for predictive-sequential scoring, with broad utility for both forecast verification and model calibration in sequential predictive settings (Ehm et al., 2011).
