
Balanced DRPS for Ordinal Regression

Updated 6 March 2026
  • Balanced DRPS is a performance metric for ordinal regression that incorporates class imbalance correction by weighting each sample inversely to its class frequency.
  • It computes a weighted square error between the cumulative predicted probabilities and the true cumulative distribution, ensuring fair evaluation across ordered classes.
  • This metric enhances model assessment in tasks like Question Difficulty Estimation by penalizing predictions based on their ordinal distance from the true class.

The balanced Discrete Ranked Probability Score (Balanced DRPS) is a performance metric for probabilistic prediction in discrete ordinal regression tasks, explicitly designed to address both the ordinal structure of the problem and class imbalance. It is formulated to provide fair and distance-sensitive evaluation across all classes, particularly in settings where the label distribution is skewed. Balanced DRPS is especially suited for tasks such as Question Difficulty Estimation (QDE), where difficulty levels are discrete, ordered, and often imbalanced in authentic datasets (Thuy et al., 1 Jul 2025).

1. Mathematical Formulation

Balanced DRPS is defined as a re-weighted variant of the discrete ranked probability score (DRPS), where the weight assigned to each sample is inversely proportional to the frequency of its true class label. Let $D = \{ (\mathbf{x}_i, y_i) \}_{i=1}^N$ be a dataset of $N$ observations with $K$ ordered classes $1 \prec 2 \prec \cdots \prec K$. For a probabilistic forecast, define

$F_k(\hat y_i) = \sum_{j=1}^k P(y_i = j \mid \mathbf{x}_i)$

for each $k = 1, \ldots, K-1$. The standard DRPS is

$\mathrm{DRPS}(F, y) = \frac{1}{N} \sum_{i=1}^N \sum_{k=1}^{K-1} \bigl(F_k(\hat y_i) - \mathds{1}\{k \geq y_i\}\bigr)^2$

where $\mathds{1}\{k \geq y_i\}$ is the step function indicating the true class position. To account for class imbalance, define the per-sample weight as

$w_i = \frac{1}{\sum_{j=1}^N \mathds{1}\{y_j = y_i\}}$

yielding the balanced DRPS:

$\mathrm{Balanced\;DRPS}(F, y) = \frac{1}{N} \sum_{i=1}^N \sum_{k=1}^{K-1} w_i \bigl(F_k(\hat y_i) - \mathds{1}\{k \geq y_i\}\bigr)^2$

When class frequencies are uniform, $w_i$ is constant and Balanced DRPS coincides with standard DRPS up to a constant factor (Thuy et al., 1 Jul 2025).
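The formula can be checked with a small worked example (the numbers here are illustrative, not taken from the source): a single sample with $K = 4$, predicted probabilities $(0.1, 0.6, 0.2, 0.1)$, and true class $2$, so the true CDF at thresholds $k = 1, 2, 3$ is $(0, 1, 1)$:

```python
# Worked check of the Balanced DRPS formula for one sample
# (numbers illustrative): K = 4 classes, true class y = 2.
p = [0.1, 0.6, 0.2, 0.1]
true_cdf = [0.0, 1.0, 1.0]                      # 1{k >= 2} for k = 1, 2, 3
cum = [p[0], p[0] + p[1], p[0] + p[1] + p[2]]   # F_1, F_2, F_3
score = sum((f - t) ** 2 for f, t in zip(cum, true_cdf))
# With a single sample, w_i = 1 and N = 1, so Balanced DRPS = score:
# (0.1 - 0)^2 + (0.7 - 1)^2 + (0.9 - 1)^2 = 0.11
```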

2. Motivation and Rationale

Standard DRPS, while distance-sensitive, is vulnerable to class imbalance: errors on frequent classes dominate the aggregate score, resulting in biased performance estimates. Balanced DRPS rectifies this by weighting each sample by the inverse of its class frequency, so that each class contributes equally to the final metric regardless of its frequency in the dataset. This ensures that performance on minority (often more difficult or critical) classes is not overshadowed by majority classes, which is a key concern in QDE and similar ordinal regression settings (Thuy et al., 1 Jul 2025).
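The imbalance effect can be made concrete with a toy two-class dataset (all numbers below are illustrative, not from the source). A model that ignores the minority class can beat a model that covers both classes under standard DRPS, while Balanced DRPS reverses the ranking:

```python
# Toy illustration (0-indexed labels): 90 samples of class 0, 10 of class 1.
# Model A always predicts the majority class with certainty;
# model B spreads mass but covers the minority class.
y_true = [0] * 90 + [1] * 10
probs_a = [[1.0, 0.0]] * 100
probs_b = [[0.6, 0.4]] * 90 + [[0.2, 0.8]] * 10

def drps(probs, y_true, weights):
    # (Weighted) DRPS: squared error between predicted and true
    # cumulative distributions, summed over thresholds, averaged over N.
    total = 0.0
    for p, y, w in zip(probs, y_true, weights):
        cdf = 0.0
        for j in range(len(p) - 1):
            cdf += p[j]
            total += w * (cdf - (1.0 if j >= y else 0.0)) ** 2
    return total / len(y_true)

uniform = [1.0] * 100
inv_freq = [1.0 / 90] * 90 + [1.0 / 10] * 10

# Standard DRPS rewards ignoring the minority class...
std_a, std_b = drps(probs_a, y_true, uniform), drps(probs_b, y_true, uniform)
# ...while Balanced DRPS prefers the model that covers both classes.
bal_a, bal_b = drps(probs_a, y_true, inv_freq), drps(probs_b, y_true, inv_freq)
```

Here `std_a < std_b` but `bal_b < bal_a`: the balanced weighting exposes model A's complete failure on the minority class.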

3. Score Components and Interpretation

Each term of the Balanced DRPS evaluates the squared difference between the predicted cumulative distribution and the true (degenerate) CDF at each threshold:

  • $F_k(\hat y_i)$: cumulative probability the model assigns to classes up to $k$.
  • $\mathds{1}\{k \ge y_i\}$: ground-truth CDF ($0$ for $k < y_i$, $1$ for $k \ge y_i$).
  • $w_i$: class-balancing weight, calibrated so that the total contribution of each class across samples equals one.

The outer average over $N$ samples ensures comparability across datasets and experiments. This design penalizes predictions proportionally to their distance from the true class, reflecting the ordinal aspect and distributing aggregate loss evenly over all label values, regardless of observed frequency (Thuy et al., 1 Jul 2025).

4. Theoretical and Practical Properties

Balanced DRPS exhibits several desirable characteristics:

  • Ordinal error penalization: Predictions farther from the true class incur greater penalties, with the penalty accumulating over cumulative thresholds.
  • Probabilistic output sensitivity: The metric leverages the full forecasted probability distribution, handling “soft” outputs as well as point predictions. For deterministic predictions, it reduces to the mean absolute error with class-balancing weights.
  • Class imbalance correction: Each label’s total influence on the score is normalized, ensuring fairness under skewed distributions (e.g., when extreme difficulty levels are rare compared to mid-range levels).
  • Training-objective independence: The metric is neutral to the specific modeling paradigm or optimization loss, supporting cross-study and cross-model comparison (Thuy et al., 1 Jul 2025).
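The reduction for deterministic forecasts can be verified numerically: for a one-hot prediction of class $m$, the summed squared threshold errors equal $|m - y|$, so Balanced DRPS becomes a class-weighted mean absolute error. A minimal check (0-indexed labels; the function name is illustrative):

```python
def drps_terms(p, y):
    """Per-sample DRPS: squared cumulative errors summed over the
    K-1 thresholds (labels 0-indexed)."""
    cdf, total = 0.0, 0.0
    for j in range(len(p) - 1):
        cdf += p[j]
        total += (cdf - (1.0 if j >= y else 0.0)) ** 2
    return total

# For every one-hot forecast at class m and every true class y,
# the per-sample score equals the ordinal distance |m - y|.
K = 5
for m in range(K):
    for y in range(K):
        one_hot = [1.0 if c == m else 0.0 for c in range(K)]
        assert drps_terms(one_hot, y) == abs(m - y)
```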

5. Implementation Protocol

Computation of Balanced DRPS proceeds as follows:

  1. Compute the frequency of each class, $\mathrm{freq}[c]$.
  2. For each sample $i$, set $w_i = 1 / \mathrm{freq}[y_i]$.
  3. For each sample, compute the cumulative probabilities $F_1, \ldots, F_{K-1}$ from the model’s predicted probabilities.
  4. Construct the true CDF vector for thresholding:

$\text{trueCDF}[k] = \mathds{1}\{k \geq y_i\}, \quad k=1,\ldots,K-1$

  5. For each sample, sum the weighted squared threshold errors, then aggregate across all samples and normalize by $N$.
  6. The resulting score quantifies balanced, distance-aware ordinal prediction error (Thuy et al., 1 Jul 2025).
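The steps above can be sketched with NumPy (a vectorized reading of the protocol; the function name and 0-indexed labels are implementation choices, not prescribed by the source):

```python
import numpy as np

def balanced_drps(probs, y_true):
    """probs: (N, K) predicted class probabilities; y_true: (N,) labels
    in {0, ..., K-1}.  Follows protocol steps 1-6."""
    probs = np.asarray(probs, dtype=float)
    y_true = np.asarray(y_true)
    n, k = probs.shape
    # Steps 1-2: class frequencies and inverse-frequency weights.
    counts = np.bincount(y_true, minlength=k)
    w = 1.0 / counts[y_true]
    # Step 3: cumulative predicted probabilities F_1, ..., F_{K-1}.
    cdf = np.cumsum(probs, axis=1)[:, :-1]
    # Step 4: true degenerate CDF, 1{threshold >= y_i}.
    thresholds = np.arange(k - 1)
    true_cdf = (thresholds[None, :] >= y_true[:, None]).astype(float)
    # Steps 5-6: weighted squared threshold errors, averaged over N.
    return float(np.sum(w * np.sum((cdf - true_cdf) ** 2, axis=1)) / n)
```

With two samples, predicted distributions `[0.1, 0.6, 0.2, 0.1]` (true class 1) and `[0.7, 0.2, 0.1, 0.0]` (true class 0), the per-sample errors are 0.11 and 0.10, giving a score of 0.105.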

6. Empirical Evaluation and Use Cases

Balanced DRPS has been validated for benchmarking models in Question Difficulty Estimation tasks, particularly on RACE++ and ARC datasets, both displaying substantial class imbalance:

  • In comparative experiments, Balanced DRPS is reported for probabilistic and deterministic model outputs, demonstrating its utility in distinguishing performance differences that conventional accuracy or RMSE may obscure.
  • Figures in the source paper illustrate that predictions with mass closer to the true class (even if distributed) are rewarded under Balanced DRPS compared to over-confident but miscentered predictions.
  • The metric is particularly sensitive to ordinal misclassification and exposes nuanced behaviors in neural ordinal regression architectures (such as OrderedLogitNN) as opposed to flat classification or regression approaches (Thuy et al., 1 Jul 2025).

7. Comparative Assessment with Alternative Metrics

Balanced DRPS possesses advantages over standard accuracy, adjacent-accuracy, and RMSE metrics:

| Metric        | Ordinal-Aware | Class-Imbalance Correction | Probabilistic-Output Support |
|---------------|---------------|----------------------------|------------------------------|
| Accuracy      | No            | No                         | No                           |
| RMSE          | Linear        | No                         | Depends (regression)         |
| Standard DRPS | Yes           | No                         | Yes                          |
| Balanced DRPS | Yes           | Yes                        | Yes                          |

Conventional metrics either neglect the ordinal structure or overemphasize more frequent classes. Balanced DRPS is threshold-free, leverages the full prediction distribution, and its class balancing ensures fair representation of all classes, aligning the evaluation objective more closely with the goals of ordinal tasks under real-world label skew (Thuy et al., 1 Jul 2025).

References

  • "Ordinality in Discrete-level Question Difficulty Estimation: Introducing Balanced DRPS and OrderedLogitNN" (Thuy et al., 1 Jul 2025)