
Early Risk Detection Error (ERDE)

Updated 29 June 2026
  • ERDE is a time-aware metric that defines risk detection performance by measuring both the accuracy and timeliness of decisions on sequential data.
  • It integrates penalties for false positives, false negatives, and delayed true positives using customizable thresholds and cost functions.
  • ERDE has become a standard in evaluating early risk detection models, influencing system design in domains like mental health and safety monitoring.

Early Risk Detection Error (ERDE) is a class of time-aware evaluation metrics explicitly designed to quantify both the accuracy and promptness of automated risk detection models when predicting adverse outcomes (e.g., mental health conditions) from temporal data streams such as social media posts or behavioral logs. ERDE emerged as the standard metric in the CLEF eRisk shared tasks and has since been widely adopted and generalized across sequential risk modeling domains, balancing penalties for late detection, incorrect detection, and missed detection in a single, parameterized framework (Burdisso et al., 2019, Trotzek et al., 2018, Bucur et al., 2021, Thompson et al., 2024, Thompson et al., 16 May 2025, Farooque et al., 22 May 2026).

1. Formal Definition and Variants of ERDE

The canonical ERDE metric (also denoted $\mathrm{ERDE}_o$ or $\mathrm{ERDE}_\theta$ depending on notation) assigns a per-user error defined by three components: (a) correctness of the detection (true/false positive/negative cases), (b) timeliness of positive detection relative to a deadline or grace parameter $o$ ($\theta$), and (c) application-specific unit costs for each case. The most common sigmoid-based form, as used in CLEF eRisk, is given by:

$$
\mathrm{ERDE}_o(d, k) =
\begin{cases}
c_{fp}, & \text{if } d = p \land \mathrm{truth} = n \\
c_{fn}, & \text{if } d = n \land \mathrm{truth} = p \\
\ell_c(k)\cdot c_{tp}, & \text{if } d = p \land \mathrm{truth} = p \\
0, & \text{if } d = n \land \mathrm{truth} = n
\end{cases}
$$

where

$$\ell_c(k) = 1 - \frac{1}{1 + \exp(k - o)}$$

with $k$ the index at which a decision is made and $o$ the deadline parameter, typically set to $5$ or $50$ (number of posts/chunks) (Burdisso et al., 2019, Trotzek et al., 2018, Bucur et al., 2021). The unit costs $c_{fp}$, $c_{fn}$, and $c_{tp}$ are usually held at fixed default values, but can be adjusted to reflect domain-specific trade-offs.
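
The case definition above translates almost directly into code. The following is a minimal sketch, not a reference implementation: the function names, the string labels `"p"`/`"n"`, and the default unit costs of 1.0 are illustrative assumptions rather than values taken from the cited papers.

```python
import math


def latency_cost(k: int, o: int) -> float:
    """Sigmoid latency cost l_c(k) = 1 - 1 / (1 + exp(k - o)).

    Algebraically this equals the logistic function of (k - o); it is
    evaluated in a branch-wise form to avoid overflow for large |k - o|.
    """
    x = k - o
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def erde(decision: str, truth: str, k: int, o: int,
         c_fp: float = 1.0, c_fn: float = 1.0, c_tp: float = 1.0) -> float:
    """Per-user ERDE_o for one final decision.

    decision / truth are "p" (positive) or "n" (negative); k is the index
    at which the decision was made. The default costs are placeholders
    (an assumption of this sketch) and should be set per task.
    """
    if decision == "p" and truth == "n":
        return c_fp                       # false positive
    if decision == "n" and truth == "p":
        return c_fn                       # false negative
    if decision == "p" and truth == "p":
        return latency_cost(k, o) * c_tp  # true positive, scaled by delay
    return 0.0                            # true negative


if __name__ == "__main__":
    # A true positive issued exactly at the deadline pays half of c_tp.
    print(erde("p", "p", k=5, o=5))
```

A true positive issued well before the deadline costs close to zero, while one issued well after it costs close to the full `c_tp`.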

Linear and piecewise-linear variants exist, notably in recent BERT-based and synthetic benchmark evaluations, expressing the delay penalty either as a linear function of detection time or as a piecewise-linear function with an integer cutoff (Thompson et al., 2024, Bucur et al., 2021, Farooque et al., 22 May 2026).

The percentage-based variant of ERDE replaces the absolute count $k$ with the percentage of the user's total available data that had been seen when the decision was made, addressing biases across users with heterogeneous verbosity. In this variant the deadline $o$ is expressed as a percent threshold (Trotzek et al., 2018).
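
To make the percentage idea concrete, the sketch below applies the same sigmoid to the share of a user's data consumed rather than to the raw count. It assumes the share is computed as `100 * k / n`, with `n` the user's total number of posts; this is one plausible reading of the description above, not necessarily the exact formulation of Trotzek et al., and the names `latency_cost_pct` and `o_pct` are hypothetical.

```python
import math


def latency_cost_pct(k: int, n: int, o_pct: float) -> float:
    """Sigmoid latency cost on the percentage of data seen.

    k     : number of items read when the decision was made
    n     : total number of items available for this user (assumed known)
    o_pct : deadline expressed as a percentage in [0, 100]
    """
    if n <= 0:
        raise ValueError("n must be positive")
    x = 100.0 * k / n - o_pct
    # Stable evaluation of 1 - 1 / (1 + exp(x)).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


if __name__ == "__main__":
    # Same raw count k = 10, very different share of each user's history.
    print(latency_cost_pct(10, 20, o_pct=20.0))    # terse user: 50% seen
    print(latency_cost_pct(10, 2000, o_pct=20.0))  # verbose user: 0.5% seen
```

With a count-based deadline both users would receive the same latency cost; here the terse user, for whom half of the history was consumed, is penalized far more than the verbose one.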

2. Intuitive Interpretation and Motivation

ERDE integrates three operational objectives:

  • Promptness: Early, correct risk detection ("true positive" before the deadline) yields minimal or zero penalty; late detection is penalized increasingly as delay grows past $o$.
  • Specificity: Any false positive (flag on a negative case) incurs a maximal penalty.
  • Missed Detection: Failing to ever raise a positive decision for a true case (false negative) also receives the maximal penalty.

The latency cost $\ell_c(k)$ ensures that a correct prediction is not sufficient unless issued early; correctness is modulated by when the prediction is made. The deadline parameter $o$ encodes task-specific tolerance for evidence accumulation before full penalty is imposed: smaller $o$ enforces stricter earliness, while larger $o$ allows more leeway before delay costs are triggered (Burdisso et al., 2019, Thompson et al., 2024, Thompson et al., 16 May 2025).
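
As a worked illustration of how sharply the sigmoid latency cost from Section 1 switches around the deadline (values rounded to four decimals):

$$\ell_c(o - 5) = 1 - \frac{1}{1 + e^{-5}} \approx 0.0067$$

$$\ell_c(o) = 1 - \frac{1}{1 + e^{0}} = 0.5$$

$$\ell_c(o + 5) = 1 - \frac{1}{1 + e^{5}} \approx 0.9933$$

Because the cost depends only on the difference $k - o$, the transition from near-zero to near-full penalty spans roughly ten decision steps whether $o = 5$ or $o = 50$; changing $o$ shifts the transition but does not widen it.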

3. Implementation Protocols and Evaluation Practice

In eRisk protocols and recent longitudinal evaluation frameworks (e.g., Cogniscope), each subject's data is split into fixed-size temporal units (e.g., 10 "chunks" of posts per user). The system processes the units sequentially and must issue a binary decision (risk/no-risk) per subject, after which no further data from that user is ingested. ERDE is computed as the average per-user error across the test set:

  • For CLEF eRisk, $\mathrm{ERDE}_5$ and $\mathrm{ERDE}_{50}$ are computed by setting $o = 5$ or $o = 50$, with the final score being the mean across users (Burdisso et al., 2019, Bucur et al., 2021).
  • In longitudinal benchmarks such as Cogniscope, the penalty for a late true positive is linear in the delay between the first time the system alarmed and the ground-truth onset, measured relative to a user-level penalty (grace) window (Farooque et al., 22 May 2026).
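
The sequential protocol and the per-user averaging described above can be sketched as follows. This is a simplified illustration under stated assumptions: the system is modeled as a stream of per-unit risk scores with a fixed alarm threshold, unit costs default to 1.0, and `linear_delay_penalty` is a hypothetical clipped-linear form in which alarms at or before onset incur no delay penalty. None of these choices is taken from the eRisk or Cogniscope definitions.

```python
import math
from typing import Optional, Sequence, Tuple


def latency_cost(k: int, o: int) -> float:
    """Sigmoid latency cost 1 - 1 / (1 + exp(k - o)), evaluated stably."""
    x = k - o
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def first_alarm(scores: Sequence[float], threshold: float) -> Optional[int]:
    """1-based index of the first unit whose score reaches the threshold.

    Reading stops at the first alarm; None means no alarm was ever raised.
    """
    for k, score in enumerate(scores, start=1):
        if score >= threshold:
            return k
    return None


def mean_erde(users: Sequence[Tuple[Sequence[float], bool]], o: int,
              threshold: float, c_fp: float = 1.0, c_fn: float = 1.0,
              c_tp: float = 1.0) -> float:
    """Average per-user ERDE_o over (scores, is_positive) pairs."""
    total = 0.0
    for scores, is_positive in users:
        k = first_alarm(scores, threshold)
        if k is not None and is_positive:
            total += latency_cost(k, o) * c_tp   # true positive
        elif k is not None:
            total += c_fp                        # false positive
        elif is_positive:
            total += c_fn                        # false negative
        # true negatives contribute 0
    return total / len(users)


def linear_delay_penalty(t_alarm: float, t_onset: float, window: float) -> float:
    """Hypothetical linear delay penalty, clipped to [0, 1]."""
    return min(1.0, max(0.0, (t_alarm - t_onset) / window))


if __name__ == "__main__":
    users = [
        ([0.1, 0.9], True),   # true positive at k = 2
        ([0.1, 0.2], False),  # true negative
        ([0.2, 0.3], True),   # false negative
        ([0.8], False),       # false positive
    ]
    print(mean_erde(users, o=2, threshold=0.5))
    print(linear_delay_penalty(t_alarm=12, t_onset=10, window=4))
```

In this toy run the four users contribute 0.5, 0, 1, and 1 respectively, so the mean ERDE is 0.625.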

4. Comparative Analysis and Empirical Results

Empirical analyses demonstrate that enhancements in temporal modeling and context representation yield improved ERDE scores:

  • τ-SS3, a text classifier integrating dynamic n-grams, achieves lower ERDE than bag-of-words baselines on early depression/anorexia detection (Burdisso et al., 2019). For example, ERDE dropped from 8.12% (SS3) to 7.70% (τ-SS3) for eRisk 2017 depression, and to 6.17% on eRisk 2018 depression, setting state-of-the-art results.
  • In benchmarks, transformer-based and time-aware models achieving earlier correct decisions consistently report lower ERDE than late-firing or conservatively thresholded models, even when raw F1 is similar (Thompson et al., 16 May 2025, Thompson et al., 2024, Bucur et al., 2021).
  • Use of percentage-based ERDE aligns system ranking more closely with intuitive early-detection behavior, especially when user post counts vary widely (Trotzek et al., 2018).

Notably, ERDE highlights the inherent trade-off: systems making aggressive early alarms risk high false positive penalty, while overly conservative systems incur steep delay or miss penalties.

5. Limitations, Modifications, and Ongoing Controversies

Critiques of ERDE focus on several systematic limitations:

  • Deadline/Parameter Sensitivity: The deadline $o$ is task and dataset-dependent, requiring external calibration; varying $o$ can substantially alter relative model ranking (Burdisso et al., 2019, Thompson et al., 2024).
  • Discrete Chunks vs. Proportional Data: Original formulations penalize by count ($k$), leading to unfair assessments across users with heterogeneous data lengths. Proportional versions address this (Trotzek et al., 2018).
  • Unit Cost Uniformity: In most evaluations the unit costs $c_{fp}$, $c_{fn}$, and $c_{tp}$ are fixed uniformly, but a mismatch between these weights and real-world consequences is noted. Some literature suggests increasing $c_{fp}$ to bias away from false alarms (Thompson et al., 16 May 2025).
  • Late Decision Penalty: In sigmoid-based ERDE, $\ell_c(k)$ approaches 1 within a few steps of the deadline, so true positives made after $o$ incur penalties similar to false positives, sometimes under-rewarding models with moderate delay but high accuracy (Burdisso et al., 2019).
  • Complexity for Downstream Use: The non-differentiable, piecewise nature of ERDE complicates its direct use as a training loss; however, recent work approximates ERDE with surrogate differentiable penalties in temporal fine-tuning of transformers (Thompson et al., 16 May 2025, Thompson et al., 2024).

Several alternatives build on or generalize ERDE:

  • Time-to-Detection (TTD): Average delay (in time units) between ground-truth onset and alarm, considering only detected positives, and disregarding false positives and missed cases (Farooque et al., 22 May 2026).
  • F-latency: Harmonic mean of precision and detection speed, often tracked alongside ERDE for model selection (Thompson et al., 2024, Thompson et al., 16 May 2025).
  • Ranking Metrics: Precision@k, NDCG@k, used as complementary criteria to ERDE for systems designed for prioritized screening.
  • Sliding-window Schemes and Delay Encodings: Incorporation of explicit delay tokens into input representations and objective functions, allowing end-to-end optimization for early risk detection readiness (Thompson et al., 16 May 2025, Thompson et al., 2024).
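
Of the alternatives listed above, Time-to-Detection is the simplest to compute. The sketch below follows the description given here (mean delay over detected positives only); the tuple layout and the use of `None` for "no onset" and "no alarm" are assumptions of this example.

```python
from typing import Optional, Sequence, Tuple

# (onset_time, alarm_time): onset is None for negative cases,
# alarm is None when the system never alarmed.
Case = Tuple[Optional[float], Optional[float]]


def time_to_detection(cases: Sequence[Case]) -> Optional[float]:
    """Mean delay between onset and alarm over detected positives.

    False positives (no onset) and missed cases (no alarm) are skipped,
    which is why TTD is usually read alongside ERDE rather than instead of it.
    Returns None when there are no detected positives.
    """
    delays = [alarm - onset
              for onset, alarm in cases
              if onset is not None and alarm is not None]
    if not delays:
        return None
    return sum(delays) / len(delays)


if __name__ == "__main__":
    cases = [
        (10, 13),    # detected, delay 3
        (5, 6),      # detected, delay 1
        (None, 4),   # false positive, ignored
        (8, None),   # missed case, ignored
    ]
    print(time_to_detection(cases))
```

A system can therefore score a short TTD while missing most positive cases, something ERDE would penalize through the false negative cost.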

6. Significance and Best Practices in ERDE-Optimized System Development

ERDE operationalizes the core requirement of timely and accurate intervention in longitudinal screening and monitoring, particularly in social or behavioral risk detection contexts. Key practices emerging from recent research include:

  • Parameterizing and validating $o$ on held-out sets reflecting real-world timeliness demands.
  • Encoding temporal delay into the input space when training temporal models.
  • Employing proportional or percentage-based ERDE when subject activity levels are highly variable.
  • Using ERDE, possibly in conjunction with TTD and F-latency, as early-stopping and model selection criteria.
  • Calibrating error costs ($c_{fp}$, $c_{fn}$, $c_{tp}$) to match deployment-specific risk tolerances and policy objectives (Burdisso et al., 2019, Trotzek et al., 2018, Thompson et al., 16 May 2025, Thompson et al., 2024).

ERDE and its extensions provide a rigorous, interpretable, and widely adopted standard for evaluating early risk detection systems under real-world constraints, and continue to shape the design and benchmarking of temporal models in health and safety contexts.
