Recurrency Baseline in Temporal Modeling

Updated 18 May 2026

Recurrency Baseline is a method that leverages temporal repetition as a benchmark for sequential learning tasks in domains such as temporal knowledge graph forecasting and survival analysis.
It utilizes past observation frequencies and recency scores through techniques ranging from nonparametric statistics to recurrent neural network architectures.
The approach robustly benchmarks complex temporal models while revealing limitations in capturing long-range dependencies and latent multi-hop relationships.

A recurrency baseline is a reference method or model that exploits the temporal recurrence of information, events, or states, and serves as a principled foundation or benchmark for various classes of temporal or sequential learning tasks. Across machine learning, temporal knowledge graph forecasting, survival analysis, reinforcement learning, and related domains, "recurrency baseline" covers a diverse set of methodologies unified by the inductive bias that past events or states are informative predictors of future observations, often operationalized by explicitly modeling direct repetitions, cycling hazards, or sequential memory.

1. Foundational Principle and Formal Definitions

Recurrency baselines leverage the statistical or mechanistic property that many real-world processes exhibit temporal repetition, whether at the level of raw facts, state transitions, or event occurrence distributions. In the context of temporal knowledge graph (TKG) forecasting, the canonical “recurrency baseline” is constructed under the hypothesis that facts tend to recur, operationalized by directly scoring future queries according to the temporal recency and frequency of their past realizations (Gastinger et al., 2024). In survival and event-history analysis, a recurrency baseline often refers to the nonparametric or semi-parametric estimation of the recurrent-event hazard function, viewed as the baseline risk of event reoccurrence unadjusted for covariates, sometimes stratified by event order or individual risk history (Hernández-Herrera et al., 2021, Botha et al., 2025, Rytgaard et al., 2024).

In deep learning and sequential modeling, recurrency baselines also refer to recurrent architectural motifs—typically RNNs, LSTMs, or GRUs—in which the model’s internal state is explicitly propagated and updated across time steps to encode and utilize past information (Shifas et al., 2020, Ni et al., 2021, Hart et al., 2021, Zhang et al., 2020, Yang et al., 2021).

2. Recurrency Baselines in Temporal Knowledge Graph Forecasting

In TKG forecasting, the recurrency baseline is a nonparametric, training-free model centered on the principle that “history repeats itself” (Gastinger et al., 2024). Formally, let $G \subseteq E \times R \times E \times T$ be a temporal knowledge graph with entities $E$, relations $R$, and discrete timesteps $T$. For each tail-prediction query $(s, r, ?, t^+)$ and candidate $o \in E$, the recurrency baseline scoring functions are:

Strict recurrency:

$\phi_\Delta((s, r, o, t^+), G) = \begin{cases} \Delta(t^+, \text{last}(s, r, o)), & \text{if } \text{last}(s, r, o) \text{ is defined} \\ 0, & \text{otherwise} \end{cases}$

with $\Delta$ a monotone function and $\text{last}(s, r, o) = \max\{k \in T \mid (s, r, o, k) \in G\}$ .

Relaxed recurrency:

$\xi_\text{tail}((s, r, o, t^+), G) = \frac{|\{(s', k): (s', r, o, k) \in G \}|}{|\{ (s', o', k): (s', r, o', k) \in G \}|}$

and analogously for head queries.

Combined recurrency:

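$\psi_{\Delta\xi}((s, r, o, t^+), G) = \alpha_r \, \phi_\Delta((s, r, o, t^+), G) + (1 - \alpha_r) \, \xi_\text{tail}((s, r, o, t^+), G)$

where relation-specific weights $\alpha_r \in [0, 1]$ are tuned on validation data.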

This baseline, devoid of any iterative optimization or learned embeddings, ranks entities exclusively using tabulated statistics from the historical graph. Datasets such as ICEWS14/18, GDELT, YAGO, and WIKI are commonly used for benchmarking. The recurrency baseline outperformed or matched state-of-the-art deep TKG models in three of five datasets, highlighting its strong inductive prior when recurrency degree is high (Gastinger et al., 2024).
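The scoring scheme described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the reference implementation: the function names are hypothetical, the recency function is assumed to be an exponential decay $2^{-\lambda (t^+ - k)}$, and the strict and relaxed scores are assumed to be mixed by a single weight `alpha`.

```python
from collections import defaultdict

def build_stats(quads):
    """Tabulate statistics from historical (s, r, o, t) quadruples (all with t < t_plus)."""
    last = {}                     # (s, r, o) -> most recent timestep at which the fact held
    rel_obj = defaultdict(int)    # (r, o)    -> number of facts with relation r and object o
    rel = defaultdict(int)        # r         -> number of facts with relation r
    for s, r, o, t in set(quads):  # G is a set, so duplicates are dropped
        key = (s, r, o)
        last[key] = max(t, last.get(key, t))
        rel_obj[(r, o)] += 1
        rel[r] += 1
    return last, rel_obj, rel

def strict_score(last, s, r, o, t_plus, lam=0.1):
    """Recency of the exact triple; 0 if it never occurred. The decay form is an assumption."""
    k = last.get((s, r, o))
    if k is None:
        return 0.0
    return 2.0 ** (-lam * (t_plus - k))

def relaxed_score(rel_obj, rel, r, o):
    """Share of all facts with relation r that have o as their object."""
    total = rel.get(r, 0)
    return rel_obj.get((r, o), 0) / total if total else 0.0

def rank_tails(stats, s, r, t_plus, entities, alpha=0.5, lam=0.1):
    """Rank candidate objects by a weighted mix of strict and relaxed scores."""
    last, rel_obj, rel = stats
    scored = []
    for o in entities:
        score = (alpha * strict_score(last, s, r, o, t_plus, lam)
                 + (1.0 - alpha) * relaxed_score(rel_obj, rel, r, o))
        scored.append((score, o))
    # Highest score first; ties broken by entity name for determinism.
    return [o for _, o in sorted(scored, key=lambda p: (-p[0], str(p[1])))]

history = [("a", "likes", "b", 1), ("a", "likes", "b", 3), ("c", "likes", "d", 2)]
stats = build_stats(history)
ranking = rank_tails(stats, "a", "likes", 4, ["e", "d", "b"])
```

No parameters are learned here apart from `alpha` and `lam`, which in this sketch would be chosen per relation on validation data.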

3. Recurrency in Survival and Event-History Analysis

In recurrent survival analysis, the baseline hazard $h_0(t)$ in Cox models represents the underlying event recurrence risk, with extensions stratified by event order or left-censoring. When the number of previous episodes is unknown (left-censored), valid modeling requires double stratification on both event count (possibly imputed) and prior risk status, with frailty to capture residual heterogeneity (Hernández-Herrera et al., 2021). Precisely,

Counting-process baseline hazard:

$h_i(t \mid \nu_i) = \nu_i \, h_0(t) \, \exp(\beta^\top x_i(t))$

with $\nu_i$ a Gamma frailty.

Gap-time stratified baseline:

$h_{ik}(t) = h_{0k}(t - t_{i,k-1}) \, \exp(\beta^\top x_{ik})$

with episode-specific baselines $h_{0k}$.

Incorrect pooling of subjects with differing event histories under a common $h_0(t)$ baseline induces severe bias; double stratification and multiple imputation restore valid estimation (Hernández-Herrera et al., 2021).
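The idea of episode-specific baselines can be illustrated with a covariate-free Nelson-Aalen estimate of the cumulative hazard in gap time, computed separately for each episode number. This is a simplified sketch: it omits covariates, frailty, and imputation of left-censored histories, and the function name and record layout are hypothetical.

```python
from collections import defaultdict

def nelson_aalen_by_episode(records):
    """Nelson-Aalen cumulative hazard per episode stratum.

    records: iterable of (episode, gap_time, event), where gap_time is the time
    since the previous event and event is 1 if a recurrence was observed, 0 if censored.
    Returns {episode: [(time, cumulative_hazard), ...]} at observed event times.
    """
    strata = defaultdict(list)
    for episode, gap, event in records:
        strata[episode].append((gap, event))

    curves = {}
    for episode, rows in strata.items():
        event_times = sorted({g for g, e in rows if e == 1})
        cumulative, curve = 0.0, []
        for t in event_times:
            at_risk = sum(1 for g, _ in rows if g >= t)           # still under observation at t
            events = sum(1 for g, e in rows if g == t and e == 1)  # recurrences exactly at t
            cumulative += events / at_risk
            curve.append((t, cumulative))
        curves[episode] = curve
    return curves

# First and second episodes receive separate baseline curves.
data = [(1, 2.0, 1), (1, 3.0, 0), (1, 4.0, 1), (2, 1.0, 1), (2, 5.0, 0)]
curves = nelson_aalen_by_episode(data)
```

Pooling all rows into a single stratum would force one curve onto subjects at different points in their event history, which is the situation the stratified approach is meant to avoid.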

In credit risk modeling, the time-to-first-default (TFD) Cox model serves as a recurrency baseline against Andersen-Gill (AG) and Prentice-Williams-Peterson (PWP) extensions. For portfolios with rare repeat defaults, the TFD baseline accurately estimates marginal pointwise default probabilities:

$\Pr(t - 1 < T \le t) = S(t - 1) - S(t), \quad S(t) = \Pr(T > t)$

where event recurrence can safely be ignored without loss of calibration or discrimination (Botha et al., 2025).
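As a worked illustration, assuming discrete time periods and a sequence of per-period default hazards (both assumptions made here for the example), the probability of a first default in each period follows from the survival curve as $S(t-1) - S(t)$. The function name is hypothetical.

```python
def marginal_default_probs(hazards):
    """Per-period probability of first default, given per-period hazards h_1, h_2, ...

    Uses S(t) = S(t-1) * (1 - h_t) with S(0) = 1, and returns S(t-1) - S(t) for each t.
    """
    survival_prev = 1.0
    probs = []
    for h in hazards:
        survival_now = survival_prev * (1.0 - h)
        probs.append(survival_prev - survival_now)
        survival_prev = survival_now
    return probs

probs = marginal_default_probs([0.1, 0.2])
```

The returned probabilities plus the final survival probability sum to one, which is a quick sanity check on any fitted first-default model.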

4. Recurrency Baselines in Neural Sequential and RL Models

In deep learning, recurrency baselines are frequently instantiated as canonical RNN, LSTM, or GRU architectures, serving as model classes with explicit temporal state transfer. This includes:

Basic RNN recurrency: A deterministic state-space system with state $x_k$, activation $h_k$, and external input $s_k$, with a fixed, stable linear $A x_k$ term to ensure bounded dynamics, and transitions:

$x_{k+1} = A x_k + U h_k + W s_k + b, \quad h_k = \sigma(x_k)$

where stability of $A$ provides a robust baseline for analyzing vanishing/exploding gradients and architecture variants (Salem, 2016).
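A small NumPy sketch shows why a stable linear term keeps the dynamics bounded. It assumes an update of the form $x_{k+1} = A x_k + U \tanh(x_k) + W s_k + b$, with `tanh` standing in for the activation and `A` chosen as a scaled identity. The matrix names, the activation, and the helper functions are assumptions made for this illustration.

```python
import numpy as np

def basic_rnn_rollout(inputs, A, U, W, b, x0=None):
    """Roll out x_{k+1} = A x_k + U tanh(x_k) + W s_k + b over a sequence of inputs.

    Returns the stacked activations tanh(x_k) and the final state.
    """
    x = np.zeros(A.shape[0]) if x0 is None else np.asarray(x0, dtype=float)
    activations = []
    for s in inputs:
        h = np.tanh(x)                  # bounded activation, each entry in (-1, 1)
        activations.append(h)
        x = A @ x + U @ h + W @ s + b   # stable linear part plus bounded driving terms
    return np.stack(activations), x

def spectral_radius(A):
    """Largest eigenvalue magnitude; below 1 means the linear part is stable."""
    return float(np.max(np.abs(np.linalg.eigvals(A))))

rng = np.random.default_rng(0)
n, m, T = 4, 3, 200
A = 0.5 * np.eye(n)                     # fixed, stable linear term
U, W, b = rng.normal(size=(n, n)), rng.normal(size=(n, m)), rng.normal(size=n)
S = rng.uniform(-1.0, 1.0, size=(T, m))
H, x_final = basic_rnn_rollout(S, A, U, W, b)
```

Because the activation and inputs are bounded and `A` contracts the state by a factor of 0.5 per step, the state norm cannot exceed twice the norm of the largest possible driving term, regardless of sequence length.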

Recurrent feature extraction in speech enhancement: Augmenting convolutional modules with temporal gating, e.g., a gruCNN cell (local GRU recurrency embedded per feature map), significantly increases context adaptation and robustness to non-stationary noise, outperforming feedforward and global-recurrent approaches (Shifas et al., 2020). The reported quantitative comparison:

Model	SSNR @2.5 dB	SSNR @12.5 dB	MOS	Params
CNN_FC-SE	2.39	7.61	2.75	11.13M
CNN_LSTM-SE	3.20	7.85	2.77	36.10M
gruCNN_FC-SE	3.94	8.96	3.16	27.22M

Recurrent baseline in reinforcement learning: Recurrent model-free agents (LSTM/GRU RNNs) in off-policy RL (TD3, SAC) define strong baselines for POMDPs, often matching or outperforming specialized architectures when tuned appropriately. Essential practices include using separate RNN encoders for actor and critic, feeding observation-action-reward (oar) inputs, and tuning sequence length by task (Ni et al., 2021, Yang et al., 2021).
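The two structural practices named above, observation-action-reward inputs and separate encoders for actor and critic, can be sketched with a small NumPy GRU. This is only an illustration of the data flow: the class and function names are hypothetical, the weights are random rather than trained, and a real agent would use an autodiff framework and attach policy and value heads to the encodings.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUEncoder:
    """Minimal GRU that folds a sequence of input vectors into one hidden state."""

    def __init__(self, input_dim, hidden_dim, seed):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(hidden_dim)
        # Index 0: update gate, 1: reset gate, 2: candidate state.
        self.W = rng.uniform(-scale, scale, (3, hidden_dim, input_dim))
        self.U = rng.uniform(-scale, scale, (3, hidden_dim, hidden_dim))
        self.b = np.zeros((3, hidden_dim))
        self.hidden_dim = hidden_dim

    def encode(self, sequence):
        h = np.zeros(self.hidden_dim)
        for x in sequence:
            z = sigmoid(self.W[0] @ x + self.U[0] @ h + self.b[0])
            r = sigmoid(self.W[1] @ x + self.U[1] @ h + self.b[1])
            n = np.tanh(self.W[2] @ x + self.U[2] @ (r * h) + self.b[2])
            h = (1.0 - z) * h + z * n   # gated blend of old state and candidate
        return h

def make_oar_sequence(observations, prev_actions, prev_rewards):
    """Concatenate observation, previous action, and previous reward at each step."""
    obs = np.asarray(observations, dtype=float)
    act = np.asarray(prev_actions, dtype=float)
    rew = np.asarray(prev_rewards, dtype=float).reshape(-1, 1)
    return np.concatenate([obs, act, rew], axis=1)

rng = np.random.default_rng(42)
T = 5                                            # sequence length, tuned per task
oar = make_oar_sequence(rng.normal(size=(T, 3)), rng.normal(size=(T, 2)), rng.normal(size=T))
actor_encoder = GRUEncoder(input_dim=6, hidden_dim=8, seed=0)   # separate parameters
critic_encoder = GRUEncoder(input_dim=6, hidden_dim=8, seed=1)  # for actor and critic
actor_state = actor_encoder.encode(oar)
critic_state = critic_encoder.encode(oar)
```

Keeping the two encoders separate means the critic's loss never updates the actor's memory, at the cost of roughly doubling the recurrent parameters.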

5. Applications and Empirical Performance

Recurrency baselines have found utility in knowledge graph forecasting, RL for POMDPs, medical event prediction, financial hazard modeling, speech enhancement, and image segmentation under noisy or low-shot regimes.

Temporal knowledge graphs: The recurrency baseline sets a robust "floor" for future fact prediction, outperforming more complex GNNs/transformers on benchmarks with high direct fact repetition (e.g., YAGO, WIKI) and exposing failure modes in advanced models that fail to learn even simple recurrency (Gastinger et al., 2024).
Speech enhancement: Locally recurrent convolutional architectures (gruCNN) yield significant gains in segmental SNR and MOS, even with 25% fewer parameters compared to CNN+global RNN baselines (Shifas et al., 2020).
RL under partial observability: Well-implemented recurrent TD3/SAC actors deliver 5–100× faster learning than prior on-policy or model-based approaches on 18/21 POMDP environments, providing a rigorous baseline for new partial observability and meta-RL algorithms (Ni et al., 2021, Hart et al., 2021, Yang et al., 2021).
Recurrent hazard modeling: Stratified baseline hazards in Cox-type models (with multiple imputation for left-censored history) ensure unbiased coefficient estimates and valid interval coverage under event dependence and high left-censoring (Hernández-Herrera et al., 2021).

6. Limitations and Failure Modes

Recurrency baselines are not universally optimal. Their limitations include:

Sensitivity to recurrency degree: Effectiveness depends on the proportion of recurring events/triples. When the test set is dominated by previously unobserved instances (e.g., ICEWS KG datasets, ~50% unseen), recurrency baselines assign zero or low scores to novel facts (Gastinger et al., 2024).
Expressivity limitations: Such baselines cannot model latent multi-hop inferences, long-range temporal dependencies beyond direct repetition, or structural patterns not manifest in raw frequencies.
RL and task complexity: In RL, recurrency (or frame-stacking) does not robustly compensate for missing fundamental state variables (e.g., unobserved velocities) except in simplified settings; both approaches fail when the temporal structure is too complex for shallow recurrent memory (Hart et al., 2021).
Benchmarking risk: If recurrency alone matches or exceeds the performance of learned models, the latter's claimed improvements may be illusory, reflecting lack of nontrivial temporal modeling (Gastinger et al., 2024).

7. Recommendations and Best Practices

For rigorous research and model evaluation, the following guidelines apply:

Include recurrency baselines: Any temporal forecasting or sequential learning experiment should report performance against a recurrency baseline tailored to the domain—e.g., direct fact repetition and count normalization in KGs, RNNs/GRUs in sequence learning, stratified baseline hazards in recurrent-event survival models (Gastinger et al., 2024, Hernández-Herrera et al., 2021, Ni et al., 2021).
Report recurrency degree/statistics: Dataset-level metrics such as the proportion of repeated facts or degree of event dependence inform the plausibility and ceiling of recurrency-driven performance (Gastinger et al., 2024).
Parameterization: Prefer simple, analytically tractable, and non-iterative recurrency scoring functions as base case models (e.g., tabulated last-seen time, frequency counts, or fixed-structure RNNs with analytically stable dynamics), establishing the minimum performance standard.
Limitations awareness: Use negative results (where recurrency baselines fall short) to justify the necessity of more expressive or logically capable temporal models.
Domain-tailored design: For stochastic hazard processes, stratify recurrency baselines across event order and observed/unobserved risk histories, incorporating frailty and imputation as required (Hernández-Herrera et al., 2021). For temporal deep learning, clearly separate recurrency from feedforward structure and benchmark both appropriately.
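The recommendation to report recurrency statistics can be made concrete with a short helper. The sketch below assumes one plausible definition of recurrency degree, namely the fraction of test facts whose (subject, relation, object) triple already appears somewhere in the history. The function name and this exact definition are assumptions for illustration.

```python
def recurrency_degree(history, test):
    """Fraction of test quadruples whose (s, r, o) triple occurs in the history.

    history, test: iterables of (s, r, o, t) quadruples; history is assumed to
    contain only facts with timestamps earlier than those in test.
    """
    seen = {(s, r, o) for s, r, o, _ in history}
    test = list(test)
    if not test:
        return 0.0
    hits = sum(1 for s, r, o, _ in test if (s, r, o) in seen)
    return hits / len(test)

history = [("a", "r", "b", 1), ("c", "r", "d", 2)]
test = [("a", "r", "b", 5), ("x", "r", "y", 5)]
degree = recurrency_degree(history, test)  # one of the two test facts is a repeat
```

A value near 1 suggests that a recurrency baseline will be hard to beat on that dataset, while a value near 0 suggests that most test facts are novel and the baseline will score them poorly.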

Recurrency baselines, rigorously implemented and interpreted, play a pivotal role in calibrating progress, uncovering method weaknesses, and incentivizing genuine temporal reasoning advances in a wide range of sequential modeling domains.