Inter-period Redundancy Filtering (IRF)
- Inter-period Redundancy Filtering (IRF) is a module that removes redundant overlapping information from multi-period inputs in financial time-series forecasting.
- IRF integrates within the Multi-period Learning Framework by subtracting repeated embeddings, enabling transformers to focus on unique, horizon-specific signals.
- Empirical studies show IRF improves forecasting metrics such as MSE and WMAPE by redistributing self-attention away from repeated segments and toward period-specific content across time windows.
Inter-period Redundancy Filtering (IRF) is a module introduced within the Multi-period Learning Framework (MLF) for financial time-series forecasting. IRF is designed to address the challenge of redundant information in multi-period historical inputs, where windows of differing lengths contain overlapping temporal segments. By explicitly removing the component of each period that is redundant with all shorter historical windows, IRF enables transformers to more effectively model unique information at each temporal horizon, facilitating more accurate and efficient use of multi-period self-attention in time series forecasting models.
1. Motivation and Core Problem
Financial time series are influenced by heterogeneous temporal dynamics: short windows (e.g., 5 days) often capture abrupt shifts, while longer windows (e.g., 30 days) reflect gradual trends. When these multi-period windows are concatenated or processed jointly, the longer window(s) necessarily encode all information present in the shorter ones, resulting in high inter-period redundancy.
This redundancy produces two principal issues:
- Attention focus bias: The transformer’s self-attention mechanism disproportionately attends to repeated tokens, i.e., overlapping segments across periods, rather than the unique content in each period.
- Signal underutilization: Period-specific features (such as a spike in a short window) may be diminished, as the model detects them multiple times but cannot assign them unique contextual significance.
IRF was developed to systematically mitigate these issues by subtracting the redundant components of each longer-period embedding, allowing subsequent attention layers to operate on de-redundified, period-distilled representations.
2. Architectural Integration within MLF
Within the Multi-period Learning Framework, IRF is positioned after the Multi-period Multi-head Self-Attention (MA) module in each stacked “MLF block.” The processing steps in block $e$ can be summarized as:
- Multi-period Multi-head Self-Attention (MA) receives a concatenated embedding $\mathbf{z}^e \in \mathbb{R}^{N \times d}$, where $d$ is the embedding dimension and $N = \sum_{s=1}^{S} N_s$ is the sum of patches across periods.
- Inter-period Redundancy Filtering (IRF) splits $\mathbf{z}^e$ into $S$ sub-tensors, one per period: $\mathbf{z}_s^e \in \mathbb{R}^{N_s \times d}$ for $s = 1, \dots, S$.
- Each $\mathbf{z}_s^e$ is passed through a Sub-Period-Predictor (SPP) head with two parallel linear branches:
  - A forecast branch producing $\hat{X}_s^e$ (predicting future steps),
  - A redundancy-estimation branch outputting $\epsilon_s^e$.
- The core IRF operation then computes the de-redundified embedding for period $s$:
$$\hat{\mathbf{z}}_s^e = \mathbf{z}_s^e - \frac{1}{\sqrt{d_k}} \sum_{j < s} \epsilon_j^e,$$
where $\sqrt{d_k}$ is the key-dimension stabilizing scale from self-attention.
- All de-redundified period embeddings $\hat{\mathbf{z}}_s^e$ are concatenated back into the composite tensor $\hat{\mathbf{z}}^e$ for input to the next block.
Stacking such blocks enables the model to recursively refine its estimates of which segments in longer windows are merely repetitions of those from shorter windows.
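A minimal PyTorch sketch of how one such block could be wired is given below. The multi-period attention is approximated by a standard `nn.MultiheadAttention`, and each SPP head pools its period slice to a single vector so the redundancy estimate can be broadcast-subtracted from longer periods; the module names, pooling step, and dimensions are illustrative assumptions rather than the published implementation.

```python
import torch
import torch.nn as nn


class SubPeriodPredictor(nn.Module):
    """Illustrative SPP head with a forecast branch and a redundancy branch.
    Pooling to a per-period vector is an assumption made here so the redundancy
    estimate can be broadcast-subtracted from longer periods' embeddings."""

    def __init__(self, d_model: int, horizon: int):
        super().__init__()
        self.forecast = nn.Linear(d_model, horizon)    # forecast branch
        self.redundancy = nn.Linear(d_model, d_model)  # redundancy-estimation branch

    def forward(self, z_s: torch.Tensor):
        # z_s: [batch, N_s, d_model]
        pooled = z_s.mean(dim=1)                       # [batch, d_model]
        return self.forecast(pooled), self.redundancy(pooled)


class MLFBlock(nn.Module):
    """One MLF block: multi-period self-attention followed by IRF."""

    def __init__(self, d_model: int, n_heads: int, patch_counts: list, horizon: int):
        super().__init__()
        self.patch_counts = patch_counts  # N_1, ..., N_S, ordered shortest period first
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.spp = nn.ModuleList([SubPeriodPredictor(d_model, horizon) for _ in patch_counts])
        self.d_k = d_model // n_heads     # key dimension used for the 1/sqrt(d_k) scale

    def forward(self, z: torch.Tensor):
        # z: [batch, sum(N_s), d_model], the concatenated multi-period embedding
        z, _ = self.attn(z, z, z)                          # multi-period self-attention (MA)
        parts = torch.split(z, self.patch_counts, dim=1)   # one sub-tensor per period
        forecasts, epsilons = [], []
        for z_s, spp in zip(parts, self.spp):
            x_hat_s, eps_s = spp(z_s)
            forecasts.append(x_hat_s)
            epsilons.append(eps_s)
        filtered = []
        for s, z_s in enumerate(parts):
            if s == 0:
                filtered.append(z_s)   # nothing shorter to subtract
            else:
                correction = torch.stack(epsilons[:s]).sum(dim=0) / self.d_k ** 0.5
                filtered.append(z_s - correction.unsqueeze(1))  # broadcast over patches
        return torch.cat(filtered, dim=1), forecasts


# toy usage: three periods of 5, 10, and 30 patches, embedding dim 64
block = MLFBlock(d_model=64, n_heads=4, patch_counts=[5, 10, 30], horizon=7)
z = torch.randn(2, 45, 64)
z_hat, per_period_forecasts = block(z)
print(z_hat.shape)  # torch.Size([2, 45, 64])
```

Note that the shortest period is passed through unchanged, since there is no shorter window whose content it could duplicate; only longer periods receive a correction.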
3. Mathematical Formalism
Let $\mathbf{z}^e \in \mathbb{R}^{N \times d}$ be the block-$e$ transformer embedding, where $N = \sum_{s=1}^{S} N_s$ and $N_s$ is the patch count for period $s$. Then:
- Splitting: $\mathbf{z}^e = [\mathbf{z}_1^e; \mathbf{z}_2^e; \dots; \mathbf{z}_S^e]$, with $\mathbf{z}_s^e \in \mathbb{R}^{N_s \times d}$.
- Sub-Period-Predictor (SPP) branches for each period embedding: $(\hat{X}_s^e, \epsilon_s^e) = \mathrm{SPP}(\mathbf{z}_s^e)$, where $\hat{X}_s^e$ is the forecast output and $\epsilon_s^e$ is the redundancy estimate.
- Redundancy subtraction, over all shorter periods $j < s$ (written out for $S = 3$ after this list): $\hat{\mathbf{z}}_s^e = \mathbf{z}_s^e - \frac{1}{\sqrt{d_k}} \sum_{j < s} \epsilon_j^e$.
- Reassembly: $\hat{\mathbf{z}}^e = [\hat{\mathbf{z}}_1^e; \dots; \hat{\mathbf{z}}_S^e]$.
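For concreteness, with $S = 3$ periods ordered from shortest to longest, the subtraction above unrolls to
$$\hat{\mathbf{z}}_1^e = \mathbf{z}_1^e, \qquad \hat{\mathbf{z}}_2^e = \mathbf{z}_2^e - \frac{\epsilon_1^e}{\sqrt{d_k}}, \qquad \hat{\mathbf{z}}_3^e = \mathbf{z}_3^e - \frac{\epsilon_1^e + \epsilon_2^e}{\sqrt{d_k}},$$
so the shortest period passes through unchanged and each longer period is corrected by the accumulated redundancy estimates of every shorter one.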
Key hyperparameters include the number of periods $S$, period-specific patch counts $N_s$, block depth $E$, embedding dimension $d$, and attention key dimension $d_k$.
4. Algorithmic Implementation and Computational Cost
Algorithmic Steps:
- For $s = 1$ to $S$:
  - Extract $\mathbf{z}_s^e$ from $\mathbf{z}^e$.
  - Compute $(\hat{X}_s^e, \epsilon_s^e) = \mathrm{SPP}(\mathbf{z}_s^e)$.
- For $s = 1$ to $S$:
  - Compute $\hat{\mathbf{z}}_s^e = \mathbf{z}_s^e - \frac{1}{\sqrt{d_k}} \sum_{j < s} \epsilon_j^e$.
- Concatenate all $\hat{\mathbf{z}}_s^e$ to form $\hat{\mathbf{z}}^e$.
Pseudocode:
```
# pass 1: split the block embedding by period and run each slice through SPP
for s in range(S):
    z_e_s = z_e[:, offset_s : offset_s + N_s]   # the N_s patches belonging to period s
    X_f_e_s, eps_e_s = SPP(z_e_s)               # forecast and redundancy estimate
    store z_e_s, eps_e_s

# pass 2: subtract the accumulated redundancy of all shorter periods (j < s)
for s in range(S):
    correction = sum(eps_e_j / sqrt(d_k) for j in range(s))
    z_e_hat_s = z_e_s - correction
    append z_e_hat_s to list

# reassemble the de-redundified multi-period embedding
z_e_hat = concatenate(z_e_hat_1, ..., z_e_hat_S, axis=1)
```
Computational Complexity:
IRF adds only a small number of operations per block: the SPP heads and associated tensor arithmetic scale linearly with the total patch count $N$, in contrast to the $O(N^2 d)$ cost of multi-head self-attention per block. Memory overhead from storing the $\epsilon_s^e$ estimates is of the same order as the embeddings themselves; overall memory remains dominated by the quadratic size of the self-attention maps.
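A back-of-envelope comparison of multiply-accumulate counts illustrates this scaling. It assumes, as in the sketch above, that each SPP head mean-pools its period slice and applies one $d \to d$ and one $d \to H$ linear map; the exact head sizes are not reproduced here, so the constants are indicative only.

```python
def attention_macs(n_patches: int, d_model: int) -> int:
    """Rough multiply-accumulates for one self-attention layer:
    Q/K/V/output projections plus the two N x N matrix products."""
    projections = 4 * n_patches * d_model * d_model
    score_and_mix = 2 * n_patches * n_patches * d_model
    return projections + score_and_mix


def irf_macs(patch_counts: list, d_model: int, horizon: int) -> int:
    """Rough multiply-accumulates for the SPP heads of one IRF pass, assuming
    mean-pooling plus one d->d and one d->horizon linear map per period."""
    return sum(n * d_model + d_model * (d_model + horizon) for n in patch_counts)


# illustrative sizes: three periods of 5, 10, and 30 patches, d = 64, horizon = 7
print(attention_macs(45, 64), irf_macs([5, 10, 30], 64, 7))
```

Even at these small toy sizes the IRF term is a small fraction of the attention cost, and the gap widens as $N$ grows because only the attention term is quadratic in $N$.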
5. Empirical Effectiveness and Ablation Results
An ablation study was performed on five datasets (Fund, Electricity, ETTh1, Illness, Exchange) to test the necessity and impact of IRF. When IRF was disabled (no subtraction), MLF’s forecasting accuracy declined across all metrics and datasets, as outlined in the following comparisons (lower is better for MSE and WMAPE):
| Dataset | MLF w/o IRF | Full MLF (with IRF) |
|---|---|---|
| Fund (WMAPE) | 78.56% | 75.84% |
| Electricity (MSE) | 0.0500 | 0.0472 |
| ETTh1 (MSE) | 0.091 | 0.087 |
| Illness (MSE) | 0.163 | 0.149 |
| Exchange (MSE) | 0.0033 | 0.0029 |
Visualization of average self-attention heatmaps revealed that in the absence of IRF, attention “locked on” to the diagonal blocks representing repeated regions, whereas inclusion of IRF distributed attention more evenly, confirming effective de-redundification.
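Such heatmaps can be produced by averaging the post-softmax attention weights over heads and evaluation batches. The following sketch reuses the `MLFBlock` module from Section 2 with random inputs purely to show the mechanics; it does not reproduce the paper's figure, and the plotting choices are incidental.

```python
import torch
import matplotlib.pyplot as plt


@torch.no_grad()
def average_attention_map(block, batches):
    """Average one block's post-softmax attention weights over batches;
    nn.MultiheadAttention already averages over heads by default."""
    maps = []
    for z in batches:
        _, attn = block.attn(z, z, z, need_weights=True)  # attn: [batch, N, N]
        maps.append(attn.mean(dim=0))
    return torch.stack(maps).mean(dim=0)                  # [N, N]


avg = average_attention_map(block, [torch.randn(2, 45, 64) for _ in range(8)])
plt.imshow(avg.numpy(), cmap="viridis")
plt.xlabel("key patch")
plt.ylabel("query patch")
plt.title("average multi-period self-attention")
plt.show()
```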
6. Strengths, Limitations, and Prospective Enhancements
Strengths:
- Directly addresses the challenge of overlapping information inherent to multi-period input for time series.
- Integrates efficiently within transformer architectures, preserving self-attention complexity.
- Demonstrated consistent empirical improvements across heterogeneous datasets.
Limitations and Extensions:
- The current linear SPP estimation of redundancy ($\epsilon_s^e$) may lack representational power for complex redundancy; employing a non-linear MLP or a small attention module could refine redundancy extraction (a speculative sketch follows this list).
- IRF’s redundancy subtraction is unidirectional (from shorter to longer periods); this suggests that full pairwise correction or bidirectional filtering could be explored.
- Accumulated storage of $\epsilon_s^e$ may scale unfavorably in models with a large number of periods $S$ or blocks $E$; low-rank factorization or parameter sharing could address this.
- Fixed $\sqrt{d_k}$ scaling is used; a plausible implication is that learnable or adaptive per-period/block scaling could enhance flexibility.
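As an illustration of the first limitation, the linear redundancy branch could be swapped for a small MLP. The following is a speculative sketch of such a variant, with arbitrary hidden width and activation, and is not part of the published method.

```python
import torch.nn as nn


class MLPRedundancyHead(nn.Module):
    """Speculative non-linear replacement for the linear redundancy branch:
    a two-layer MLP with a GELU non-linearity and the same output shape."""

    def __init__(self, d_model: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, hidden),
            nn.GELU(),
            nn.Linear(hidden, d_model),
        )

    def forward(self, pooled):  # pooled: [batch, d_model], as in the SPP sketch above
        return self.net(pooled)
```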
7. Relevance within Financial Time Series Forecasting
IRF is central to the MLF paradigm for multi-period financial time-series forecasting. By systematically removing duplicate temporal information, it allows downstream model components to concentrate on horizon-specific and non-redundant content. Its low computational overhead, compatibility with self-attention, and robust improvements across diverse benchmarks substantiate its utility in advanced time-series models for the financial domain (Zhang et al., 7 Nov 2025).