Inter-period Redundancy Filtering (IRF)
- Inter-period Redundancy Filtering (IRF) is a module that removes redundant overlapping information from multi-period inputs in financial time-series forecasting.
- IRF integrates within the Multi-period Learning Framework by subtracting repeated embeddings, enabling transformers to focus on unique, horizon-specific signals.
- Empirical studies show IRF improves forecasting metrics such as MSE and WMAPE by redistributing self-attention away from repeated segments and toward period-specific content across time windows.
Inter-period Redundancy Filtering (IRF) is a module introduced within the Multi-period Learning Framework (MLF) for financial time-series forecasting. IRF is designed to address the challenge of redundant information in multi-period historical inputs, where windows of differing lengths contain overlapping temporal segments. By explicitly removing the component of each period that is redundant with all shorter historical windows, IRF enables transformers to more effectively model unique information at each temporal horizon, facilitating more accurate and efficient use of multi-period self-attention in time series forecasting models.
1. Motivation and Core Problem
Financial time series are influenced by heterogeneous temporal dynamics: short windows (e.g., 5 days) often capture abrupt shifts, while longer windows (e.g., 30 days) reflect gradual trends. When these multi-period windows are concatenated or processed jointly, the longer window(s) necessarily encode all information present in the shorter ones, resulting in high inter-period redundancy.
This redundancy produces two principal issues:
- Attention focus bias: The transformer’s self-attention mechanism disproportionately attends to repeated tokens, i.e., overlapping segments across periods, rather than the unique content in each period.
- Signal underutilization: Period-specific features (such as a spike in a short window) may be diminished, as the model detects them multiple times but cannot assign them unique contextual significance.
IRF was developed to systematically mitigate these issues by subtracting the redundant components of each longer-period embedding, allowing subsequent attention layers to operate on de-redundified, period-distilled representations.
2. Architectural Integration within MLF
Within the Multi-period Learning Framework, IRF is positioned after the Multi-period Multi-head Self-Attention (MA) module in each stacked “MLF block.” The processing steps in block $e$ can be summarized as:
- Multi-period Multi-head Self-Attention (MA) receives a concatenated embedding $\mathbf{z}^e \in \mathbb{R}^{N \times d}$, where $d$ is the embedding dimension and $N = \sum_{s=1}^{S} N_s$ is the sum of patches across periods.
- Inter-period Redundancy Filtering (IRF) splits $\mathbf{z}^e$ into $S$ sub-tensors, one per period: $\mathbf{z}_s^e \in \mathbb{R}^{N_s \times d}$ for $s = 1, \dots, S$.
- Each $\mathbf{z}_s^e$ is passed through a Sub-Period-Predictor (SPP) head with two parallel linear branches:
  - A forecast branch producing $\hat{X}_s^e$ (predicting future steps),
  - A redundancy-estimation branch outputting $\epsilon_s^e$.
- The core IRF operation then computes the de-redundified embedding for period $s$:
$$\hat{\mathbf{z}}_s^e = \mathbf{z}_s^e - \frac{1}{\sqrt{d_k}} \sum_{j < s} \epsilon_j^e,$$
where $\sqrt{d_k}$ is the key-dimension stabilizing scale from self-attention.
- All de-redundified period embeddings $\hat{\mathbf{z}}_s^e$ are concatenated back into the composite tensor $\hat{\mathbf{z}}^e$ for input to the next block.
Stacking such blocks enables the model to recursively refine its estimates of which segments in longer windows are merely repetitions of those from shorter windows.
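A minimal PyTorch sketch of how one such block could be wired is given below. The multi-period attention is approximated by a standard `nn.MultiheadAttention`, and each SPP head pools its period slice to a single vector so the redundancy estimate can be broadcast-subtracted from longer periods; the module names, pooling step, and dimensions are illustrative assumptions rather than the published implementation.

```python
import torch
import torch.nn as nn


class SubPeriodPredictor(nn.Module):
    """Illustrative SPP head with a forecast branch and a redundancy branch.
    Pooling to a per-period vector is an assumption made here so the redundancy
    estimate can be broadcast-subtracted from longer periods' embeddings."""

    def __init__(self, d_model: int, horizon: int):
        super().__init__()
        self.forecast = nn.Linear(d_model, horizon)    # forecast branch
        self.redundancy = nn.Linear(d_model, d_model)  # redundancy-estimation branch

    def forward(self, z_s: torch.Tensor):
        # z_s: [batch, N_s, d_model]
        pooled = z_s.mean(dim=1)                       # [batch, d_model]
        return self.forecast(pooled), self.redundancy(pooled)


class MLFBlock(nn.Module):
    """One MLF block: multi-period self-attention followed by IRF."""

    def __init__(self, d_model: int, n_heads: int, patch_counts: list, horizon: int):
        super().__init__()
        self.patch_counts = patch_counts  # N_1, ..., N_S, ordered shortest period first
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.spp = nn.ModuleList([SubPeriodPredictor(d_model, horizon) for _ in patch_counts])
        self.d_k = d_model // n_heads     # key dimension used for the 1/sqrt(d_k) scale

    def forward(self, z: torch.Tensor):
        # z: [batch, sum(N_s), d_model], the concatenated multi-period embedding
        z, _ = self.attn(z, z, z)                          # multi-period self-attention (MA)
        parts = torch.split(z, self.patch_counts, dim=1)   # one sub-tensor per period
        forecasts, epsilons = [], []
        for z_s, spp in zip(parts, self.spp):
            x_hat_s, eps_s = spp(z_s)
            forecasts.append(x_hat_s)
            epsilons.append(eps_s)
        filtered = []
        for s, z_s in enumerate(parts):
            if s == 0:
                filtered.append(z_s)   # nothing shorter to subtract
            else:
                correction = torch.stack(epsilons[:s]).sum(dim=0) / self.d_k ** 0.5
                filtered.append(z_s - correction.unsqueeze(1))  # broadcast over patches
        return torch.cat(filtered, dim=1), forecasts


# toy usage: three periods of 5, 10, and 30 patches, embedding dim 64
block = MLFBlock(d_model=64, n_heads=4, patch_counts=[5, 10, 30], horizon=7)
z = torch.randn(2, 45, 64)
z_hat, per_period_forecasts = block(z)
print(z_hat.shape)  # torch.Size([2, 45, 64])
```

Note that the shortest period is passed through unchanged, since there is no shorter window whose content it could duplicate; only longer periods receive a correction.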
3. Mathematical Formalism
Let $\mathbf{z}^e \in \mathbb{R}^{N \times d}$ be the block-$e$ transformer embedding, where $N = \sum_{s=1}^{S} N_s$ and $N_s$ is the patch count for period $s$. Then:
- Splitting: $\mathbf{z}^e = [\mathbf{z}_1^e; \mathbf{z}_2^e; \dots; \mathbf{z}_S^e]$, with $\mathbf{z}_s^e \in \mathbb{R}^{N_s \times d}$.
- Sub-Period-Predictor (SPP) branches for each period embedding: $(\hat{X}_s^e, \epsilon_s^e) = \mathrm{SPP}(\mathbf{z}_s^e)$, where $\hat{X}_s^e$ is the forecast output and $\epsilon_s^e$ is the redundancy estimate.
- Redundancy subtraction, over all shorter periods $j < s$ (written out for $S = 3$ after this list): $\hat{\mathbf{z}}_s^e = \mathbf{z}_s^e - \frac{1}{\sqrt{d_k}} \sum_{j < s} \epsilon_j^e$.
- Reassembly: $\hat{\mathbf{z}}^e = [\hat{\mathbf{z}}_1^e; \dots; \hat{\mathbf{z}}_S^e]$.
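For concreteness, with $S = 3$ periods ordered from shortest to longest, the subtraction above unrolls to
$$\hat{\mathbf{z}}_1^e = \mathbf{z}_1^e, \qquad \hat{\mathbf{z}}_2^e = \mathbf{z}_2^e - \frac{\epsilon_1^e}{\sqrt{d_k}}, \qquad \hat{\mathbf{z}}_3^e = \mathbf{z}_3^e - \frac{\epsilon_1^e + \epsilon_2^e}{\sqrt{d_k}},$$
so the shortest period passes through unchanged and each longer period is corrected by the accumulated redundancy estimates of every shorter one.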
Key hyperparameters include the number of periods $S$, period-specific patch counts $N_s$, block depth $E$, embedding dimension $d$, and attention key dimension $d_k$.
4. Algorithmic Implementation and Computational Cost
Algorithmic Steps:
- For $s = 1$ to $S$:
  - Extract $\mathbf{z}_s^e$ from $\mathbf{z}^e$.
  - Compute $(\hat{X}_s^e, \epsilon_s^e) = \mathrm{SPP}(\mathbf{z}_s^e)$.
- For $s = 1$ to $S$:
  - Compute $\hat{\mathbf{z}}_s^e = \mathbf{z}_s^e - \frac{1}{\sqrt{d_k}} \sum_{j < s} \epsilon_j^e$.
- Concatenate all $\hat{\mathbf{z}}_s^e$ to form $\hat{\mathbf{z}}^e$.
Pseudocode:
```
# pass 1: split the block embedding by period and run each slice through SPP
for s in range(S):
    z_e_s = z_e[:, offset_s : offset_s + N_s]   # the N_s patches belonging to period s
    X_f_e_s, eps_e_s = SPP(z_e_s)               # forecast and redundancy estimate
    store z_e_s, eps_e_s

# pass 2: subtract the accumulated redundancy of all shorter periods (j < s)
for s in range(S):
    correction = sum(eps_e_j / sqrt(d_k) for j in range(s))
    z_e_hat_s = z_e_s - correction
    append z_e_hat_s to list

# reassemble the de-redundified multi-period embedding
z_e_hat = concatenate(z_e_hat_1, ..., z_e_hat_S, axis=1)
```
Computational Complexity:
IRF adds only a small number of operations per block: the SPP heads and associated tensor arithmetic scale linearly with the total patch count $N$, in contrast to the $O(N^2 d)$ cost of multi-head self-attention per block. Memory overhead from storing the $\epsilon_s^e$ estimates is of the same order as the embeddings themselves; overall memory remains dominated by the quadratic size of the self-attention maps.
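A back-of-envelope comparison of multiply-accumulate counts illustrates this scaling. It assumes, as in the sketch above, that each SPP head mean-pools its period slice and applies one $d \to d$ and one $d \to H$ linear map; the exact head sizes are not reproduced here, so the constants are indicative only.

```python
def attention_macs(n_patches: int, d_model: int) -> int:
    """Rough multiply-accumulates for one self-attention layer:
    Q/K/V/output projections plus the two N x N matrix products."""
    projections = 4 * n_patches * d_model * d_model
    score_and_mix = 2 * n_patches * n_patches * d_model
    return projections + score_and_mix


def irf_macs(patch_counts: list, d_model: int, horizon: int) -> int:
    """Rough multiply-accumulates for the SPP heads of one IRF pass, assuming
    mean-pooling plus one d->d and one d->horizon linear map per period."""
    return sum(n * d_model + d_model * (d_model + horizon) for n in patch_counts)


# illustrative sizes: three periods of 5, 10, and 30 patches, d = 64, horizon = 7
print(attention_macs(45, 64), irf_macs([5, 10, 30], 64, 7))
```

Even at these small toy sizes the IRF term is a small fraction of the attention cost, and the gap widens as $N$ grows because only the attention term is quadratic in $N$.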
5. Empirical Effectiveness and Ablation Results
An ablation study was performed on five datasets (Fund, Electricity, ETTh1, Illness, Exchange) to test the necessity and impact of IRF. When IRF was disabled (no subtraction), MLF’s forecasting accuracy declined across all metrics and datasets, as outlined in the following comparisons (lower is better for MSE and WMAPE):
| Dataset | MLF w/o IRF | Full MLF (with IRF) |
|---|---|---|
| Fund (WMAPE) | 78.56% | 75.84% |
| Electricity (MSE) | 0.0500 | 0.0472 |
| ETTh1 (MSE) | 0.091 | 0.087 |
| Illness (MSE) | 0.163 | 0.149 |
| Exchange (MSE) | 0.0033 | 0.0029 |
Visualization of average self-attention heatmaps revealed that in the absence of IRF, attention “locked on” to the diagonal blocks representing repeated regions, whereas inclusion of IRF distributed attention more evenly, confirming effective de-redundification.
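Such heatmaps can be produced by averaging the post-softmax attention weights over heads and evaluation batches. The following sketch reuses the `MLFBlock` module from Section 2 with random inputs purely to show the mechanics; it does not reproduce the paper's figure, and the plotting choices are incidental.

```python
import torch
import matplotlib.pyplot as plt


@torch.no_grad()
def average_attention_map(block, batches):
    """Average one block's post-softmax attention weights over batches;
    nn.MultiheadAttention already averages over heads by default."""
    maps = []
    for z in batches:
        _, attn = block.attn(z, z, z, need_weights=True)  # attn: [batch, N, N]
        maps.append(attn.mean(dim=0))
    return torch.stack(maps).mean(dim=0)                  # [N, N]


avg = average_attention_map(block, [torch.randn(2, 45, 64) for _ in range(8)])
plt.imshow(avg.numpy(), cmap="viridis")
plt.xlabel("key patch")
plt.ylabel("query patch")
plt.title("average multi-period self-attention")
plt.show()
```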
6. Strengths, Limitations, and Prospective Enhancements
Strengths:
- Directly addresses the challenge of overlapping information inherent to multi-period input for time series.
- Integrates efficiently within transformer architectures, preserving self-attention complexity.
- Demonstrated consistent empirical improvements across heterogeneous datasets.
Limitations and Extensions:
- The current linear SPP estimation of redundancy ($\epsilon_s^e$) may lack representational power for complex redundancy; employing a non-linear MLP or a small attention module could refine redundancy extraction (a speculative sketch follows this list).
- IRF’s redundancy subtraction is unidirectional (from shorter to longer periods); this suggests that full pairwise correction or bidirectional filtering could be explored.
- Accumulated storage of $\epsilon_s^e$ may scale unfavorably in models with a large number of periods $S$ or blocks $E$; low-rank factorization or parameter sharing could address this.
- Fixed $\sqrt{d_k}$ scaling is used; a plausible implication is that learnable or adaptive per-period/block scaling could enhance flexibility.
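As an illustration of the first limitation, the linear redundancy branch could be swapped for a small MLP. The following is a speculative sketch of such a variant, with arbitrary hidden width and activation, and is not part of the published method.

```python
import torch.nn as nn


class MLPRedundancyHead(nn.Module):
    """Speculative non-linear replacement for the linear redundancy branch:
    a two-layer MLP with a GELU non-linearity and the same output shape."""

    def __init__(self, d_model: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, hidden),
            nn.GELU(),
            nn.Linear(hidden, d_model),
        )

    def forward(self, pooled):  # pooled: [batch, d_model], as in the SPP sketch above
        return self.net(pooled)
```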
7. Relevance within Financial Time Series Forecasting
IRF is central to the MLF paradigm for multi-period financial time-series forecasting. By systematically removing duplicate temporal information, it allows downstream model components to concentrate on horizon-specific and non-redundant content. Its low computational overhead, compatibility with self-attention, and robust improvements across diverse benchmarks substantiate its utility in advanced time-series models for the financial domain (Zhang et al., 7 Nov 2025).