Frequency Improved Legendre Memory Model (FiLM)
- FiLM integrates Legendre polynomial-based memory with Fourier denoising and low-rank methods, achieving up to 22.6% MSE reduction in forecasts.
- Its modular design allows easy plug-in with existing models, boosting performance on both multivariate and univariate benchmarks.
- Empirical evaluations demonstrate significant efficiency gains with 80% fewer parameters and linear scaling in memory usage and training time.
The Frequency Improved Legendre Memory Model (FiLM) is a neural architecture for long-term time series forecasting that integrates Legendre polynomial projections, Fourier-based denoising, and low-rank parameterization. FiLM systematically enhances the representation and utilization of historical information within deep time-series models, delivering accuracy and efficiency gains over contemporary alternatives such as FEDformer, Autoformer, and S4. Its modular design enables direct integration as a plug-in layer for existing deep learning forecasters, and empirical results demonstrate significant improvements in both multivariate and univariate forecasting benchmarks (Zhou et al., 2022).
1. Legendre Memory Model: Theoretical Foundations
FiLM builds upon the Legendre Memory Model (LMM), which encodes the recent history of an input time series via projection onto a fixed number of shifted-and-scaled Legendre polynomials. For a time window of length $\theta$, the model compresses the historical segment into a vector of coefficients $c(t) \in \mathbb{R}^N$ as:

$$c_n(t) = \frac{2n+1}{\theta} \int_0^{\theta} f(t-s)\, P_n\!\left(\frac{2s}{\theta} - 1\right) ds,$$

where $P_n$ denotes the Legendre polynomial of degree $n$.
The coefficient dynamics follow an ODE:

$$\frac{d c(t)}{dt} = A\, c(t) + B\, f(t),$$

with $A \in \mathbb{R}^{N \times N}$ and $B \in \mathbb{R}^{N}$ determined by the Legendre recurrence. Discretization (e.g., via the bilinear transform with unit step) yields the update:

$$c_k = \bar{A}\, c_{k-1} + \bar{B}\, f_k, \quad \text{where} \quad \bar{A} = \left(I - \tfrac{1}{2}A\right)^{-1}\left(I + \tfrac{1}{2}A\right), \quad \bar{B} = \left(I - \tfrac{1}{2}A\right)^{-1} B.$$
Analytic forms for $A$ and $B$ are given by:

$$A_{nk} = \frac{2n+1}{\theta} \begin{cases} (-1)^{n-k+1} & n \ge k \\ -1 & n < k \end{cases}, \qquad B_n = \frac{(2n+1)(-1)^n}{\theta}.$$

At inference, an approximation of the original signal is reconstructible as:

$$f(t-s) \approx \sum_{n=0}^{N-1} c_n(t)\, P_n\!\left(\frac{2s}{\theta} - 1\right), \quad s \in [0, \theta].$$
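As an illustration, the projection and reconstruction above can be sketched in NumPy. This is a minimal sketch using the standard LMU/LegT transition matrices and a bilinear discretization; the function names are ours, not from the paper:

```python
import numpy as np
from numpy.polynomial.legendre import legval

def lmu_matrices(N, theta):
    # Continuous-time LMU/LegT transition (A already includes the 1/theta scale).
    n = np.arange(N)[:, None]
    k = np.arange(N)[None, :]
    A = np.where(n < k, -1.0, (-1.0) ** (n - k + 1)) * (2 * n + 1) / theta
    B = (2 * np.arange(N) + 1) * (-1.0) ** np.arange(N) / theta
    return A, B

def legendre_project(signal, N, theta):
    # Bilinear discretization of dc/dt = A c + B f with unit step size.
    A, B = lmu_matrices(N, theta)
    I = np.eye(N)
    Ad = np.linalg.solve(I - A / 2, I + A / 2)
    Bd = np.linalg.solve(I - A / 2, B)
    c = np.zeros(N)
    for f in signal:
        c = Ad @ c + Bd * f
    return c

def legendre_reconstruct(c, theta):
    # f(t - s) ~ sum_n c_n P_n(2 s / theta - 1), lag s in [0, theta).
    s = np.arange(theta)
    return legval(2 * s / theta - 1, c)
```

Feeding a constant signal, for example, drives the state toward the first basis vector, since a constant is exactly captured by the degree-0 polynomial.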
2. Frequency Improvement via Fourier-Based Denoising
While Legendre projection preserves all frequencies, including noise, FiLM introduces a Fourier-based denoising module (Frequency Enhanced Layer, FEL). For each feature channel, an FFT is computed along the temporal axis of the Legendre coefficient sequence:

$$\hat{C}_m = \mathrm{FFT}(C)_m.$$

Only the lowest $M$ modes are retained, weighted by learnable parameters $W_m \in \mathbb{C}^{N \times N}$:

$$\hat{C}'_m = W_m \hat{C}_m, \quad m < M.$$

Higher modes ($m \ge M$) are zeroed, and an inverse FFT reconstructs a denoised memory representation:

$$C' = \mathrm{FFT}^{-1}(\hat{C}').$$

This process robustly suppresses high-frequency noise while maintaining salient long-term components.
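The FEL's mode truncation and mixing can be sketched as follows. This is a minimal stand-in: `W` holds the per-mode mixing matrices that would be learned during training, here supplied explicitly:

```python
import numpy as np

def frequency_enhanced_layer(C, W, M):
    """Keep the lowest M Fourier modes of C (shape [T, N]) along the time axis,
    mix each retained mode with a complex matrix W[m] (N x N), zero the rest,
    and transform back to a real-valued denoised sequence."""
    T, N = C.shape
    Chat = np.fft.rfft(C, axis=0)          # [T//2 + 1, N] complex spectrum
    out = np.zeros_like(Chat)
    for m in range(min(M, Chat.shape[0])):
        out[m] = W[m] @ Chat[m]            # per-mode linear mixing
    return np.fft.irfft(out, n=T, axis=0)  # higher modes stay zeroed
```

With identity weights this reduces to a plain low-pass filter: a low-frequency component passes through unchanged while high-frequency noise is removed.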
3. Low-Rank Parameterization for Efficiency
Naïvely, the learnable weights $W \in \mathbb{C}^{N \times N \times M}$ grow prohibitively large for high-dimensional problems. FiLM addresses this via tensor factorization. For each mode $m$:

$$W_{:,:,m} = W^{(1)}\, W^{(2)}_{:,:,m}\, W^{(3)},$$

with $W^{(1)} \in \mathbb{R}^{N \times r}$, $W^{(2)} \in \mathbb{R}^{r \times r \times M}$, $W^{(3)} \in \mathbb{R}^{r \times N}$, and $r \ll N$ the low rank. This reduces the parameter count from $O(N^2 M)$ to $O(Nr + r^2 M)$. Empirically, $r = 4$ (0.41% of full size) yields negligible loss in MSE; even smaller ranks provide strong compression with minor performance reduction.
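A sketch of the factorization and the resulting parameter savings. Names and shapes are illustrative, and real-valued factors are assumed for simplicity (the actual frequency-domain weights are complex):

```python
import numpy as np

def build_lowrank_weights(N, M, r, rng):
    # Three small factors replacing a dense M x N x N weight tensor.
    W1 = rng.standard_normal((N, r))     # shared left factor
    W2 = rng.standard_normal((M, r, r))  # small per-mode core
    W3 = rng.standard_normal((r, N))     # shared right factor
    return W1, W2, W3

def expand_mode(W1, W2, W3, m):
    # Dense N x N mixing matrix for Fourier mode m, materialized on demand.
    return W1 @ W2[m] @ W3
```

At the paper's default sizes ($N = 256$, $M = 32$, $r = 4$), the three factors hold 2,560 scalars versus 2,097,152 for the dense tensor, and each expanded per-mode matrix has rank at most $r$.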
4. Model Architecture and Training Protocols
4.1 Single-Layer Block
A one-layer FiLM block consists of:
- Legendre Projection Unit (LPU): Produces the sequence of Legendre coefficient vectors $c_t$.
- Frequency Enhanced Layer (FEL): Applies the Fourier mask described above, yielding a denoised coefficient sequence.
- LPU_R: Reconstructs the forecast using the inverse Legendre-basis mapping.
4.2 Multiscale Mixture-of-Experts
FiLM processes histories at several time resolutions (e.g., input windows spanning one, two, and four multiples of the forecast horizon), with each block forecasting separate future windows; outputs are combined via a learned gating mechanism, capturing information from both medium- and long-range dependencies.
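A minimal sketch of the gating. The 1x/2x/4x scale choice and the callable `experts` interface are illustrative assumptions, not the paper's exact design:

```python
import numpy as np

def multiscale_forecast(history, horizon, experts, gate_logits):
    """Run one expert per input scale (windows of 1x, 2x, 4x the horizon)
    and blend the forecasts with softmax gate weights."""
    scales = [1, 2, 4]
    preds = []
    for s, expert in zip(scales, experts):
        window = history[-s * horizon:]       # longer window for larger scale
        preds.append(expert(window, horizon)) # each expert emits a forecast
    w = np.exp(gate_logits - gate_logits.max())
    w /= w.sum()                              # softmax over experts
    return sum(wi * p for wi, p in zip(w, preds))
```

In FiLM the gate weights are learned; with uniform logits the combination reduces to a simple average of the per-scale forecasts.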
4.3 Optional Pre/Post-Processing
Per-series Instance Normalization (RevIN) can be applied before and after FiLM to enhance robustness to distribution shift. Its use is dataset-dependent.
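A non-affine RevIN sketch, normalizing each instance by its own statistics over the time axis (the published RevIN additionally learns affine parameters, omitted here for brevity):

```python
import numpy as np

def revin_normalize(x, eps=1e-5):
    # x: [time, channels]; per-instance statistics over the time axis.
    mu = x.mean(axis=0, keepdims=True)
    sigma = x.std(axis=0, keepdims=True)
    return (x - mu) / (sigma + eps), (mu, sigma)

def revin_denormalize(y, stats, eps=1e-5):
    # Restore the original scale and offset after forecasting.
    mu, sigma = stats
    return y * (sigma + eps) + mu
```

The forecaster operates on the normalized series; its output is denormalized with the same saved statistics, which is what makes the scheme robust to level and scale shifts between instances.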
4.4 Default Hyperparameters
| Component | Default Value | Notes |
|---|---|---|
| Legendre dim. $N$ | 256 | Number of polynomial bases |
| Fourier modes $M$ | 32 | Number of frequencies retained |
| Low rank $r$ | 4 | Accuracy/size tradeoff parameter |
| Scales | 3 | Multiscale input windows |
| Batch size | 32–256 | Task dependent |
| Optimizer | Adam | Learning rate schedule over 15 epochs |
4.5 Training Objective
The model is trained using Mean Squared Error (MSE):

$$\mathcal{L}_{\mathrm{MSE}} = \frac{1}{T} \sum_{t=1}^{T} \left( \hat{y}_t - y_t \right)^2.$$

No curriculum schedules or special warm-up phases are used. MSE is the primary loss; Mean Absolute Error (MAE) is reported but not optimized.
5. Empirical Evaluation and Ablation Analysis
5.1 Comparative Benchmarks
Across six real-world datasets (Traffic, Electricity, Exchange, Weather, ILI, ETTm/ETTh) and a range of forecast horizons, FiLM demonstrates substantial error reductions relative to prior SOTA models:
| Task | MSE Reduction vs Best Prior |
|---|---|
| Multivariate | 20.3% (vs FEDformer) |
| Univariate | 22.6% |
Seven competitive baselines are evaluated, including FEDformer, Autoformer, Informer, S4, LogTrans, and Reformer.
5.2 Module Drop-In and Substitution
- Substituting the LPU for a linear layer degrades all architectures tested.
- Augmenting existing MLP, LSTM, CNN, or Transformer networks with LPU and FEL offers consistent and large MSE improvements (8–120% relative gain).
5.3 Component Ablations
- Replacing FEL by standard MLP, LSTM, CNN, or vanilla Attention yields 5–300% worse performance.
- Reducing the rank of the FEL weights from full ($r = 256$) to $r = 4$ compresses them to 0.41% of baseline size with <1% MSE increase; even more aggressive compression stays within 5% of full performance.
- Limiting the FEL to the lowest $M$ Fourier modes is robust; some datasets benefit from including a small fraction of higher modes.
5.4 Efficiency
- Parameter count: FiLM uses 80% fewer trainable weights than FEDformer.
- Memory usage and training time scale linearly in input length; per-epoch training time is roughly 50% lower than that of deeper competitors.
6. Integration with Existing Time Series Forecasters
FiLM's memory and denoising modules can be embedded into arbitrary forecasting architectures:
- Prepend LPU: Replace the raw series with the Legendre state sequence $c_t$.
- Apply FEL: Perform the Fourier-based denoising as described.
- Decode: Use reconstructed features or pass to the backbone forecaster.
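The three steps can be sketched as a thin wrapper. This is an illustrative stand-in: `proj` plays the LPU role, the FEL is reduced to a plain low-pass mask, and all names are ours rather than the paper's:

```python
import numpy as np

def lowpass_denoise(states, M):
    # FEL stand-in: keep only the lowest M Fourier modes along the time axis.
    S = np.fft.rfft(states, axis=0)
    S[M:] = 0
    return np.fft.irfft(S, n=states.shape[0], axis=0)

def film_wrap(series, backbone, proj, M):
    # proj: [T, d] series -> [T, N] memory states (the LPU role);
    # backbone: any downstream forecaster consuming the denoised states.
    states = proj(series)
    denoised = lowpass_denoise(states, M)
    return backbone(denoised)
```

Any existing forecaster can be slotted in as `backbone` unchanged; only its input switches from the raw series to the denoised memory representation.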
Empirical evidence shows relative MSE improvements of up to 120% as a plug-in to existing MLP, LSTM, CNN, and vanilla attention models, with negligible parameter overhead (ca. 0.5% of full model).
7. Significance and Implications
FiLM demonstrates that Legendre polynomial-based memory, augmented with frequency selection and low-rank adaptation, offers a principled and practical approach for long-term sequence modeling. It balances expressiveness and regularization, efficiently attenuates overfitting to noise, and is broadly applicable as a module across network architectures. FiLM's empirical performance on real-world datasets and its ablation support the centrality of structured memory and frequency-aware denoising in advancing time-series forecasting (Zhou et al., 2022).