
Frequency Improved Legendre Memory Model (FiLM)

Updated 16 March 2026
  • FiLM integrates Legendre polynomial-based memory with Fourier denoising and low-rank methods, achieving up to 22.6% MSE reduction in forecasts.
  • Its modular design allows easy plug-in with existing models, boosting performance on both multivariate and univariate benchmarks.
  • Empirical evaluations demonstrate significant efficiency gains with 80% fewer parameters and linear scaling in memory usage and training time.

The Frequency Improved Legendre Memory Model (FiLM) is a neural architecture for long-term time series forecasting that integrates Legendre polynomial projections, Fourier-based denoising, and low-rank parameterization. FiLM systematically enhances the representation and utilization of historical information within deep time-series models, delivering accuracy and efficiency gains over contemporary alternatives such as FEDformer, Autoformer, and S4. Its modular design enables direct integration as a plug-in layer for existing deep learning forecasters, and empirical results demonstrate significant improvements in both multivariate and univariate forecasting benchmarks (Zhou et al., 2022).

1. Legendre Memory Model: Theoretical Foundations

FiLM builds upon the Legendre Memory Model (LMM), which encodes the recent history of an input time series $x(t)$ via projection onto a fixed number of shifted-and-scaled Legendre polynomials. For a time window $[t-\theta, t]$, the model compresses the historical segment into a vector of coefficients $c(t) \in \mathbb{R}^N$ as

$$c_n(t) = \left\langle x(s),\; P_n\!\left(\frac{2(s-t)}{\theta} + 1\right) \right\rangle, \quad n = 0, \dots, N-1,$$

where $P_n$ denotes the Legendre polynomial of degree $n$.

The coefficient dynamics follow the ODE

$$\frac{d}{dt}c(t) = -\frac{1}{\theta}A c(t) + \frac{1}{\theta}B x(t),$$

with $A, B$ determined by the Legendre recurrence. Bilinear discretization yields the update $c_t = A_d c_{t-1} + B_d x_t$, where

$$A_d = \left(I+\frac{\Delta t}{2\theta}A\right)^{-1}\left(I-\frac{\Delta t}{2\theta}A\right), \quad B_d = \left(I+\frac{\Delta t}{2\theta}A\right)^{-1} \frac{\Delta t}{\theta} B.$$

Analytic forms for $A$ and $B$ are given by

$$A_{n,k} = (2n+1)\begin{cases} (-1)^{n-k}, & k \leq n \\ 1, & k > n \end{cases}, \quad B_n = (2n+1)(-1)^n.$$

At inference, an approximation of the original signal can be reconstructed as

$$\hat{x}(s) = \sum_{n=0}^{N-1} c_n(t)\, P_n\!\left(\frac{2(s-t)}{\theta} + 1\right).$$
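As a concrete illustration, the matrices above and the discretized recurrence can be sketched in NumPy (function names and the choice of $\Delta t$ are ours, not the paper's):

```python
import numpy as np

def lmm_matrices(N):
    """State matrices from the Legendre recurrence:
    A[n, k] = (2n+1) * (-1)^(n-k) for k <= n, and (2n+1) for k > n;
    B[n] = (2n+1) * (-1)^n."""
    n = np.arange(N)
    A = np.where(n[:, None] >= n[None, :],
                 (-1.0) ** (n[:, None] - n[None, :]), 1.0)
    A = A * (2 * n[:, None] + 1)
    B = (2 * n + 1.0) * (-1.0) ** n
    return A, B

def discretize(A, B, theta, dt):
    """Bilinear discretization of dc/dt = -(1/theta) A c + (1/theta) B x."""
    I = np.eye(A.shape[0])
    Minv = np.linalg.inv(I + (dt / (2 * theta)) * A)
    Ad = Minv @ (I - (dt / (2 * theta)) * A)
    Bd = Minv @ ((dt / theta) * B)
    return Ad, Bd

def lmm_encode(x, Ad, Bd):
    """Run c_t = Ad c_{t-1} + Bd x_t over a 1-D signal, returning the
    final coefficient vector c(t)."""
    c = np.zeros(Ad.shape[0])
    for xt in x:
        c = Ad @ c + Bd * xt
    return c
```

The final state `c` summarizes the most recent window of length $\theta$ and can be decoded with the Legendre reconstruction formula above.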

2. Frequency Improvement via Fourier-Based Denoising

While Legendre projection preserves all frequencies, including noise, FiLM introduces a Fourier-based denoising module, the Frequency Enhanced Layer (FEL). For each feature channel, an FFT is computed along the Legendre-index axis:

$$\mathcal{F}\{C\}[k] = \sum_{n=0}^{N-1} C[n]\, e^{-2\pi i k n/N}, \quad k = 0, \dots, \lfloor N/2 \rfloor.$$

Only the lowest $M$ modes are retained, weighted by learnable parameters $W[k]$:

$$\widetilde{C}_f[k] = W[k]\,\mathcal{F}\{C\}[k], \quad k = 0, \dots, M-1.$$

Higher modes ($k \geq M$) are zeroed, and an inverse FFT reconstructs a denoised memory representation:

$$C'(n) = \sum_{k=0}^{M-1} \widetilde{C}_f[k]\, e^{2\pi i k n/N}.$$

This suppresses high-frequency noise while preserving the salient long-term components.
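A minimal single-channel sketch of this filtering step, using NumPy's real FFT (in FiLM the weights $W$ are learned; here they are simply passed in):

```python
import numpy as np

def frequency_enhanced_layer(C, W, M):
    """FEL sketch for one channel: FFT along the Legendre-index axis,
    keep the lowest M modes scaled by weights W (shape (M,)), zero the
    rest, and inverse-FFT back to a denoised representation C'."""
    N = C.shape[0]
    Cf = np.fft.rfft(C)           # modes k = 0 .. N//2
    Cf[:M] *= W                   # weight the retained low-frequency modes
    Cf[M:] = 0.0                  # discard high-frequency (noise) modes
    return np.fft.irfft(Cf, n=N)  # denoised memory representation C'
```

With `W` set to all ones this reduces to an ideal low-pass filter over the coefficient sequence; the learned weights additionally reshape the retained spectrum.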

3. Low-Rank Parameterization for Efficiency

Naïvely, the learnable weight tensor $W \in \mathbb{R}^{D \times M \times D}$ grows prohibitively large for high-dimensional problems. FiLM addresses this via tensor factorization, $W \approx W_2 W_1 W_0$, with $W_0 \in \mathbb{R}^{D \times r}$, $W_1 \in \mathbb{R}^{r \times r \times M}$, $W_2 \in \mathbb{R}^{r \times D}$, and rank $r \ll D$. For each mode $k$:

$$\widetilde{C}_f[k] = W_2^\top\left(W_1[k]\left(W_0^\top \mathcal{F}\{C\}[k]\right)\right).$$

This reduces the parameter count from $O(D^2 M)$ to $O(rD + r^2 M + rD)$. Empirically, $r=4$ (0.41% of the full size) yields negligible loss in MSE; even $r=1$ provides strong compression with only a minor performance reduction.
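A toy NumPy illustration of the factorization and the resulting parameter savings (sizes and names are illustrative; the weights are random stand-ins for learned parameters):

```python
import numpy as np

rng = np.random.default_rng(0)
D, M, r = 8, 4, 2  # channels, retained Fourier modes, rank (toy sizes)

# A full weight tensor would hold D * M * D entries; the factorization
# stores two shared projections plus one small r x r core per mode.
W0 = rng.standard_normal((D, r))     # shared input projection
W1 = rng.standard_normal((M, r, r))  # per-mode core
W2 = rng.standard_normal((r, D))     # shared output projection

def apply_mode(Fk, k):
    """Apply the factorized weight to one Fourier mode:
    C~_f[k] = W2^T (W1[k] (W0^T F{C}[k]))."""
    return W2.T @ (W1[k] @ (W0.T @ Fk))

full_params = D * M * D                        # dense equivalent
factored_params = W0.size + W1.size + W2.size  # low-rank storage
```

Because $W_0$ and $W_2$ are shared across all $M$ modes, the per-mode cost is only the $r \times r$ core, which is where the $O(r^2 M)$ term comes from.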

4. Model Architecture and Training Protocols

4.1 Single-Layer Block

A one-layer FiLM block consists of:

  • Legendre Projection Unit (LPU): Produces Legendre coefficient sequence CC.
  • Frequency Enhanced Layer (FEL): Applies the Fourier mask described above, yielding denoised C′C'.
  • LPU_R: Reconstructs the forecast using the inverse Legendre-basis mapping.
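Putting the three units together, a toy end-to-end pass over one series might look like the following (all learnable weights fixed to 1, $\Delta t = 1$; a sketch under these assumptions, not the reference implementation):

```python
import numpy as np

def film_block(x, N=16, M=4, theta=None):
    """Toy one-layer FiLM block on a 1-D series x: LPU -> FEL -> LPU_R."""
    T = len(x)
    theta = theta or T
    # LPU: build the Legendre state matrices and run the recurrence.
    n = np.arange(N)
    A = np.where(n[:, None] >= n[None, :],
                 (-1.0) ** (n[:, None] - n[None, :]), 1.0) * (2 * n[:, None] + 1)
    B = (2 * n + 1.0) * (-1.0) ** n
    I = np.eye(N)
    Minv = np.linalg.inv(I + A / (2 * theta))
    Ad, Bd = Minv @ (I - A / (2 * theta)), Minv @ (B / theta)
    c = np.zeros(N)
    for xt in x:
        c = Ad @ c + Bd * xt
    # FEL: low-pass along the Legendre-index axis (unit weights).
    Cf = np.fft.rfft(c)
    Cf[M:] = 0.0
    c_denoised = np.fft.irfft(Cf, n=N)
    # LPU_R: evaluate the Legendre basis on the window and reconstruct.
    s = np.linspace(-1.0, 1.0, T)  # rescaled time 2(s-t)/theta + 1
    P = np.polynomial.legendre.legvander(s, N - 1)
    return P @ c_denoised
```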

4.2 Multiscale Mixture-of-Experts

FiLM processes histories at several time resolutions (e.g., $T, 2T, 4T$), with each block forecasting separate future windows; outputs are combined via a learned gating mechanism, capturing information from both medium- and long-range dependencies.
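A schematic of the gating step, with the experts abstracted as callables over different history windows (the gate weights would be learned jointly in practice; names are illustrative):

```python
import numpy as np

def multiscale_mixture(history, experts, gate_logits):
    """Combine expert forecasts over multiple history scales.
    `experts` is a list of (forecaster, window_length) pairs; each
    forecaster sees only the last `window_length` points.  The gate
    softmax-combines the per-expert forecasts."""
    forecasts = np.stack([f(history[-w:]) for f, w in experts])
    g = np.exp(gate_logits - gate_logits.max())  # stable softmax
    g /= g.sum()
    return g @ forecasts
```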

4.3 Optional Pre/Post-Processing

Per-series Instance Normalization (RevIN) can be applied before and after FiLM to enhance robustness to distribution shift. Its use is dataset-dependent.
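A minimal sketch of such a RevIN-style wrapper (omitting the learnable affine transform of full RevIN):

```python
import numpy as np

def revin_wrap(forecaster, x, eps=1e-5):
    """Instance-normalization wrapper: normalize the input series with
    its own mean and std, forecast, then de-normalize the output with
    the same statistics."""
    mu, sigma = x.mean(), x.std() + eps
    y = forecaster((x - mu) / sigma)
    return y * sigma + mu
```

Because the statistics are computed per instance, the wrapper shifts the forecaster's burden away from tracking level changes, which is why its benefit depends on how much distribution shift a dataset exhibits.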

4.4 Default Hyperparameters

Component | Default value | Notes
Legendre dim. $N$ | 256 | Number of polynomial bases
Fourier modes $M$ | 32 | Number of frequencies retained
Low-rank $r$ | 4 | Compression/accuracy tradeoff
Scales | 3 | $T, 2T, 4T$
Batch size | 32–256 | Task dependent
Optimizer | Adam | Learning rate $10^{-3} \rightarrow 10^{-4}$ over 15 epochs

4.5 Training Objective

The model is trained using mean squared error (MSE) over the training samples:

$$\mathcal{L} = \frac{1}{N}\sum_{i=1}^N \| \hat{y}_i - y_i \|^2$$

No curriculum schedules or special warm-up phases are used. MSE is the primary loss; mean absolute error (MAE) is reported but not optimized.

5. Empirical Evaluation and Ablation Analysis

5.1 Comparative Benchmarks

Across six real-world datasets (Traffic, Electricity, Exchange, Weather, ILI, ETTm/ETTh) and a range of forecast horizons, FiLM demonstrates substantial error reductions relative to prior SOTA models:

Task | MSE reduction vs. best prior
Multivariate | 20.3% (vs. FEDformer)
Univariate | 22.6%

Seven competitive baselines are evaluated, including FEDformer, Autoformer, Informer, S4, LogTrans, and Reformer.

5.2 Module Drop-In and Substitution

  • Replacing the LPU with a plain linear layer degrades performance in every architecture tested.
  • Augmenting existing MLP, LSTM, CNN, or Transformer networks with LPU and FEL offers consistent and large MSE improvements (8–120% relative gain).

5.3 Component Ablations

  • Replacing the FEL with a standard MLP, LSTM, CNN, or vanilla attention module yields 5–300% worse performance.
  • Reducing the rank $r$ from 256 to 4 compresses the weights to 0.41% of baseline with <1% MSE increase; $r=1$ achieves within 5% of full performance.
  • Limiting to the lowest $M$ Fourier modes is robust; some datasets benefit from including a small fraction of higher modes.

5.4 Efficiency

  • Parameter count: FiLM $(r=4)$ uses 80% fewer trainable weights than FEDformer.
  • Memory usage and training time scale linearly in input length, with per-epoch training time roughly 50% lower than deeper competitors.

6. Integration with Existing Time Series Forecasters

FiLM's memory and denoising modules can be embedded into arbitrary forecasting architectures:

  1. Prepend LPU: Replace the raw series $X(t)$ with the Legendre state $c(t)$.
  2. Apply FEL: Perform the Fourier-based denoising as described.
  3. Decode: Use the reconstructed features or pass $c'(t)$ to the backbone forecaster.
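The three steps can be sketched as a front-end around any backbone (toy NumPy version with unit FEL weights and $\Delta t = 1$; function names are illustrative):

```python
import numpy as np

def legendre_memory(x, N, theta):
    """Step 1 (LPU): final Legendre coefficient state for series x."""
    n = np.arange(N)
    A = np.where(n[:, None] >= n[None, :],
                 (-1.0) ** (n[:, None] - n[None, :]), 1.0) * (2 * n[:, None] + 1)
    B = (2 * n + 1.0) * (-1.0) ** n
    I = np.eye(N)
    Minv = np.linalg.inv(I + A / (2 * theta))
    Ad, Bd = Minv @ (I - A / (2 * theta)), Minv @ (B / theta)
    c = np.zeros(N)
    for xt in x:
        c = Ad @ c + Bd * xt
    return c

def film_frontend(backbone, x, N=16, M=4):
    """Steps 1-3: encode with the LPU, denoise with the FEL, then hand
    c'(t) to the backbone (any callable mapping R^N to a forecast)."""
    c = legendre_memory(x, N, theta=len(x))  # 1. prepend LPU
    Cf = np.fft.rfft(c)                      # 2. apply FEL (weights = 1)
    Cf[M:] = 0.0
    c_denoised = np.fft.irfft(Cf, n=N)
    return backbone(c_denoised)              # 3. decode via backbone
```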

Empirical evidence shows up to 120% relative MSE improvement when used as a plug-in to existing MLP, LSTM, CNN, and vanilla attention models, with negligible parameter overhead (ca. 0.5% of the full model).

7. Significance and Implications

FiLM demonstrates that Legendre polynomial-based memory, augmented with frequency selection and low-rank adaptation, offers a principled and practical approach for long-term sequence modeling. It balances expressiveness and regularization, efficiently attenuates overfitting to noise, and is broadly applicable as a module across network architectures. FiLM's empirical performance on real-world datasets and its ablation support the centrality of structured memory and frequency-aware denoising in advancing time-series forecasting (Zhou et al., 2022).
