SparseTSF: Modeling Long-term Time Series Forecasting with 1k Parameters
(2405.00946v2)
Published 2 May 2024 in cs.LG
Abstract: This paper introduces SparseTSF, a novel, extremely lightweight model for Long-term Time Series Forecasting (LTSF), designed to address the challenges of modeling complex temporal dependencies over extended horizons with minimal computational resources. At the heart of SparseTSF lies the Cross-Period Sparse Forecasting technique, which simplifies the forecasting task by decoupling the periodicity and trend in time series data. This technique involves downsampling the original sequences to focus on cross-period trend prediction, effectively extracting periodic features while minimizing the model's complexity and parameter count. Based on this technique, the SparseTSF model uses fewer than 1k parameters to achieve competitive or superior performance compared to state-of-the-art models. Furthermore, SparseTSF showcases remarkable generalization capabilities, making it well-suited for scenarios with limited computational resources, small samples, or low-quality data. The code is publicly available at this repository: https://github.com/lss-1138/SparseTSF.
The paper introduces SparseTSF, a lightweight model for long-term time series forecasting that employs a sparse technique to achieve competitive performance with <1k parameters.
The core innovation, the Sparse technique, simplifies forecasting by decoupling periodicity and trend, downsampling sequences for efficient prediction and feature extraction.
SparseTSF is significantly more efficient than other models in parameters and MACs, demonstrating strong generalization for resource-constrained environments.
The paper introduces SparseTSF, a novel, lightweight model designed for long-term time series forecasting (LTSF). The core innovation is the Cross-Period Sparse Forecasting technique (referred to as the Sparse technique), which simplifies forecasting by decoupling periodicity and trend in time series data. SparseTSF downsamples the original sequences to focus on cross-period trend prediction, extracting periodic features while minimizing model complexity and parameter count. The model reportedly achieves performance comparable to state-of-the-art models with fewer than 1,000 parameters and demonstrates strong generalization capabilities, making it suitable for resource-constrained scenarios.
The challenges in LTSF stem from the need to extract extensive temporal dependencies from longer historical windows, leading to complex models with millions of parameters. The key insight of the Sparse technique is to decompose the periodicity and trend of time series data. Periodic patterns are transformed into inter-subsequence dynamics, while trend patterns are reinterpreted as intra-subsequence characteristics.
The main contributions of the paper include:
The introduction of the Cross-Period Sparse Forecasting technique.
The SparseTSF model, which has fewer than 1,000 parameters.
Demonstration of competitive predictive accuracy and robust generalization capabilities.
Related work in LTSF spans Transformer-based models, Convolutional Neural Networks (CNNs), Multilayer Perceptrons (MLPs), and the application of pre-trained LLMs. The channel-independent (CI) strategy simplifies the forecasting process by treating each channel as an individual univariate time series.
The SparseTSF model utilizes a single linear layer within the Cross-Period Sparse Forecasting framework. Given a time series $x^{(i)}_{t-L+1:t}$ with a known periodicity $w$, the Sparse technique downsamples the original series into $w$ subsequences of length $n = \lfloor L/w \rfloor$. A model with shared parameters is applied to these subsequences for prediction. After prediction, the $w$ subsequences, each of length $m = \lfloor H/w \rfloor$, are upsampled back into a complete forecast sequence of length $H$.
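Below is a minimal PyTorch sketch of this downsample-predict-upsample step for a single channel (class and variable names are illustrative, and $L$ and $H$ are assumed divisible by $w$; this is not the authors' implementation):

```python
import torch
import torch.nn as nn

class SparseForecaster(nn.Module):
    """Sketch of Cross-Period Sparse Forecasting: downsample by the period w,
    apply one linear layer shared across the w subsequences, then upsample."""
    def __init__(self, seq_len: int, pred_len: int, period: int):
        super().__init__()
        self.period = period
        self.n = seq_len // period    # length n of each downsampled subsequence
        self.m = pred_len // period   # forecast length m per subsequence
        self.linear = nn.Linear(self.n, self.m, bias=False)  # shared over all w subsequences

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len) for one channel (channel-independent setting)
        b = x.shape[0]
        # Downsample: values sharing the same phase of the period form one subsequence
        subseq = x.reshape(b, self.n, self.period).permute(0, 2, 1)   # (batch, w, n)
        pred = self.linear(subseq)                                    # (batch, w, m)
        # Upsample: interleave the w predicted subsequences back into one sequence
        return pred.permute(0, 2, 1).reshape(b, self.m * self.period)
```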
To address the issues of information loss and outlier sensitivity, a sliding aggregation is performed on the original sequence before sparse prediction, using a 1D convolution with zero-padding and a kernel size of $2 \times \lfloor w/2 \rfloor + 1$. The process can be formulated as:
$$x^{(i)}_{t-L+1:t} = x^{(i)}_{t-L+1:t} + \mathrm{Conv1D}\left(x^{(i)}_{t-L+1:t}\right)$$
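A hedged sketch of this aggregation step (the kernel size and padding follow the formula above, but the layer configuration and values shown are assumptions, not the official code):

```python
import torch
import torch.nn as nn

w = 24                                  # assumed known period
kernel_size = 2 * (w // 2) + 1          # 25 taps for w = 24
agg = nn.Conv1d(in_channels=1, out_channels=1, kernel_size=kernel_size,
                padding=kernel_size // 2, bias=False)   # zero-padding keeps length L

x = torch.randn(8, 1, 720)              # (batch, channel, L), one univariate series
x = x + agg(x)                          # residual sliding aggregation before sparse prediction
```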
Time series data often exhibit distributional shifts between training and testing datasets. To mitigate this issue, Instance Normalization (IN) is applied: the mean of each input sequence is subtracted before the sequence enters the model and added back to the model's output. This process is formulated as:

$$x^{(i)}_{t-L+1:t} = x^{(i)}_{t-L+1:t} - \mu^{(i)}, \qquad \bar{x}^{(i)}_{t+1:t+H} = \hat{x}^{(i)}_{t+1:t+H} + \mu^{(i)},$$

where $\mu^{(i)}$ is the mean of the input window and $\hat{x}^{(i)}_{t+1:t+H}$ is the raw model output.
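A minimal sketch of this normalization wrapper (illustrative only; `model` stands for the sparse forecaster described above):

```python
import torch

def forecast_with_instance_norm(model, x):
    # x: (batch, seq_len) single-channel input window
    mean = x.mean(dim=-1, keepdim=True)   # per-instance mean of the look-back window
    y_hat = model(x - mean)               # model operates on the de-meaned sequence
    return y_hat + mean                   # restore the mean on the forecast
```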
The Mean Squared Error (MSE) is used as the loss function:
$$\mathcal{L} = \frac{1}{C}\sum_{i=1}^{C} \left\| y^{(i)}_{t+1:t+H} - \bar{x}^{(i)}_{t+1:t+H} \right\|_2^2$$
$C$ is the number of channels
$y^{(i)}_{t+1:t+H}$ is the ground truth
$\bar{x}^{(i)}_{t+1:t+H}$ is the predicted value
The paper provides a theoretical analysis of the SparseTSF model, focusing on its parameter efficiency and the effectiveness of the Sparse technique. Given a historical look-back window length $L$, a forecast horizon $H$, and a constant periodicity $w$, the total number of parameters required by the SparseTSF model is $\lfloor L/w \rfloor \times \lfloor H/w \rfloor + 2 \times \lfloor w/2 \rfloor + 1$.
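As a worked example of this formula (the configuration $L = H = 720$ with $w = 24$ is an assumption chosen here to illustrate the sub-1k claim):

```python
L, H, w = 720, 720, 24                   # assumed look-back, horizon, and period
linear_params = (L // w) * (H // w)      # 30 * 30 = 900 shared linear weights
conv_params = 2 * (w // 2) + 1           # 25 convolution kernel weights
total = linear_params + conv_params
print(total)                             # 925 parameters, under the 1k budget
```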
The time series is represented as $X(t) = P(t) + T(t)$, where $P(t)$ is the periodic component and $T(t)$ is the trend component, with $P(t) = P(t+w)$. The Sparse technique transforms the forecasting task into predicting downsampled subsequences: $x'_{t+1:t+m} = f(x'_{t-n+1:t})$. The SparseTSF model's formulation becomes:
$$p'_{t+1:t+m} + t'_{t+1:t+m} = f\left(p'_{t-n+1:t} + t'_{t-n+1:t}\right)$$
where, for any $i, j \in [t-n+1, t+m]$, the downsampled periodic component satisfies:
$$p'_i = p'_j$$
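A one-line justification of this condition, under the stated assumption $P(t) = P(t+w)$: a subsequence downsampled with stride $w$ samples the periodic component at a fixed phase $\phi$, so

$$p'_k = P(\phi + k\,w) = P(\phi) \quad \text{for all } k,$$

which is why $p'_i = p'_j$ and the shared model $f$ only needs to capture the trend within each subsequence.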
The AutoCorrelation Function (ACF) at lag k is defined as:
$$\mathrm{ACF}(k) = \frac{\sum_{t=1}^{N-k} (X_t - \mu)(X_{t+k} - \mu)}{\sum_{t=1}^{N} (X_t - \mu)^2}$$
$N$ is the total number of observations
$X_t$ is the value of the series at time $t$
$X_{t+k}$ is the value of the series at time $t+k$
$\mu$ is the mean of the series
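Based on this definition, a brief sketch of how the ACF can suggest the period prior $w$ (illustrative only; the paper treats $w$ as a known hyperparameter, e.g. 24 for hourly data, and the synthetic series below is an assumption):

```python
import numpy as np

def acf(x: np.ndarray, max_lag: int) -> np.ndarray:
    """Sample autocorrelation at lags 1..max_lag."""
    x = x - x.mean()
    denom = np.sum(x * x)
    return np.array([np.sum(x[:len(x) - k] * x[k:]) / denom
                     for k in range(1, max_lag + 1)])

# Hourly-like synthetic series with a daily cycle of length 24
t = np.arange(24 * 30)
series = np.sin(2 * np.pi * t / 24) + 0.1 * np.random.randn(len(t))

corr = acf(series, max_lag=48)
w_hat = int(np.argmax(corr)) + 1         # lag with the strongest autocorrelation
print(w_hat)                             # expected to be close to 24
```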
The experimental setup involves four LTSF datasets: ETTh1, ETTh2, Electricity, and Traffic. Baselines include Informer, Autoformer, Pyraformer, FEDformer, FiLM, TimesNet, PatchTST, DLinear, and FITS.
SparseTSF ranks within the top two in all scenarios, achieving near state-of-the-art levels with a significantly smaller parameter scale.
Efficiency advantages are demonstrated through static and runtime metrics:
Parameters
Multiply-Accumulate Operations (MACs)
Max Memory
Epoch Time
SparseTSF significantly outperforms other models in parameters and MACs, requiring less than one-tenth the parameters of the next smallest model. It also surpasses other mainstream models in Max Memory and Epoch Time.
Ablation studies confirm the effectiveness of the Sparse technique: incorporating it enhances the performance of Linear, Transformer, and GRU models by strengthening their ability to extract periodic features from the data.
The hyperparameter $w$, representing the a priori main period, influences the forecast outcomes. SparseTSF exhibits optimal performance when $w = 24$, aligning with the intrinsic main period of the data.
The generalization capability of a trained SparseTSF model on different datasets with the same principal periodicity is studied. SparseTSF outperforms other models in both similar domain generalization (ETTh2 to ETTh1) and less similar domain generalization (Electricity to ETTh1).
The paper discusses limitations and future work, including scenarios with ultra-long or multiple periods. One key research direction is designing additional modules to enhance SparseTSF's modeling capacity while maintaining the balance between performance and parameter size.
The Sparse technique involves downsampling/upsampling to achieve periodicity/trend decoupling, differentiating it from N-HiTS and OneShotSTL.
In conclusion, SparseTSF is presented as a lightweight model for LTSF, demonstrating competitive performance and strong generalization capabilities, making it suitable for resource-constrained environments.