
SparseTSF: Modeling Long-term Time Series Forecasting with 1k Parameters (2405.00946v2)

Published 2 May 2024 in cs.LG

Abstract: This paper introduces SparseTSF, a novel, extremely lightweight model for Long-term Time Series Forecasting (LTSF), designed to address the challenges of modeling complex temporal dependencies over extended horizons with minimal computational resources. At the heart of SparseTSF lies the Cross-Period Sparse Forecasting technique, which simplifies the forecasting task by decoupling the periodicity and trend in time series data. This technique involves downsampling the original sequences to focus on cross-period trend prediction, effectively extracting periodic features while minimizing the model's complexity and parameter count. Based on this technique, the SparseTSF model uses fewer than 1k parameters to achieve competitive or superior performance compared to state-of-the-art models. Furthermore, SparseTSF showcases remarkable generalization capabilities, making it well-suited for scenarios with limited computational resources, small samples, or low-quality data. The code is publicly available at this repository: https://github.com/lss-1138/SparseTSF.

Citations (15)

Summary

  • The paper introduces SparseTSF, a lightweight model for long-term time series forecasting that employs a sparse technique to achieve competitive performance with <1k parameters.
  • The core innovation, the Sparse technique, simplifies forecasting by decoupling periodicity and trend, downsampling sequences for efficient prediction and feature extraction.
  • SparseTSF is significantly more efficient than other models in parameters and MACs, demonstrating strong generalization for resource-constrained environments.

The paper introduces SparseTSF, a novel, lightweight model designed for long-term time series forecasting (LTSF). The core innovation is the Cross-Period Sparse Forecasting technique (referred to as the Sparse technique), which simplifies forecasting by decoupling periodicity and trend in time series data. SparseTSF downsamples the original sequences to focus on cross-period trend prediction, extracting periodic features while minimizing model complexity and parameter count. The model reportedly achieves performance comparable to state-of-the-art models with fewer than 1,000 parameters and demonstrates strong generalization capabilities, making it suitable for resource-constrained scenarios.

The challenges in LTSF stem from the need to extract extensive temporal dependencies from longer historical windows, which typically leads to complex models with millions of parameters. The key insight of the Sparse technique is to decompose the periodicity and trend of time series data: periodic patterns are transformed into inter-subsequence dynamics, while trend patterns are reinterpreted as intra-subsequence characteristics. For hourly data with a daily period (w = 24), for example, each downsampled subsequence collects the values at one fixed hour across days, so its evolution reflects the trend, while the differences among the 24 subsequences encode the daily periodic pattern.

The main contributions of the paper include:

  • The introduction of the Cross-Period Sparse Forecasting technique.
  • The SparseTSF model, which has fewer than 1,000 parameters.
  • Demonstration of competitive predictive accuracy and robust generalization capabilities.

Related work in LTSF involves Transformer-based models, Convolutional Neural Networks (CNNs), Multilayer Perceptrons (MLPs), and the application of pre-trained LLMs. The channel-independent (CI) strategy simplifies the forecasting process by treating each univariate channel of a multivariate series separately.

The SparseTSF model utilizes a single linear layer within the Cross-Period Sparse Forecasting framework. Given a time series $x^{(i)}_{t-L+1:t}$ with a known periodicity $w$, the Sparse technique downsamples the original series into $w$ subsequences of length $n = \left\lfloor \frac{L}{w} \right\rfloor$. A model with shared parameters is applied to these subsequences for prediction. After prediction, the $w$ subsequences, each of length $m = \left\lfloor \frac{H}{w} \right\rfloor$, are upsampled back to a complete forecast sequence of length $H$.
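
A minimal PyTorch sketch of this downsample, shared-linear-prediction, upsample pipeline for a single channel is shown below; the class and tensor names are illustrative rather than the authors' exact implementation, and it assumes $L$ and $H$ are divisible by $w$.

```python
import torch
import torch.nn as nn

class SparseForecast(nn.Module):
    """Sketch of cross-period sparse forecasting for one channel.

    Downsamples a length-L window into w phase-aligned subsequences of
    length n = L // w, applies one linear map n -> m shared across all
    subsequences, and upsamples the w predictions into a length-H forecast.
    """

    def __init__(self, seq_len: int, pred_len: int, period: int):
        super().__init__()
        self.w = period
        self.n = seq_len // period           # length of each downsampled subsequence
        self.m = pred_len // period          # predicted length per subsequence
        self.linear = nn.Linear(self.n, self.m, bias=False)  # shared parameters

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, L), a single-channel look-back window
        b = x.shape[0]
        # Downsample: group points spaced w steps apart into the same subsequence.
        sub = x.reshape(b, self.n, self.w).permute(0, 2, 1)   # (batch, w, n)
        pred = self.linear(sub)                               # (batch, w, m)
        # Upsample: interleave the w predicted subsequences back in time order.
        return pred.permute(0, 2, 1).reshape(b, self.m * self.w)  # (batch, H)
```

With, for example, `seq_len=720`, `pred_len=96`, and `period=24`, the shared linear layer holds only $30 \times 4 = 120$ weights.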

To address the issues of information loss and outlier sensitivity, a sliding aggregation is performed on the original sequence before sparse prediction, using a 1D convolution with zero-padding and a kernel size of $2 \times \left\lfloor \frac{w}{2} \right\rfloor + 1$. The process can be formulated as:

$$x^{(i)}_{t-L+1:t} = x^{(i)}_{t-L+1:t} + \text{Conv1D}\left(x^{(i)}_{t-L+1:t}\right)$$
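
A short sketch of this residual sliding aggregation, assuming a single-channel input and $w = 24$ (hence a 25-tap kernel); the convolution here is untrained and serves only to show the shapes and padding:

```python
import torch
import torch.nn as nn

w = 24                                      # assumed a priori period
kernel_size = 2 * (w // 2) + 1              # 25-tap sliding window
conv = nn.Conv1d(in_channels=1, out_channels=1,
                 kernel_size=kernel_size, padding=kernel_size // 2, bias=False)

x = torch.randn(8, 1, 720)                  # (batch, channel, L) dummy window
x_agg = x + conv(x)                         # residual aggregation; length L is preserved
```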

Time series data often exhibit distributional shifts between the training and testing sets. To mitigate this, Instance Normalization (IN) is applied: the mean of the input window is subtracted before it enters the model and added back to the model's output. This process is formulated as:

$$x^{(i)}_{t-L+1:t} = x^{(i)}_{t-L+1:t} - \mathbb{E}_t\left(x^{(i)}_{t-L+1:t}\right), \qquad \bar{x}^{(i)}_{t+1:t+H} = \bar{x}^{(i)}_{t+1:t+H} + \mathbb{E}_t\left(x^{(i)}_{t-L+1:t}\right)$$
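
A minimal sketch of this simple instance normalization (mean removal and restoration); the wrapper function and variable names are illustrative:

```python
import torch

def forecast_with_in(model, x: torch.Tensor) -> torch.Tensor:
    """Instance normalization wrapper: remove the window mean, forecast, restore it.

    x: (batch, channels, L) input window; model maps it to (batch, channels, H).
    """
    mean = x.mean(dim=-1, keepdim=True)     # E_t(x) per instance and channel
    prediction = model(x - mean)            # forecast on the de-meaned window
    return prediction + mean                # add the mean back to the output
```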

The Mean Squared Error (MSE) is used as the loss function:

$$\mathcal{L} = \frac{1}{C}\sum_{i=1}^{C}\left\| y^{(i)}_{t+1:t+H} - \bar{x}^{(i)}_{t+1:t+H} \right\|_{2}^{2}$$

  • $C$ is the number of channels
  • $y^{(i)}_{t+1:t+H}$ is the ground truth
  • $\bar{x}^{(i)}_{t+1:t+H}$ is the predicted value
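
In PyTorch this objective corresponds, up to a constant factor from also averaging over the batch and horizon dimensions, to the standard mean-squared error; a one-line sketch with illustrative tensor shapes:

```python
import torch
import torch.nn.functional as F

pred = torch.randn(8, 7, 96)      # (batch, C, H) illustrative forecasts
target = torch.randn(8, 7, 96)    # matching ground truth
# Averages the squared error over batch, channels, and horizon.
loss = F.mse_loss(pred, target)
```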

The paper provides a theoretical analysis of the SparseTSF model, focusing on its parameter efficiency and the effectiveness of the Sparse technique. Given a historical look-back window length $L$, a forecast horizon $H$, and a constant periodicity $w$, the total number of parameters required for the SparseTSF model is $\left\lfloor \frac{L}{w} \right\rfloor \times \left\lfloor \frac{H}{w} \right\rfloor + 2 \times \left\lfloor \frac{w}{2} \right\rfloor + 1$.
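
As a worked example with illustrative values $L = 720$, $H = 720$, and $w = 24$:

$$\left\lfloor \tfrac{720}{24} \right\rfloor \times \left\lfloor \tfrac{720}{24} \right\rfloor + 2 \times \left\lfloor \tfrac{24}{2} \right\rfloor + 1 = 900 + 24 + 1 = 925,$$

which is consistent with the fewer-than-1k-parameters claim.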

The time series is represented as $X(t) = P(t) + T(t)$, where $P(t)$ is the periodic component and $T(t)$ is the trend component, with $P(t) = P(t + w)$. The Sparse technique transforms the forecasting task into predicting downsampled subsequences: $x'_{t+1:t+m} = f(x'_{t-n+1:t})$. The SparseTSF model's formulation becomes:

$$p'_{t+1:t+m} + t'_{t+1:t+m} = f\left(p'_{t-n+1:t} + t'_{t-n+1:t}\right)$$

where, for any $i, j \in [t-n+1, t+m]$:

$$p'_i = p'_j$$
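
This holds because consecutive entries of a downsampled subsequence are exactly $w$ steps apart in the original series; using the assumed periodicity $P(t) = P(t+w)$, for a fixed phase offset $t_0$:

$$p'_k = P(t_0 + k w) = P(t_0 + (k+1) w) = p'_{k+1},$$

so the periodic component is constant within each subsequence and the shared model effectively needs to predict only the trend.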

The AutoCorrelation Function (ACF) at lag $k$ is defined as:

$$\text{ACF}(k) = \frac{\sum_{t=1}^{N-k} (X_t - \mu)(X_{t+k} - \mu)}{\sum_{t=1}^{N} (X_t - \mu)^2}$$

  • $N$ is the total number of observations
  • $X_t$ is the value of the series at time $t$
  • $X_{t+k}$ is the value of the series at time $t+k$
  • $\mu$ is the mean of the series
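
The ACF is what motivates the choice of the a priori main period $w$ (e.g. $w = 24$ for hourly data). A minimal NumPy sketch of estimating the dominant period from a series is given below; the function and argument names are illustrative:

```python
import numpy as np

def dominant_period(x: np.ndarray, max_lag: int = 200) -> int:
    """Return the lag k in [2, max_lag] with the largest ACF(k)."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    denom = ((x - mu) ** 2).sum()
    acf = [((x[:-k] - mu) * (x[k:] - mu)).sum() / denom
           for k in range(2, max_lag + 1)]
    return int(np.argmax(acf)) + 2

# For hourly datasets such as ETTh1 or Electricity, this would be
# expected to recover the daily period of 24.
```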

The experimental setup involves four LTSF datasets: ETTh1, ETTh2, Electricity, and Traffic. Baselines include Informer, Autoformer, Pyraformer, FEDformer, Film, TimesNet, PatchTST, DLinear, and FITS.

SparseTSF ranks within the top two in all scenarios, achieving near state-of-the-art levels with a significantly smaller parameter scale.

Efficiency advantages are demonstrated through static and runtime metrics:

  • Parameters
  • Multiply-Accumulate Operations (MACs)
  • Max Memory
  • Epoch Time

SparseTSF significantly outperforms other models in terms of parameters and MACs, with over ten times fewer parameters than the next best model. It also outperforms other mainstream models in terms of Max Memory and Epoch Time.

Ablation studies demonstrate the effectiveness of the Sparse technique: incorporating it into Linear, Transformer, and GRU backbones improves their performance by strengthening their ability to extract periodic features from the data.

The hyperparameter $w$, representing the a priori main period, influences the forecast outcomes. SparseTSF exhibits optimal performance when $w = 24$, aligning with the intrinsic main period of the data.

The generalization capability of a trained SparseTSF model on different datasets with the same principal periodicity is studied. SparseTSF outperforms other models in both similar domain generalization (ETTh2 to ETTh1) and less similar domain generalization (Electricity to ETTh1).

The paper discusses limitations and future work, including scenarios with ultra-long or multiple periods. One key research direction involves designing additional modules that enhance SparseTSF's capacity to handle such cases while maintaining a balance between performance and parameter size.

The Sparse technique involves downsampling/upsampling to achieve periodicity/trend decoupling, differentiating it from N-HiTS and OneShotSTL.

In conclusion, SparseTSF is presented as a lightweight model for LTSF, demonstrating competitive performance and strong generalization capabilities, making it suitable for resource-constrained environments.
