FEDformer: Frequency Enhanced Decomposed Transformer for Long-term Series Forecasting (2201.12740v3)

Published 30 Jan 2022 in cs.LG and stat.ML

Abstract: Although Transformer-based methods have significantly improved state-of-the-art results for long-term series forecasting, they are not only computationally expensive but more importantly, are unable to capture the global view of time series (e.g. overall trend). To address these problems, we propose to combine Transformer with the seasonal-trend decomposition method, in which the decomposition method captures the global profile of time series while Transformers capture more detailed structures. To further enhance the performance of Transformer for long-term prediction, we exploit the fact that most time series tend to have a sparse representation in well-known basis such as Fourier transform, and develop a frequency enhanced Transformer. Besides being more effective, the proposed method, termed as Frequency Enhanced Decomposed Transformer (FEDformer), is more efficient than standard Transformer with a linear complexity to the sequence length. Our empirical studies with six benchmark datasets show that compared with state-of-the-art methods, FEDformer can reduce prediction error by 14.8% and 22.6% for multivariate and univariate time series, respectively. Code is publicly available at https://github.com/MAZiqing/FEDformer.

FEDformer: Frequency Enhanced Decomposed Transformer for Long-term Series Forecasting

The paper presents FEDformer, a novel architecture designed to improve the performance of Transformer-based models for long-term time series forecasting. The proposed FEDformer framework addresses two major limitations of conventional Transformers: their inability to capture the global view of time series data and their high computational complexity due to quadratic scaling with the sequence length. The key innovation in FEDformer is its integration of seasonal-trend decomposition and frequency domain analysis into the Transformer architecture, which enables it to better capture global patterns and efficiently manage long sequences.

Key Contributions

  1. Frequency Enhanced Decomposed Transformer (FEDformer) Architecture: FEDformer introduces a hybrid architecture wherein the seasonal-trend decomposition method captures the global profile of the time series, while Transformers capture more detailed structures. This architecture is further enhanced by incorporating frequency domain representations, specifically using Fourier and Wavelet transforms to capture important structures in the data.
  2. Fourier and Wavelet Enhanced Blocks: The authors introduce Frequency Enhanced Blocks (FEBs) and Frequency Enhanced Attention (FEA) blocks, which replace the standard self-attention and cross-attention mechanisms in the Transformer with operations performed in the frequency domain. FEBs and FEAs leverage randomly selected subsets of Fourier or Wavelet components, enabling linear computational complexity and efficient handling of long sequences.
  3. Random Fourier Component Selection: To improve computational efficiency, the authors propose a method for selecting a random subset of frequency components based on the sparsity of time series representations in well-known bases such as the Fourier transform. Theoretical analysis and empirical validation show that this approach yields a better representation of the time series and reduces computational costs (see the sketch after this list).
  4. Comprehensive Evaluation: Extensive experiments were conducted on six benchmark datasets across various domains (energy, traffic, economics, weather, and disease). The results consistently demonstrate that FEDformer achieves superior performance compared to state-of-the-art methods, with significant reductions in prediction errors for both multivariate and univariate time series forecasting.
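
To make the random frequency selection in contribution 3 concrete, the sketch below shows one simple way it could be implemented. The uniform sampling policy, the default of 64 modes, and the function name are illustrative assumptions, not the authors' exact code.

```python
import numpy as np

def select_random_modes(seq_len: int, num_modes: int = 64, seed: int = 0) -> list:
    """Uniformly sample a subset of rFFT frequency indices (illustrative).

    Keeping only `num_modes` bins bounds the cost of every downstream
    frequency-domain operation, which is what gives the overall linear
    complexity in the sequence length.
    """
    rng = np.random.default_rng(seed)
    available = seq_len // 2 + 1                      # rFFT bins for a length-L signal
    k = min(num_modes, available)
    return sorted(rng.choice(available, size=k, replace=False).tolist())

# Example: a 720-step window has 361 rFFT bins; keep 64 of them at random.
modes = select_random_modes(seq_len=720, num_modes=64)
```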

Experimental Results

The empirical studies show that FEDformer reduces prediction errors by 14.8% and 22.6% for multivariate and univariate time series forecasting, respectively, compared to state-of-the-art methods. FEDformer achieves these improvements while maintaining linear complexity relative to sequence length, which is particularly beneficial for long-term forecasting tasks.

Model Design and Theoretical Analysis

Seasonal-Trend Decomposition

FEDformer begins by employing a seasonal-trend decomposition to separate the time series into trend and seasonal components. This decomposition ensures that the long-term trend and repetitive seasonal patterns are adequately captured, facilitating more accurate forecasting.
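
In this family of models, the decomposition is commonly realized as a moving-average split of the input into a smooth trend part and a seasonal remainder. The following is a minimal PyTorch-style sketch under that assumption; the kernel size and the replicate padding are illustrative choices, not values prescribed by the paper.

```python
import torch
import torch.nn.functional as F

def series_decomp(x: torch.Tensor, kernel_size: int = 25):
    """Split a series into seasonal and trend parts via a moving average.

    x: tensor of shape (batch, length, channels). The kernel size of 25 is
    an illustrative default.
    """
    # Replicate both ends so the moving average preserves the original length.
    pad = (kernel_size - 1) // 2
    front = x[:, :1, :].repeat(1, pad, 1)
    back = x[:, -1:, :].repeat(1, kernel_size - 1 - pad, 1)
    padded = torch.cat([front, x, back], dim=1)
    trend = F.avg_pool1d(padded.permute(0, 2, 1), kernel_size, stride=1).permute(0, 2, 1)
    seasonal = x - trend
    return seasonal, trend
```

Separating the slowly varying trend this way lets the attention layers focus on the residual seasonal structure, which is where the frequency-domain blocks described next operate.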

Frequency Domain Operations

The core novelty of FEDformer lies in its use of frequency domain operations. The FEBs and FEAs apply Fourier and Wavelet transforms to move the time series into the frequency domain, where a subset of frequency components is selected and the attention-like operations are carried out on those components. This simplifies the attention computation, because time series in the frequency domain often have a sparse structure that can be exploited for efficiency.
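
A minimal sketch of the Fourier variant of such a block is given below. It assumes that each retained mode mixes channels through its own learned complex matrix; weight sharing, initialization, and the mode-selection policy may differ from the published implementation, so treat this as an illustration of the idea rather than the method itself.

```python
import torch
import torch.nn as nn

class FourierBlock(nn.Module):
    """Frequency enhanced block, Fourier variant (illustrative sketch).

    The input is moved to the frequency domain with an FFT, a fixed random
    subset of modes is mixed with learned complex weights, and the result is
    brought back with an inverse FFT.
    """
    def __init__(self, d_model: int, seq_len: int, num_modes: int = 64):
        super().__init__()
        bins = seq_len // 2 + 1
        idx = torch.randperm(bins)[:min(num_modes, bins)]
        self.register_buffer("modes", torch.sort(idx).values)
        scale = 1.0 / (d_model * d_model)
        self.weights = nn.Parameter(
            scale * torch.randn(len(idx), d_model, d_model, dtype=torch.cfloat))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, length, d_model)
        B, L, D = x.shape
        x_ft = torch.fft.rfft(x, dim=1)                       # (B, L//2+1, D), complex
        out_ft = torch.zeros_like(x_ft)
        # Mix the channels of each kept mode with its own learned complex matrix.
        selected = x_ft[:, self.modes, :]                     # (B, M, D)
        out_ft[:, self.modes, :] = torch.einsum("bmd,mde->bme", selected, self.weights)
        return torch.fft.irfft(out_ft, n=L, dim=1)            # back to (B, L, D)
```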

Low-rank Approximation

The frequency domain representation significantly reduces the dimensionality of the data, allowing for low-rank approximation of the attention matrices. This reduction not only enhances computational efficiency but also mitigates the risk of overfitting by focusing on the most informative components of the time series.
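
To see the low-rank effect, note that once only M frequency modes are retained, the score matrix of a frequency-domain cross-attention is M x M regardless of the series length, so the only length-dependent cost comes from the FFTs. A hedged sketch, assuming queries, keys, and values share one length and using an illustrative softmax-on-magnitude activation:

```python
import torch

def frequency_enhanced_attention(q, k, v, modes):
    """Cross-attention carried out on a fixed set of Fourier modes (sketch).

    q, k, v: (batch, length, d_model), assumed here to share one length.
    modes: LongTensor of retained frequency bins. With M = len(modes) fixed,
    the score matrix is M x M no matter how long the series is.
    """
    L = q.size(1)
    q_ft = torch.fft.rfft(q, dim=1)[:, modes, :]                 # (B, M, D)
    k_ft = torch.fft.rfft(k, dim=1)[:, modes, :]
    v_ft = torch.fft.rfft(v, dim=1)[:, modes, :]
    scores = torch.einsum("bmd,bnd->bmn", q_ft, k_ft.conj())     # (B, M, M)
    attn = torch.softmax(scores.abs(), dim=-1).to(v_ft.dtype)    # illustrative activation
    out_ft = torch.zeros(q.size(0), L // 2 + 1, q.size(2),
                         dtype=q_ft.dtype, device=q.device)
    out_ft[:, modes, :] = torch.einsum("bmn,bnd->bmd", attn, v_ft)
    return torch.fft.irfft(out_ft, n=L, dim=1)
```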

Implications and Future Directions

The integration of frequency domain analysis into the Transformer architecture represents a significant advancement in time series forecasting. By effectively combining seasonal-trend decomposition with frequency enhanced attention mechanisms, FEDformer offers a robust and scalable solution for long-term forecasting tasks.

Practical Implications

  • Scalability: FEDformer’s linear complexity ensures that it can handle large datasets and long sequences without prohibitive computational costs.
  • Accuracy: The empirical results underline the model's superior predictive accuracy across various types of time series data, making it a versatile tool for multiple domains.

Theoretical Implications and Future Developments

  • Sparse Representations: The success of randomly selecting Fourier components underscores the potential of leveraging sparse representations in other contexts within machine learning.
  • Hybrid Architectures: The combination of decomposition methods with advanced attention mechanisms opens new avenues for research in hybrid model architectures.

Looking ahead, further exploration into different types of transformations and decomposition methods could yield even more efficient and accurate forecasting models. Additionally, adapting the principles of FEDformer to other sequence-based tasks, such as natural language processing, could be a promising direction for future research.

Authors (6)
  1. Tian Zhou (57 papers)
  2. Ziqing Ma (10 papers)
  3. Qingsong Wen (139 papers)
  4. Xue Wang (69 papers)
  5. Liang Sun (124 papers)
  6. Rong Jin (164 papers)
Citations (1,009)