Fourier-Mixed Window Attention: Accelerating Informer for Long Sequence Time-Series Forecasting (2307.00493v3)
Abstract: We study a fast local-global window-based attention method to accelerate Informer for long sequence time-series forecasting. While window attention, being local, brings a considerable computational saving, it cannot capture global token information, which is compensated for by a subsequent Fourier transform block. Our method, named FWin, relies on neither the query sparsity hypothesis nor the empirical approximation underlying the ProbSparse attention of Informer. Through experiments on univariate and multivariate datasets, we show that FWin transformers improve the overall prediction accuracy of Informer while accelerating its inference speed by 1.6 to 2 times. We also give a mathematical definition of FWin attention and prove that it is equivalent to the canonical full attention under a block diagonal invertibility (BDI) condition on the attention matrix. The BDI condition is shown experimentally to hold with high probability on typical benchmark datasets.
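The abstract names FWin's two ingredients (local window attention, then a Fourier transform block for global mixing) but not how they compose. Below is a minimal PyTorch sketch of one plausible reading, assuming non-overlapping windows and an FNet-style real-FFT mixing step; the module names (`WindowAttention`, `FourierMix`, `FWinBlock`), the window size, and the placement of residual and normalization layers are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class WindowAttention(nn.Module):
    """Full self-attention restricted to non-overlapping windows along the sequence."""
    def __init__(self, d_model, n_heads, window_size):
        super().__init__()
        self.window_size = window_size
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):                       # x: (batch, seq_len, d_model)
        B, L, D = x.shape
        w = self.window_size
        assert L % w == 0, "sequence length must be divisible by the window size"
        xw = x.reshape(B * (L // w), w, D)      # fold windows into the batch dim
        out, _ = self.attn(xw, xw, xw)          # attention cost O(L*w) instead of O(L^2)
        return out.reshape(B, L, D)

class FourierMix(nn.Module):
    """FNet-style global mixing: 2D FFT over sequence and feature axes, keep the real part."""
    def forward(self, x):
        return torch.fft.fft2(x).real

class FWinBlock(nn.Module):
    """One encoder block in this sketch: local window attention, then global Fourier mixing."""
    def __init__(self, d_model=64, n_heads=4, window_size=24):
        super().__init__()
        self.win_attn = WindowAttention(d_model, n_heads, window_size)
        self.fourier = FourierMix()
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):
        x = self.norm1(x + self.win_attn(x))    # local mixing within each window
        x = x + self.fourier(x)                 # cheap global token mixing
        return self.norm2(x + self.ff(x))

x = torch.randn(8, 96, 64)                      # (batch, seq_len=96, d_model=64)
print(FWinBlock()(x).shape)                     # torch.Size([8, 96, 64])
```

In this reading, the FFT supplies the global token interaction that full (or ProbSparse) attention would otherwise provide, at O(L log L) cost, while the quadratic attention cost is confined to windows of fixed length w.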
- Bracewell, R. N. The Hartley Transform. Oxford University Press, New York, 1986.
- Power system toolbox. Available at https://www.ecse.rpi.edu/~chowj/, 2008. Accessed: 2023-07-19.
- Efficient token mixing for transformers via adaptive Fourier neural operators. International Conference on Learning Representations, 2022.
- Modeling long- and short-term temporal patterns with deep neural networks, 2018.
- FNet: Mixing tokens with Fourier transforms. arXiv:2105.03824, 2021.
- Machine-learning-based online transient analysis via iterative computation of generator dynamics. In Proc. of IEEE SmartGridComm, 2020.
- Generalizable memory-driven transformer for multivariate long sequence time-series forecasting. arXiv preprint arXiv:2207.07827, 2022.
- Swin transformer: Hierarchical vision transformer using shifted windows. IEEE/CVF International Conference on Computer Vision, pp. 10012–10022, 2021.
- MobileViT: Light-weight, general-purpose, and mobile-friendly vision transformer. International Conference on Learning Representations, 2022.
- Nadaraya, E. On estimating regression. Theory of Probability and its Applications, 9:141–142, 1964.
- FourierFormer: Transformer meets generalized Fourier integral theorem. Advances in Neural Information Processing Systems, 2022.
- A time series is worth 64 words: Long-term forecasting with transformers. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=Jbdc0vTOcol.
- Parzen, E. On estimation of a probability density function and mode. Annals of Mathematical Statistics, 33:1065–1076, 1962.
- Global filter networks for image classification. Advances in Neural Information Processing Systems, 34:980–993, 2021.
- Rosenblatt, M. Remarks on some nonparametric estimates of a density function. Annals of Mathematical Statistics, pp. 832–837, 1956.
- MLP-Mixer: An all-MLP architecture for vision. arXiv preprint arXiv:2105.01601, 2021.
- Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017.
- ETSformer: Exponential smoothing transformers for time-series forecasting, 2022.
- Autoformer: Decomposition transformers with Auto-Correlation for long-term series forecasting. In Advances in Neural Information Processing Systems, 2021.
- MOAT: Alternating mobile convolution and attention brings strong vision models. arXiv preprint arXiv:2210.01820, 2022.
- GLassoformer: A query-sparse transformer for post-fault power grid voltage prediction. Proc. of IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 3968–3972, 2022.
- Informer: Beyond efficient transformer for long sequence time-series forecasting. Proc. of the AAAI Conference on Artificial Intelligence, 35:11106–11115, 2021.
- Expanding the prediction capacity in long sequence time-series forecasting. Artificial Intelligence, 318:103886, 2023. ISSN 0004-3702.
- FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting. International Conference on Machine Learning, 2022.