
RWKV-TS: Beyond Traditional Recurrent Neural Network for Time Series Tasks (2401.09093v1)

Published 17 Jan 2024 in cs.LG

Abstract: Traditional Recurrent Neural Network (RNN) architectures, such as LSTM and GRU, have historically held prominence in time series tasks. However, they have recently seen a decline in their dominant position across various time series tasks. As a result, recent advancements in time series forecasting have seen a notable shift away from RNNs towards alternative architectures such as Transformers, MLPs, and CNNs. To go beyond the limitations of traditional RNNs, we design an efficient RNN-based model for time series tasks, named RWKV-TS, with three distinctive features: (i) A novel RNN architecture characterized by $O(L)$ time complexity and memory usage. (ii) An enhanced ability to capture long-term sequence information compared to traditional RNNs. (iii) High computational efficiency coupled with the capacity to scale up effectively. Through extensive experimentation, our proposed RWKV-TS model demonstrates competitive performance when compared to state-of-the-art Transformer-based or CNN-based models. Notably, RWKV-TS exhibits not only comparable performance but also demonstrates reduced latency and memory utilization. The success of RWKV-TS encourages further exploration and innovation in leveraging RNN-based approaches within the domain of Time Series. The combination of competitive performance, low latency, and efficient memory usage positions RWKV-TS as a promising avenue for future research in time series tasks. Code is available at: https://github.com/howard-hou/RWKV-TS

Overview of RWKV-TS: A Novel RNN Architecture for Time Series Tasks

The paper "RWKV-TS: Beyond Traditional Recurrent Neural Network for Time Series Tasks" introduces a refined recurrent neural network (RNN) model named RWKV-TS, designed to address the existing limitations of conventional RNN architectures when applied to time series analysis. The model redefines the conventional use of RNNs by optimizing their structure to achieve competitive performance on various time-series tasks while maintaining computational efficiency and scalability.

Strengths and Features of RWKV-TS

The paper highlights three distinctive features of RWKV-TS:

  1. Efficient Architecture: RWKV-TS is built on a new RNN structure with $O(L)$ time complexity and memory usage, where $L$ is the sequence length. Unlike traditional RNNs such as LSTM and GRU, whose strictly sequential recurrence cannot be easily parallelized, the RWKV formulation carries only a constant-size state per step and admits parallel computation across the sequence during training.
  2. Long-Term Dependency Handling: RWKV-TS is designed to capture long-term dependencies in time series more effectively than traditional RNNs, which typically suffer from vanishing or exploding gradients. This improvement stems from the token-shift and time-decay mechanisms built into the model (sketched in the code example after this list).
  3. Scalability and Performance: RWKV-TS effectively scales with larger datasets and exhibits performance on par with state-of-the-art models based on transformers and convolutional neural networks (CNNs). It demonstrates lower latency and reduced memory usage, which is critical for deploying models in resource-constrained environments.
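To make points 1 and 2 concrete, below is a minimal NumPy sketch of the RWKV-style time-mixing recurrence that RWKV-TS builds on: a token shift that blends the current and previous inputs, and an exponentially time-decayed weighted average over past values. The parameter names (`Wk`, `Wv`, `Wr`, `Wo`, `w`, `u`, `mu_k`, `mu_v`, `mu_r`) and the unstabilized arithmetic are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def rwkv_time_mixing(x, Wk, Wv, Wr, Wo, w, u, mu_k, mu_v, mu_r):
    """RWKV-style time mixing over a sequence x of shape (L, d).

    The state carried between steps (prev_x, a, b) is O(d), so the whole
    pass is O(L) in time and memory; no L x L attention matrix is formed.
    All projection matrices are (d, d); w, u and the mu_* vectors are (d,).
    """
    L, d = x.shape
    out = np.zeros_like(x)
    prev_x = np.zeros(d)   # token shift: previous input token
    a = np.zeros(d)        # decayed running sum of exp(k_i) * v_i
    b = np.zeros(d)        # decayed running sum of exp(k_i)
    for t in range(L):
        # Token shift: interpolate between the current and previous input.
        xk = mu_k * x[t] + (1 - mu_k) * prev_x
        xv = mu_v * x[t] + (1 - mu_v) * prev_x
        xr = mu_r * x[t] + (1 - mu_r) * prev_x
        k = Wk @ xk
        v = Wv @ xv
        r = 1.0 / (1.0 + np.exp(-(Wr @ xr)))   # receptance gate in (0, 1)

        # WKV: time-decayed weighted average of past values, with a
        # bonus weight exp(u + k) for the current token.
        wkv = (a + np.exp(u + k) * v) / (b + np.exp(u + k))
        out[t] = Wo @ (r * wkv)

        # O(1)-per-step state update with per-channel decay exp(-w).
        a = np.exp(-w) * a + np.exp(k) * v
        b = np.exp(-w) * b + np.exp(k)
        prev_x = x[t]
    return out
```

This naive loop is only meant to show where the $O(L)$ cost and the long-range, decay-weighted memory come from; practical RWKV implementations use a numerically stabilized form of the same recurrence (tracking a running maximum of the exponents) and parallel kernels for training.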

Empirical Evaluations

The paper provides a comprehensive empirical evaluation across several core time series tasks:

  • Long-term and Short-term Forecasting: RWKV-TS performs comparably to, or better than, prominent models such as PatchTST and TimesNet on long-term forecasting benchmarks. In short-term forecasting on the M4 dataset, it achieves accuracy close to or exceeding competing methods.
  • Few-shot Learning: In few-shot learning scenarios, RWKV-TS outperforms popular models such as DLinear and TimesNet, demonstrating robust feature extraction capabilities with limited data.
  • Classification and Anomaly Detection: RWKV-TS yields noteworthy performance in time series classification and anomaly detection, again matching or exceeding benchmark methods in accuracy and efficiency.
  • Time Series Imputation: While RWKV-TS's unidirectional nature may slightly limit its performance in imputation tasks compared to bidirectional models, it still surpasses several baselines, suggesting potential for further refinement.

Implications for Future Research

The development of RWKV-TS marks an important step in revisiting the utility of RNNs for time series analysis. The introduction of a model capable of combining the strengths of RNNs with the efficiency and scale benefits typically associated with transformer-based models could reignite interest in advanced RNN architectures. Future work might explore bidirectional and hybrid models that preserve the benefits of RWKV-TS but further enhance its applicability to tasks involving bidirectional dependencies, such as data imputation.

In conclusion, the RWKV-TS model broadens the landscape for RNN applications in time series analysis by merging performance proficiency with computational efficiency. This model challenges existing perceptions of RNN utility in handling long-range dependencies, laying the groundwork for future exploration and optimization of RNN-based architectures in diverse temporal domains.

Authors (2)
  1. Haowen Hou (15 papers)
  2. F. Richard Yu (47 papers)
Citations (15)