CLMFormer: Mitigating Data Redundancy to Revitalize Transformer-based Long-Term Time Series Forecasting System (2207.07827v4)

Published 16 Jul 2022 in cs.LG and cs.CV

Abstract: Long-term time-series forecasting (LTSF) plays a crucial role in various practical applications. Transformer and its variants have become the de facto backbone for LTSF, offering exceptional capabilities in processing long sequence data. However, existing Transformer-based models, such as Fedformer and Informer, often achieve their best performance on validation sets after just a few epochs, indicating potential underutilization of the Transformer's capacity. One of the reasons contributing to this overfitting is the data redundancy arising from the rolling forecasting setting in the data augmentation process, which is particularly evident in longer sequences with highly similar adjacent data. In this paper, we propose a novel approach to address this issue by employing curriculum learning and introducing a memory-driven decoder. Specifically, we progressively introduce Bernoulli noise to the training samples, which effectively breaks the high similarity between adjacent data points. To further enhance forecasting accuracy, we introduce a memory-driven decoder. This component enables the model to capture seasonal tendencies and dependencies in the time-series data and leverages temporal relationships to facilitate the forecasting process. Experimental results on six real-life LTSF benchmarks demonstrate that our approach can be seamlessly plugged into various Transformer-based models, improving their LTSF performance by up to 30%.
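
The abstract describes the curriculum-noise component concretely enough to illustrate: Bernoulli noise is applied to training windows with an intensity that grows over training, so that adjacent rolling-window samples stop looking nearly identical. The sketch below is a minimal illustration of that idea, assuming PyTorch; the function names, the keep-probability range, and the linear schedule are illustrative assumptions, not the authors' implementation (the memory-driven decoder is not sketched, since the abstract gives no mechanism for it).

import torch

def bernoulli_noise(x: torch.Tensor, keep_prob: float) -> torch.Tensor:
    # Zero out each element of the input window independently with
    # probability (1 - keep_prob); keep_prob = 1.0 leaves the window clean.
    mask = torch.bernoulli(torch.full_like(x, keep_prob))
    return x * mask

def curriculum_keep_prob(epoch: int, total_epochs: int,
                         start: float = 1.0, end: float = 0.7) -> float:
    # Linearly anneal the keep probability: early epochs train on (nearly)
    # clean windows, later epochs on increasingly perturbed ones, which
    # breaks the high similarity between adjacent rolling-window samples.
    # The 1.0 -> 0.7 range and the linear schedule are assumptions.
    frac = min(epoch / max(total_epochs - 1, 1), 1.0)
    return start + frac * (end - start)

# Hypothetical usage inside a standard training loop (model, optimizer,
# and loss are omitted):
#   for epoch in range(total_epochs):
#       p = curriculum_keep_prob(epoch, total_epochs)
#       for x, y in train_loader:
#           x_noisy = bernoulli_noise(x, keep_prob=p)
#           ...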

References (47)
  1. Y. Tang, Z. Song, Y. Zhu, H. Yuan, M. Hou, J. Ji, C. Tang, and J. Li, “A survey on machine learning models for financial time series forecasting,” Neurocomputing, vol. 512, pp. 363–380, 2022.
  2. K. Bi, L. Xie, H. Zhang, X. Chen, X. Gu, and Q. Tian, “Accurate medium-range global weather forecasting with 3D neural networks,” Nature, pp. 1–6, 2023.
  3. G. Raman, B. Ashraf, Y. K. Demir, C. D. Kershaw, S. Cheruku, M. Atis, A. Atis, M. Atar, W. Chen, I. Ibrahim, T. Bat, and M. Mete, “Machine learning prediction for COVID-19 disease severity at hospital admission,” BMC Medical Informatics Decis. Mak., vol. 23, no. 1, p. 46, 2023.
  4. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in NeurIPS, 2017.
  5. M. Li, P.-Y. Huang, X. Chang, J. Hu, Y. Yang, and A. Hauptmann, “Video pivoting unsupervised multi-modal machine translation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022.
  6. X. Lin, S. Sun, W. Huang, B. Sheng, P. Li, and D. D. Feng, “Eapt: efficient attention pyramid transformer for image processing,” IEEE Transactions on Multimedia, 2021.
  7. A. Yang, S. Lin, C.-H. Yeh, M. Shu, Y. Yang, and X. Chang, “Context matters: Distilling knowledge graph for enhanced object detection,” IEEE Transactions on Multimedia, 2023.
  8. J. Liu, W. Wang, S. Chen, X. Zhu, and J. Liu, “Sounding video generator: A unified framework for text-guided sounding video generation,” IEEE Transactions on Multimedia, 2023.
  9. Y. Su, J. Deng, R. Sun, G. Lin, H. Su, and Q. Wu, “A unified transformer framework for group-based segmentation: Co-segmentation, co-saliency detection and video salient object detection,” IEEE Transactions on Multimedia, 2023.
  10. M. Li, B. Lin, Z. Chen, H. Lin, X. Liang, and X. Chang, “Dynamic graph enhanced contrastive learning for chest x-ray report generation,” arXiv preprint arXiv:2303.10323, 2023.
  11. M. Li, R. Liu, F. Wang, X. Chang, and X. Liang, “Auxiliary signal-guided knowledge encoder-decoder for medical report generation,” World Wide Web, vol. 26, no. 1, pp. 253–270, 2023.
  12. H. Cao, Z. Huang, T. Yao, J. Wang, H. He, and Y. Wang, “Inparformer: Evolutionary decomposition transformers with interactive parallel attention for long-term time series forecasting,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 6, 2023, pp. 6906–6915.
  13. Y. Nie, N. H. Nguyen, P. Sinthong, and J. Kalagnanam, “A time series is worth 64 words: Long-term forecasting with transformers,” in The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023. [Online]. Available: https://openreview.net/pdf?id=Jbdc0vTOcol
  14. T. Zhou, Z. Ma, Q. Wen, X. Wang, L. Sun, and R. Jin, “Fedformer: Frequency enhanced decomposed transformer for long-term series forecasting,” in International Conference on Machine Learning. PMLR, 2022, pp. 27268–27286.
  15. H. Zhou, S. Zhang, J. Peng, S. Zhang, J. Li, H. Xiong, and W. Zhang, “Informer: Beyond efficient transformer for long sequence time-series forecasting,” in AAAI, 2021.
  16. S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, pp. 1735–1780, 1997.
  17. Y. Qin, D. Song, H. Chen, W. Cheng, G. Jiang, and G. W. Cottrell, “A dual-stage attention-based recurrent neural network for time series prediction,” in IJCAI, 2017.
  18. R. Wen, K. Torkkola, B. Narayanaswamy, and D. Madeka, “A multi-horizon quantile recurrent forecaster,” arXiv preprint arXiv:1711.11053, 2018.
  19. H. Wu, J. Xu, J. Wang, and M. Long, “Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting,” Advances in Neural Information Processing Systems, vol. 34, pp. 22419–22430, 2021.
  20. J. Wang, S. Qian, J. Hu, and R. Hong, “Positive unlabeled fake news detection via multi-modal masked transformer network,” IEEE Transactions on Multimedia, 2023.
  21. Y. Du, M. Wang, W. Zhou, and H. Li, “Progressive similarity preservation learning for deep scalable product quantization,” IEEE Transactions on Multimedia, 2023.
  22. J. Pan, S. Yang, L. G. Foo, Q. Ke, H. Rahmani, Z. Fan, and J. Liu, “Progressive channel-shrinking network,” IEEE Transactions on Multimedia, 2023.
  23. K. Benidis, S. S. Rangapuram, V. Flunkert, Y. Wang, D. C. Maddix, A. C. Türkmen, J. Gasthaus, M. Bohlke-Schneider, D. Salinas, L. Stella, F. Aubet, L. Callot, and T. Januschowski, “Deep learning for time series forecasting: Tutorial and literature survey,” ACM Comput. Surv., vol. 55, no. 6, pp. 121:1–121:36, 2023.
  24. G. E. Box and G. M. Jenkins, “Some recent advances in forecasting and control,” Journal of the Royal Statistical Society. Series C (Applied Statistics), vol. 17, no. 2, pp. 91–109, 1968.
  25. E. S. Gardner Jr, “Exponential smoothing: The state of the art,” Journal of forecasting, vol. 4, no. 1, pp. 1–28, 1985.
  26. R. B. Cleveland, W. S. Cleveland, J. E. McRae, and I. Terpenning, “STL: A seasonal-trend decomposition,” J. Off. Stat, vol. 6, no. 1, pp. 3–73, 1990.
  27. K. Cho, B. Van Merriënboer, D. Bahdanau, and Y. Bengio, “On the properties of neural machine translation: Encoder-decoder approaches,” arXiv preprint arXiv:1409.1259, 2014.
  28. D. Salinas, V. Flunkert, J. Gasthaus, and T. Januschowski, “Deepar: Probabilistic forecasting with autoregressive recurrent networks,” International Journal of Forecasting, vol. 36, no. 3, pp. 1181–1191, 2020.
  29. S. Bengio, O. Vinyals, N. Jaitly, and N. Shazeer, “Scheduled sampling for sequence prediction with recurrent neural networks,” Advances in neural information processing systems, vol. 28, 2015.
  30. Q. Wen, T. Zhou, C. Zhang, W. Chen, Z. Ma, J. Yan, and L. Sun, “Transformers in time series: A survey,” arXiv preprint arXiv:2202.07125, 2022.
  31. M. Jin, G. Shi, Y.-F. Li, Q. Wen, B. Xiong, T. Zhou, and S. Pan, “How expressive are spectral-temporal graph neural networks for time series forecasting?” arXiv preprint arXiv:2305.06587, 2023.
  32. S. Wu, X. Xiao, Q. Ding, P. Zhao, Y. Wei, and J. Huang, “Adversarial sparse transformer for time series forecasting,” in NeurIPS, 2020.
  33. A. Zeng, M. Chen, L. Zhang, and Q. Xu, “Are transformers effective for time series forecasting?” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 9, 2023, pp. 11121–11128.
  34. J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in NAACL, 2018.
  35. C. M. Bishop, “Training with noise is equivalent to tikhonov regularization,” Neural Computation, vol. 7, pp. 108–116, 1995.
  36. S. Wager, S. I. Wang, and P. Liang, “Dropout training as adaptive regularization,” in NeurIPS, 2013.
  37. S. Zhai and Z. Zhang, “Dropout training of matrix factorization and autoencoder for link prediction in sparse graphs,” in SDM, 2015.
  38. P. Morerio, J. Cavazza, R. Volpi, R. Vidal, and V. Murino, “Curriculum dropout,” in ICCV, 2017.
  39. C. Ma, C. Shen, A. R. Dick, Q. Wu, P. Wang, A. van den Hengel, and I. D. Reid, “Visual question answering with memory-augmented networks,” in CVPR, 2018.
  40. C. Ma, L. Ma, Y. Zhang, J. Sun, X. Liu, and M. Coates, “Memory augmented graph neural networks for sequential recommendation,” in AAAI, 2020.
  41. Z. Fei, “Memory-augmented image captioning,” in AAAI, 2021.
  42. D. Xu, W. Cheng, B. Zong, D. Song, J. Ni, W. Yu, Y. Liu, H. Chen, and X. Zhang, “Tensorized lstm with adaptive shared memory for learning trends in multivariate time series,” in AAAI, 2020.
  43. M. Jiang, J. Wu, X. Shi, and M. Zhang, “Transformer based memory network for sentiment analysis of web comments,” IEEE Access, vol. 7, pp. 179942–179953, 2019.
  44. A. Banino, A. P. Badia, R. Köster, M. J. Chadwick, V. F. Zambaldi, D. Hassabis, C. Barry, M. Botvinick, D. Kumaran, and C. Blundell, “MEMO: A deep network for flexible combination of episodic memories,” in ICLR, 2020.
  45. Z. Chen, Y. Song, T.-H. Chang, and X. Wan, “Generating radiology reports via memory-driven transformer,” in EMNLP, 2020.
  46. X. Ma, Y. Wang, M. J. Dousti, P. Koehn, and J. Pino, “Streaming simultaneous speech translation with augmented memory transformer,” in ICASSP, 2021.
  47. T. Zhou, Z. Ma, X. Wang, Q. Wen, L. Sun, T. Yao, W. Yin, and R. Jin, “Film: Frequency improved legendre memory model for long-term time series forecasting,” in NeurIPS, 2022.
Authors (8)
  1. Mingjie Li (67 papers)
  2. Rui Liu (320 papers)
  3. Guangsi Shi (6 papers)
  4. Mingfei Han (15 papers)
  5. Changling Li (4 papers)
  6. Lina Yao (194 papers)
  7. Xiaojun Chang (148 papers)
  8. Ling Chen (144 papers)
Citations (1)