Enhancing Multivariate Time Series Forecasting with Mutual Information-driven Cross-Variable and Temporal Modeling (2403.00869v1)

Published 1 Mar 2024 in cs.LG and stat.ML

Abstract: Recent advances have underscored the impact of deep learning techniques on multivariate time series forecasting (MTSF). These techniques generally fall into two categories: Channel-independence and Channel-mixing approaches. Although Channel-independence methods typically yield better results, Channel-mixing could in theory offer improvements by leveraging inter-variable correlations. We argue, however, that integrating uncorrelated information in Channel-mixing methods limits the potential performance gains of MTSF models. To substantiate this claim, we introduce Cross-variable Decorrelation Aware feature Modeling (CDAM) for Channel-mixing approaches, which refines Channel-mixing by minimizing redundant information between channels while enhancing relevant mutual information. We further introduce Temporal correlation Aware Modeling (TAM) to exploit temporal correlations, a step beyond conventional single-step forecasting methods: it maximizes the mutual information between adjacent sub-sequences of the forecasted and target series. Combining CDAM and TAM, our novel framework significantly surpasses existing models, including those previously considered state-of-the-art, in comprehensive tests.
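
The abstract names two mutual-information (MI) objectives but gives no estimators. As a rough illustration only, not the authors' implementation, the sketch below pairs the CDAM idea (suppress redundant cross-channel information) with a CLUB-style variational upper bound on MI (Cheng et al., 2020), and the TAM idea (tie adjacent forecast sub-sequences to their targets) with an InfoNCE lower bound (van den Oord et al., 2018). All names, shapes, and loss weights here are assumptions.

```python
# Hypothetical PyTorch sketch of the two MI objectives named in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CLUBUpperBound(nn.Module):
    """CLUB-style variational upper bound on I(x; y).

    Minimizing this bound is one plausible way to realize CDAM's stated goal
    of reducing redundant information shared between channel embeddings.
    """

    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        # Variational approximation q(y | x) as a diagonal Gaussian.
        self.mu = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
        self.logvar = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        mu, logvar = self.mu(x), self.logvar(x)
        # log q(y | x) for matched (positive) pairs, up to additive constants.
        pos = (-((y - mu) ** 2) / logvar.exp() - logvar).sum(-1)
        # Same quantity for shuffled (negative) pairs.
        y_shuf = y[torch.randperm(y.size(0), device=y.device)]
        neg = (-((y_shuf - mu) ** 2) / logvar.exp() - logvar).sum(-1)
        return (pos - neg).mean() / 2.0  # estimate of the upper bound on I(x; y)


def info_nce(z_pred: torch.Tensor, z_true: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """InfoNCE loss; minimizing it maximizes a lower bound on I(z_pred; z_true).

    Applied to embeddings of adjacent forecast/target sub-sequences, this
    mirrors TAM's stated objective of exploiting temporal correlations.
    """
    z_pred = F.normalize(z_pred, dim=-1)
    z_true = F.normalize(z_true, dim=-1)
    logits = z_pred @ z_true.t() / tau                            # (B, B) similarities
    labels = torch.arange(z_pred.size(0), device=z_pred.device)   # diagonal = matched pairs
    return F.cross_entropy(logits, labels)
```

In a full model these terms would presumably augment a standard forecasting loss, e.g. `loss = F.mse_loss(pred, target) + lam1 * club(h_a, h_b) + lam2 * info_nce(z_pred, z_true)` with tunable weights `lam1` and `lam2`; the paper's actual loss composition and estimators may differ.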

