Distillation Enhanced Time Series Forecasting Network with Momentum Contrastive Learning (2401.17802v2)
Abstract: Contrastive representation learning is crucial in time series analysis, as it alleviates data noise, incompleteness, and the sparsity of supervision signals. However, existing contrastive learning frameworks usually focus on intra-temporal features, which fails to fully exploit the intricate nature of time series data. To address this issue, we propose DE-TSMCL, an innovative distillation-enhanced framework for long-sequence time series forecasting. Specifically, we design a learnable data augmentation mechanism that adaptively learns whether to mask a timestamp, yielding optimized sub-sequences. We then propose a contrastive learning task with momentum update that explores inter-sample and intra-temporal correlations of time series, learning the underlying structural features of unlabeled data. Meanwhile, we design a supervised task to learn more robust representations and facilitate the contrastive learning process. Finally, we jointly optimize the two tasks; by combining losses from multiple tasks, we learn effective representations for the downstream forecasting task. Extensive experiments against state-of-the-art methods demonstrate the effectiveness of DE-TSMCL, with a maximum improvement of 27.3%.
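The abstract describes three interacting pieces: a learnable masking augmentation, a momentum-updated contrastive objective, and a supervised forecasting loss optimized jointly. Below is a minimal PyTorch sketch of that training loop. All names here (`LearnableMask`, `momentum_update`, `info_nce`, `train_step`, the weight `lam`) are hypothetical; the sigmoid-relaxed mask and in-batch InfoNCE are simplifying assumptions to illustrate the idea, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableMask(nn.Module):
    """Per-timestamp soft mask standing in for the learnable augmentation.
    Assumption: sigmoid gates instead of hard Bernoulli sampling, so the
    mask stays differentiable end to end."""
    def __init__(self, seq_len: int):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(seq_len))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, T, C)
        keep = torch.sigmoid(self.logits)                 # keep-probability per timestamp
        return x * keep[None, :, None]

def momentum_update(online: nn.Module, target: nn.Module, m: float = 0.99):
    """MoCo-style EMA update of the momentum (target) encoder's weights."""
    with torch.no_grad():
        for po, pt in zip(online.parameters(), target.parameters()):
            pt.data.mul_(m).add_(po.data, alpha=1.0 - m)

def info_nce(q: torch.Tensor, k: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """In-batch InfoNCE: matching indices are positives, the rest negatives."""
    q, k = F.normalize(q, dim=-1), F.normalize(k, dim=-1)
    logits = q @ k.t() / tau
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)

def train_step(x, y, mask, encoder, momentum_encoder, head, opt, lam=0.5):
    """One joint step: contrastive loss between the online encoding of a
    masked view and the momentum encoding of the raw series, plus a
    supervised forecasting loss; `lam` is a hypothetical weighting."""
    z_online = encoder(mask(x)).mean(dim=1)               # (B, D) pooled over time
    with torch.no_grad():
        z_target = momentum_encoder(x).mean(dim=1)        # stop-gradient momentum view
    loss = lam * info_nce(z_online, z_target) + F.mse_loss(head(z_online), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    momentum_update(encoder, momentum_encoder)
    return loss.item()

if __name__ == "__main__":
    B, T, C, D, H = 8, 24, 7, 64, 12   # batch, length, channels, hidden dim, horizon
    enc = nn.Sequential(nn.Linear(C, D), nn.GELU(), nn.Linear(D, D))
    mom = nn.Sequential(nn.Linear(C, D), nn.GELU(), nn.Linear(D, D))
    mom.load_state_dict(enc.state_dict())  # momentum encoder starts as a copy
    head, mask = nn.Linear(D, H), LearnableMask(T)
    opt = torch.optim.Adam(
        [*enc.parameters(), *head.parameters(), *mask.parameters()], lr=1e-3)
    x, y = torch.randn(B, T, C), torch.randn(B, H)
    print(train_step(x, y, mask, enc, mom, head, opt))
```

Note that the momentum encoder receives no gradients and is excluded from the optimizer: it is updated only by the EMA, which keeps the contrastive targets slowly moving and stable while the online encoder and mask train on the joint loss.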
Authors: Haozhi Gao, Qianqian Ren, Jinbao Li