Addressing Concept Shift in Online Time Series Forecasting: Detect-then-Adapt (2403.14949v1)
Abstract: Online updating of time series forecasting models aims to tackle the challenge of concept drift by adjusting forecasting models based on streaming data. While numerous algorithms have been developed, most of them focus on model design and updating; in practice, many of these methods suffer continuous performance regression in the face of concept drifts accumulated over time. To address this limitation, we present a novel approach, Concept **D**rift **D**etection an**D** **A**daptation (D3A), which first detects drifting concepts and then aggressively adapts the current model to them for rapid adaptation. To best harness the utility of historical data for model adaptation, we propose a data augmentation strategy that introduces Gaussian noise into existing training instances. This helps mitigate the data distribution gap, a critical factor contributing to train-test performance inconsistency. The significance of our data augmentation process is verified by our theoretical analysis. Our empirical studies across six datasets demonstrate the effectiveness of D3A in improving model adaptation capability. Notably, compared to a simple Temporal Convolutional Network (TCN) baseline, D3A reduces the average Mean Squared Error (MSE) by 43.9%; for the state-of-the-art (SOTA) model, the MSE is reduced by 33.3%.
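The abstract describes a detect-then-adapt loop with Gaussian-noise augmentation of historical training instances. The sketch below is a minimal, hypothetical illustration of that idea, not the paper's implementation: it assumes a simple z-score drift detector over a running error history, a fixed noise scale `sigma`, and a bounded replay buffer; the names `augment_with_noise` and `detect_then_adapt` and all parameter values are ours, not taken from the paper.

```python
import numpy as np
import torch
from torch import nn


def augment_with_noise(x: torch.Tensor, sigma: float = 0.1) -> torch.Tensor:
    """Add i.i.d. Gaussian noise to training windows.

    Noise injection widens the effective training distribution, which is
    the abstract's stated mechanism for narrowing the train-test gap.
    `sigma` is a hypothetical scale, not from the paper.
    """
    return x + sigma * torch.randn_like(x)


def detect_then_adapt(model: nn.Module, stream, z_threshold: float = 3.0,
                      buffer_size: int = 512, adapt_steps: int = 5,
                      lr: float = 1e-3):
    """Minimal detect-then-adapt loop over a stream of (window, target) pairs.

    Detector: flag drift when the current MSE exceeds the running error
    mean by `z_threshold` standard deviations (a simple stand-in; the
    paper's detection statistic may differ). On detection: aggressively
    fine-tune on noise-augmented historical instances.
    """
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    errors, history = [], []

    for x, y in stream:  # x: (1, lookback, dims), y: (1, horizon, dims)
        with torch.no_grad():
            err = criterion(model(x), y).item()
        history.append((x, y))
        history = history[-buffer_size:]  # bounded replay buffer

        drifted = (len(errors) >= 30 and
                   (err - np.mean(errors)) / (np.std(errors) + 1e-8)
                   > z_threshold)
        if drifted:
            xs = torch.cat([h[0] for h in history])
            ys = torch.cat([h[1] for h in history])
            for _ in range(adapt_steps):  # rapid adaptation burst
                optimizer.zero_grad()
                loss = criterion(model(augment_with_noise(xs)), ys)
                loss.backward()
                optimizer.step()
            errors.clear()  # reset detector statistics after adapting
        errors.append(err)
```

In a full online-forecasting setup one would also perform the regular per-step model update; the sketch isolates only the detection check and the aggressive adaptation burst that the abstract highlights.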