TimeDiT: General-purpose Diffusion Transformers for Time Series Foundation Model (2409.02322v2)
Abstract: Foundation models, particularly LLMs, have revolutionized text and video processing, yet time series data presents distinct challenges for such approaches due to domain-specific features such as missing values and multi-resolution characteristics. Furthermore, the de facto autoregressive transformers tend to learn deterministic temporal dependencies within pre-trained data while overlooking inherent uncertainties and lacking integration of physical constraints. In this paper, we introduce TimeDiT, a diffusion transformer model that combines transformer-based temporal dependency learning with diffusion-based probabilistic sampling. TimeDiT employs a unified masking mechanism to harmonize the training and inference process across diverse tasks while introducing a theoretically grounded, finetuning-free model editing strategy that enables flexible integration of external knowledge during sampling. Acknowledging the challenges of unifying multiple downstream tasks under a single model, our systematic evaluation demonstrates TimeDiT's effectiveness both in fundamental tasks, i.e., forecasting and imputation, in zero-shot and fine-tuning settings; and in domain tasks, i.e., multi-resolution forecasting, anomaly detection, and data generation, establishing it as a proto-foundation model that bridges the gap between general-purpose and domain-specific models.