
TimeDiT: General-purpose Diffusion Transformers for Time Series Foundation Model (2409.02322v2)

Published 3 Sep 2024 in cs.LG and cs.AI

Abstract: Foundation models, particularly LLMs, have revolutionized text and video processing, yet time series data presents distinct challenges for such approaches due to domain-specific features such as missing values and multi-resolution characteristics. Furthermore, the de facto autoregressive transformers tend to learn deterministic temporal dependencies within pre-trained data while overlooking inherent uncertainties and lacking integration of physical constraints. In this paper, we introduce TimeDiT, a diffusion transformer model that synergistically combines transformer-based temporal dependency learning with diffusion-based probabilistic sampling. TimeDiT employs a unified masking mechanism to harmonize the training and inference process across diverse tasks while introducing a theoretically grounded, finetuning-free model editing strategy that enables flexible integration of external knowledge during sampling. Acknowledging the challenges of unifying multiple downstream tasks under a single model, our systematic evaluation demonstrates TimeDiT's effectiveness both in fundamental tasks, i.e., forecasting and imputation, through zero-shot/fine-tuning; and in domain tasks, i.e., multi-resolution forecasting, anomaly detection, and data generation, establishing it as a "proto-foundation model" that bridges the gap between general-purpose and domain-specific models.
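The abstract does not spell out how the unified masking mechanism works, so the following is only a minimal sketch of the general idea that forecasting and imputation can both be phrased as "generate the masked positions given the observed ones". The function names, the 1/0 mask convention, and the zero fill value are assumptions made for illustration; they are not taken from the paper or its code.

```python
import random


def forecasting_mask(length, horizon):
    """Hypothetical helper: 1 marks observed (conditioning) steps, 0 marks targets.

    For forecasting, the last `horizon` steps are the targets.
    """
    if not 0 <= horizon <= length:
        raise ValueError("horizon must be between 0 and length")
    return [1] * (length - horizon) + [0] * horizon


def imputation_mask(length, missing_ratio, seed=0):
    """Hypothetical helper: hide a random subset of steps as imputation targets."""
    if not 0.0 <= missing_ratio <= 1.0:
        raise ValueError("missing_ratio must be in [0, 1]")
    rng = random.Random(seed)
    n_missing = round(length * missing_ratio)
    missing = set(rng.sample(range(length), n_missing))
    return [0 if i in missing else 1 for i in range(length)]


def apply_mask(series, mask, fill=0.0):
    """Keep observed values and replace target positions with `fill`.

    In a diffusion setting the masked positions would be filled by the
    sampler; the constant fill here is only a placeholder.
    """
    if len(series) != len(mask):
        raise ValueError("series and mask must have the same length")
    return [x if m == 1 else fill for x, m in zip(series, mask)]


if __name__ == "__main__":
    series = [0.1, 0.4, 0.35, 0.8, 0.75, 0.9]
    print(apply_mask(series, forecasting_mask(len(series), horizon=2)))
    print(apply_mask(series, imputation_mask(len(series), missing_ratio=0.5)))
```

Under this framing the two tasks differ only in which mask is supplied, which is one plausible reading of how a single model could share its training and inference procedure across tasks.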
