UniTS: A Unified Multi-Task Time Series Model (2403.00131v3)

Published 29 Feb 2024 in cs.LG and cs.AI

Abstract: Although pre-trained transformers and reprogrammed text-based LLMs have shown strong performance on time series tasks, the best-performing architectures vary widely across tasks, with most models narrowly focused on specific areas, such as time series forecasting. Unifying predictive and generative time series tasks within a single model remains challenging. We introduce UniTS, a unified multi-task time series model that utilizes task tokenization to integrate predictive and generative tasks into a single framework. UniTS employs a modified transformer block to capture universal time series representations, enabling transferability from a heterogeneous, multi-domain pre-training dataset (characterized by diverse dynamic patterns, sampling rates, and temporal scales) to a wide range of downstream datasets with varied task specifications and data domains. Tested on 38 datasets across human activity sensors, healthcare, engineering, and finance, UniTS achieves superior performance compared to 12 forecasting models, 20 classification models, 18 anomaly detection models, and 16 imputation models, including adapted text-based LLMs. UniTS also demonstrates strong few-shot and prompt capabilities when applied to new domains and tasks. In single-task settings, UniTS outperforms competitive task-specialized time series models. Code and datasets are available at https://github.com/mims-harvard/UniTS.


Summary

  • The paper presents a unified task specification that eliminates task-specific modules and achieves a 10.5% gain in zero-shot forecasting accuracy.
  • It employs sequence and variable attention with a dynamic linear operator to efficiently handle diverse time series data shapes.
  • It outperforms specialized models in 27 out of 38 tasks, improving few-shot imputation MSE by 12.4% over the best baseline.

UniTS: Building a Unified Time Series Model

The paper "UniTS: Building a Unified Time Series Model" presents an approach to building a versatile time series model that handles multiple diverse tasks without relying on task-specific modules. The work addresses a notable gap in the field, where general-purpose models for time series analysis have received comparatively little attention.

Overview

UniTS is designed as a foundational time series model analogous to those used in the language and vision domains. The model integrates a universal task specification that supports classification, forecasting, imputation, and anomaly detection. It achieves this through a unified network backbone that combines sequence and variable attention with a dynamic linear operator. This design supports universal adaptability and outperforms task-specific models across 38 multi-domain datasets.
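To make the task-tokenization idea concrete, here is a minimal sketch assuming a PyTorch-style implementation; the helper names, patch size, and token counts are illustrative and are not taken from the released code. It builds the unified token sequence (learned prompt tokens, embedded sample patches, and a task token) that a single shared backbone would consume for either a recognition or a generative task.

```python
import torch
import torch.nn as nn

d_model, patch_len = 64, 16

patch_embed = nn.Linear(patch_len, d_model)              # embeds each time-series patch
prompt_tokens = nn.Parameter(torch.randn(4, d_model))    # learned prompt tokens (illustrative count)
cls_token = nn.Parameter(torch.randn(1, d_model))        # task token used for recognition tasks
mask_tokens = nn.Parameter(torch.randn(2, d_model))      # task tokens used for generative tasks

def tokenize(x: torch.Tensor, task: str) -> torch.Tensor:
    """x: (batch, time, variables) -> (batch * variables, tokens, d_model)."""
    b, t, v = x.shape
    # fold variables into the batch and split each series into non-overlapping patches
    patches = x.permute(0, 2, 1).reshape(b * v, t // patch_len, patch_len)
    sample_tokens = patch_embed(patches)                  # (b*v, num_patches, d_model)
    task_tok = cls_token if task == "classify" else mask_tokens
    return torch.cat([
        prompt_tokens.expand(b * v, -1, -1),              # prompt tokens
        sample_tokens,                                    # sample tokens
        task_tok.expand(b * v, -1, -1),                   # task token(s)
    ], dim=1)

x = torch.randn(8, 96, 3)                                 # 8 samples, 96 steps, 3 variables
print(tokenize(x, "classify").shape)                      # torch.Size([24, 11, 64])
```

Because tasks differ only in which task tokens are appended, the same weights can be reused across classification, forecasting, imputation, and anomaly detection.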

Key Contributions

  1. Universal Task Specification: The model leverages a novel prompting framework, converting diverse tasks into a unified token representation. This allows for efficient task specification and adaptability across various tasks without the need for specialized modules.
  2. Data-Domain Agnostic Network: Using self-attention across both the sequence and variable dimensions, UniTS accommodates the varying data shapes typical of time series. A dynamic linear operator further enhances its ability to model dense relations within sequences of any length (see the sketch after this list).
  3. Unified Model with Shared Weights: UniTS maintains shared weights across all tasks, enhancing generalization. A unified masked reconstruction pretraining scheme bolsters the model's capability to manage generative and recognition tasks concurrently.
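The sketch below, under the same assumptions as above, illustrates how a UniTS-style block could alternate attention over the time axis and the variable axis so that one set of weights handles any number of variables and any sequence length. The pointwise feed-forward layer here merely stands in for the paper's dynamic linear operator, whose exact form is not reproduced.

```python
import torch
import torch.nn as nn

class UniTSStyleBlock(nn.Module):
    """Illustrative block: time-axis attention, then variable-axis attention, then FFN."""

    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.time_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.var_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(                          # stand-in for the dynamic linear operator
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )
        self.norm1, self.norm2, self.norm3 = (nn.LayerNorm(d_model) for _ in range(3))

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, variables, time_tokens, d_model)
        b, v, t, d = tokens.shape

        # 1) attention across time tokens, with variables folded into the batch
        x = tokens.reshape(b * v, t, d)
        h = self.norm1(x)
        x = x + self.time_attn(h, h, h, need_weights=False)[0]

        # 2) attention across variables, with time folded into the batch
        x = x.reshape(b, v, t, d).permute(0, 2, 1, 3).reshape(b * t, v, d)
        h = self.norm2(x)
        x = x + self.var_attn(h, h, h, need_weights=False)[0]

        # 3) pointwise feed-forward (the paper uses a dynamic linear operator here)
        x = x + self.ffn(self.norm3(x))
        return x.reshape(b, t, v, d).permute(0, 2, 1, 3)

block = UniTSStyleBlock()
out = block(torch.randn(2, 7, 12, 64))                     # 2 samples, 7 variables, 12 time tokens
print(out.shape)                                           # torch.Size([2, 7, 12, 64])
```

Because neither attention axis is tied to a fixed number of variables or time tokens, the same block (and the same shared weights) can be pretrained with masked reconstruction and then reused, without architectural changes, across datasets of different shapes.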

Numerical Results

The UniTS model demonstrates superior performance over existing models across multiple datasets and is particularly strong in zero-shot and prompt-based learning scenarios. For instance, UniTS achieved a 10.5% improvement in one-step zero-shot forecasting over the leading baseline, highlighting its ability to handle new forecasting horizons and variable counts.

In multi-task settings, UniTS outperformed task-specific baselines on 27 out of 38 tasks, showcasing its adaptability and generalization capabilities. In few-shot transfer learning, UniTS reduced MSE on imputation tasks by 12.4% compared to the best baseline.

Implications and Future Directions

The development of UniTS signifies an important stride towards generalist models in time series analysis, supporting diverse data domains and task specifications with a single, unified architecture. The implications for practical applications are vast, including improved scalability and efficiency in deploying time series solutions across industries such as healthcare, finance, and engineering.

Future research could scale the model further or evaluate its efficacy in additional domains. Investigating how UniTS can integrate with emerging data types, or combine with reinforcement learning for real-time applications, could also prove beneficial.

Conclusion

The paper's introduction of UniTS marks a significant advance in the pursuit of unified time series models. By removing the reliance on task-specific modules, UniTS opens the way to universal, adaptable, and efficient time series analysis, and to new applications and methodologies in the field.
