
TOTEM: TOkenized Time Series EMbeddings for General Time Series Analysis

Published 26 Feb 2024 in cs.LG (arXiv:2402.16412v2)

Abstract: This work studies the problem of time series analysis with generalist (or foundation) models, which are models trained across many data domains. Drawing inspiration from the widespread success of LLMs, we consider the simple strategy of discretely tokenizing time series data drawn from a myriad of datasets via self-supervision, then using the fixed tokenization to solve a variety of tasks across many data domains. Canonically, time series models are either trained on a single dataset or built in a task-specific manner (e.g., a forecasting-only model), where many use patches of time as inputs to the model. As such, performant generalist, discrete representation time series models explored across many tasks are of value. Our method, TOkenized Time Series EMbeddings (TOTEM), produces such generalist time series models with minimal or no fine-tuning while exhibiting strong zero-shot performance. We evaluate TOTEM extensively over nearly 500 experiments on three commonly-studied time series tasks with real-world data: imputation (17 baselines, 12 datasets), anomaly detection (19 baselines, 25 datasets), and forecasting (14 baselines, 12 datasets). We conclude that TOTEM matches or outperforms existing state-of-the-art models in both the canonical specialist setting (i.e., training one model on one domain) as well as the generalist setting (i.e., training a single model on many domains), which demonstrates the efficacy of tokenization for general time series analysis. The open-source implementation is available here: https://github.com/SaberaTalukder/TOTEM; a video summary is available here: https://www.youtube.com/watch?v=OqrCpdb6MJk.


Summary

  • The paper introduces TOTEM, which tokenizes time series with a convolutional VQVAE, producing discrete embeddings suited to generalist modeling.
  • Across nearly 500 experiments on real-world data, TOTEM matches or outperforms state-of-the-art baselines in forecasting, anomaly detection, and imputation.
  • Its domain-agnostic design avoids complex, per-domain data engineering, enabling cross-domain time series analysis.

Unifying Time Series Analysis Across Domains with TOTEM: A Tokenized Embedding Approach

Introduction to TOTEM

The field of time series analysis is experiencing a paradigm shift with the introduction of generalist models capable of learning across multiple domains without substantial retraining or task-specific tuning. The paper introduces TOTEM (TOkenized Time Series EMbeddings), an architecture that encodes time series data into discrete token representations to enable this generalist approach. Built on a Vector Quantized Variational AutoEncoder (VQVAE), TOTEM delivers strong performance on forecasting, anomaly detection, and imputation across a wide range of datasets.

TOTEM's Design and Implementation

TOTEM's architecture is simple by design yet handles the complexity and diversity of time series data across multiple domains. A 1D strided Convolutional Neural Network (CNN) serves as both encoder and decoder, with a quantizer and latent codebook at the core: the encoder downsamples a series into latent vectors, each latent is snapped to its nearest codebook entry to yield a discrete token, and the decoder reconstructs the series from the quantized latents. Because the strided convolutions compress time, each token covers a non-overlapping span of time steps. Crucially, TOTEM requires no intricate data engineering, making it applicable to varied time series data without domain-specific adjustments. A hedged sketch of this design follows.
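To make the design concrete, here is a minimal sketch of a TOTEM-style tokenizer in PyTorch. The class name Conv1dVQVAE and all hyperparameters (codebook size, latent width, layer count, compression factor) are illustrative placeholders rather than the paper's exact values; the sketch only shows the three pieces described above: a strided 1D convolutional encoder, nearest-neighbor quantization against a learned codebook trained with the standard straight-through VQ-VAE losses, and a convolutional decoder.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Conv1dVQVAE(nn.Module):
    """Sketch of a TOTEM-style tokenizer: a VQ-VAE over univariate series.

    All hyperparameters here are illustrative, not the paper's values.
    """

    def __init__(self, codebook_size=256, dim=64):
        super().__init__()
        # Two stride-2 convolutions give 4x temporal compression, so each
        # token summarizes a non-overlapping span of 4 time steps.
        self.encoder = nn.Sequential(
            nn.Conv1d(1, dim, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv1d(dim, dim, kernel_size=4, stride=2, padding=1),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(dim, dim, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose1d(dim, 1, kernel_size=4, stride=2, padding=1),
        )
        self.codebook = nn.Embedding(codebook_size, dim)

    def quantize(self, z):
        # z: (batch, dim, steps). Snap each latent to its nearest code.
        z_t = z.permute(0, 2, 1)                                     # (B, T, D)
        dists = torch.cdist(z_t, self.codebook.weight.unsqueeze(0))  # (B, T, K)
        tokens = dists.argmin(dim=-1)                                # token ids
        z_q = self.codebook(tokens).permute(0, 2, 1)                 # (B, D, T)
        # Straight-through estimator: gradients bypass the argmin.
        z_q_st = z + (z_q - z).detach()
        return z_q_st, z_q, tokens

    def forward(self, x, beta=0.25):
        z = self.encoder(x)                       # x: (batch, 1, length)
        z_q_st, z_q, tokens = self.quantize(z)
        x_hat = self.decoder(z_q_st)
        loss = (
            F.mse_loss(x_hat, x)                  # reconstruction
            + F.mse_loss(z_q, z.detach())         # codebook update
            + beta * F.mse_loss(z, z_q.detach())  # commitment
        )
        return x_hat, tokens, loss

Training such a model with only the reconstruction and quantization losses is self-supervised, which matches the abstract's description of learning the tokenization without task labels.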

Experimental Evaluation

TOTEM was evaluated in nearly 500 experiments on real-world data spanning three core tasks: imputation (17 baselines, 12 datasets), anomaly detection (19 baselines, 25 datasets), and forecasting (14 baselines, 12 datasets). The results establish TOTEM as a versatile tool that matches or exceeds prior state-of-the-art methods in most settings. Notably, TOTEM excels in the generalist setting: a single model trained on multiple domains transfers to zero-shot testing, where it is evaluated on entirely unseen data domains.
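As an illustration of that zero-shot workflow, the hypothetical snippet below reuses the Conv1dVQVAE sketch from above: a tokenizer trained on source domains is frozen and applied to windows from an unseen domain, and the resulting token ids are what a downstream head would consume. The checkpoint name, window length, and shapes are placeholders, not artifacts from the paper.

import torch

# Hypothetical zero-shot transfer using the Conv1dVQVAE sketch above.
tokenizer = Conv1dVQVAE(codebook_size=256, dim=64)
# Placeholder checkpoint name; assumes a tokenizer already trained on
# several source domains (e.g., electricity, weather, traffic).
tokenizer.load_state_dict(torch.load("generalist_tokenizer.pt"))
tokenizer.eval()

# Windows from a domain never seen during tokenizer training.
unseen = torch.randn(8, 1, 96)  # (batch, channels=1, length)
with torch.no_grad():
    z = tokenizer.encoder(unseen)
    _, _, tokens = tokenizer.quantize(z)

# With 4x compression, 96 input steps become 24 token ids per window;
# these can feed a downstream forecaster, anomaly scorer, or imputer
# without retraining the tokenizer on the new domain.
print(tokens.shape)  # torch.Size([8, 24])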

Impacts and Implications

From a theoretical standpoint, TOTEM contributes to our understanding of data representation in time series analysis. Its token-based approach, adapted from the discrete tokenization used in language modeling, paves the way for more unified and efficient handling of time series data and may shape future generalist models. Practically, TOTEM's generalist nature and minimal tuning requirements lower the barrier to entry for complex time series analysis in applications ranging from finance and healthcare to climatology.

Speculations on Future Developments

The success of TOTEM invites speculation on several future directions in time series analysis. One avenue is the dynamic adjustment of token lengths to optimize representational efficiency for specific tasks or datasets. Another is studying the scaling properties of generalist models as a function of data size and domain diversity, which could yield deeper insight into how time series are best represented. TOTEM thus serves both as a strong baseline and as a pointer toward more generalized, efficient models across domains and tasks.

Conclusion

TOTEM represents a significant step forward in the quest for a unified approach to time series analysis. By encapsulating time series data into discrete, tokenized embeddings, TOTEM effectively bridges the gap between domain-specific and generalist modeling techniques, showcasing impressive versatility and performance across tasks and data domains. As the field continues to evolve, TOTEM's foundational principles and architecture are expected to influence future directions, steering towards more integrated and coherent strategies for time series analysis.
