Timer: Generative Pre-trained Transformers Are Large Time Series Models (2402.02368v3)

Published 4 Feb 2024 in cs.LG and stat.ML

Abstract: Deep learning has contributed remarkably to the advancement of time series analysis. Still, deep models can encounter performance bottlenecks in real-world data-scarce scenarios, which can be concealed due to the performance saturation with small models on current benchmarks. Meanwhile, large models have demonstrated great powers in these scenarios through large-scale pre-training. Continuous progress has been achieved with the emergence of LLMs, exhibiting unprecedented abilities such as few-shot generalization, scalability, and task generality, which are however absent in small deep models. To change the status quo of training scenario-specific small models from scratch, this paper aims at the early development of large time series models (LTSM). During pre-training, we curate large-scale datasets with up to 1 billion time points, unify heterogeneous time series into single-series sequence (S3) format, and develop the GPT-style architecture toward LTSMs. To meet diverse application needs, we convert forecasting, imputation, and anomaly detection of time series into a unified generative task. The outcome of this study is a Time Series Transformer (Timer), which is generative pre-trained by next token prediction and adapted to various downstream tasks with promising capabilities as an LTSM. Code and datasets are available at: https://github.com/thuml/Large-Time-Series-Model.

Overview of "Timer: Transformers for Time Series Analysis at Scale"

The paper "Timer: Transformers for Time Series Analysis at Scale" introduces a novel approach for enhancing the performance of time series models through the development of large time series models (LTSMs). In leveraging the architectural frameworks of LLMs, which have exhibited unprecedented generalization and scalability across various tasks, the authors aim to address the current limitations in time series analysis, particularly in data-scarce environments.

Key Contributions

The central contribution of this research is the introduction of Timer, a Time Series Transformer developed using a GPT-style architecture, optimized through pre-training on extensive, multi-domain datasets comprising up to 1 billion time points. The proposed model adopts several innovative methodologies:

  1. Unified Data Representation: The authors propose a unified single-series sequence (S3) format that homogenizes heterogeneous time series data into consistent token sequences. This representation supports the amalgamation of diverse time series types, facilitating large-scale pre-training (an illustrative sketch of the format follows this list).
  2. Generative Task Framework: They convert typical time series analysis tasks such as forecasting, imputation, and anomaly detection into a unified generative task. This conversion leverages a decoder-only Transformer architecture, employing an autoregressive next token prediction objective.
  3. Scalability and Generality: Timer is pre-trained on data curated into hierarchical capacity tiers, enabling controlled investigations of model and data scaling. The model demonstrates notable few-shot capability, improving over models trained from scratch while using substantially less training data.
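
To make the S3 idea concrete, the snippet below is a minimal illustrative sketch: each variate of a multivariate series is treated as an independent univariate sequence, normalized on its own, and cut into fixed-length patches that act as tokens. The function name, patch length, and standardization scheme are assumptions for exposition, not the authors' exact preprocessing pipeline.

```python
import numpy as np

def to_s3_sequences(series_2d, patch_len=96):
    """Convert a multivariate series of shape (time, variates) into
    single-series token sequences: each variate is normalized independently
    and split into fixed-length patches.

    Illustrative sketch of the S3 idea; patch_len and the standardization
    are assumptions rather than the paper's exact settings.
    """
    sequences = []
    for variate in series_2d.T:                          # one univariate series per variate
        normalized = (variate - variate.mean()) / (variate.std() + 1e-8)
        n_tokens = len(normalized) // patch_len
        if n_tokens == 0:
            continue                                     # series shorter than one patch
        tokens = normalized[:n_tokens * patch_len].reshape(n_tokens, patch_len)
        sequences.append(tokens)                         # shape: (n_tokens, patch_len)
    return sequences

# Example: a 3-variate series with 10,000 time points yields 3 token sequences.
dummy = np.random.randn(10_000, 3)
print([s.shape for s in to_s3_sequences(dummy)])         # [(104, 96), (104, 96), (104, 96)]
```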

Experimental Results

The experimental evaluation underscores Timer's competitive performance across several tasks:

  • Time Series Forecasting: Timer achieves strong results, particularly in data-limited scenarios. On several datasets it matches or exceeds state-of-the-art methods such as iTransformer and PatchTST while requiring as little as 1% of the training data those models use to reach comparable accuracy.
  • Imputation and Anomaly Detection: Timer's capabilities extend beyond forecasting. The model substantially reduces imputation error, with notable gains on segment-level imputation, and on the UCR Anomaly Archive it identifies anomalies with higher precision than baseline methods (an illustrative sketch of error-based anomaly scoring follows this list).
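
One way a generative forecaster can be turned into an anomaly detector is to slide it over the series, predict each segment from its preceding context, and flag segments where the prediction error is large. The sketch below illustrates this reduction; `forecast_fn`, the squared-error scoring rule, and the window sizes are stand-ins and assumptions, not necessarily the paper's exact criterion.

```python
import numpy as np

def prediction_error_scores(series, forecast_fn, context_len=672, patch_len=96):
    """Score each patch by the discrepancy between the model's autoregressive
    prediction and the observed values; large errors flag likely anomalies.

    `forecast_fn(context) -> next_patch` is a placeholder for a pre-trained model.
    """
    scores = []
    for start in range(context_len, len(series) - patch_len + 1, patch_len):
        context = series[start - context_len:start]
        predicted = forecast_fn(context)
        observed = series[start:start + patch_len]
        scores.append(float(np.mean((observed - predicted) ** 2)))
    return np.array(scores)

def naive_fn(context):
    return context[-96:]                       # trivial stand-in: repeat the last patch

series = np.sin(np.linspace(0, 200, 5000))
series[3000:3050] += 3.0                       # injected anomaly
scores = prediction_error_scores(series, naive_fn)
print(int(np.argmax(scores)))                  # highest-scoring segment falls near the anomaly
```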

Architectural Insights

The paper also examines the foundational architecture choices for large-scale time series models. It highlights the superior performance and generalization capacity of the decoder-only Transformer, akin to the architectures used in LLMs, over the encoder-only models conventionally used in time series forecasting. This advantage is attributed to the autoregressive training objective, which aligns with the natural sequential dependencies in time series data.
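
To make the architectural point concrete, below is a minimal PyTorch sketch of a decoder-only Transformer trained with a next-token (next-patch) prediction objective over patch tokens. The layer sizes, mask construction, and MSE loss here are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class PatchDecoder(nn.Module):
    """Minimal decoder-only Transformer over patch tokens: each token is a
    fixed-length segment of a single series, and the model predicts the next
    patch from all preceding ones. Sizes are illustrative."""

    def __init__(self, patch_len=96, d_model=256, n_heads=8, n_layers=4):
        super().__init__()
        self.embed = nn.Linear(patch_len, d_model)            # patch -> token embedding
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, patch_len)             # token -> next-patch values

    def forward(self, tokens):
        # tokens: (batch, n_tokens, patch_len); causal mask enforces autoregression
        causal = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        hidden = self.blocks(self.embed(tokens), mask=causal)
        return self.head(hidden)

# GPT-style pre-training step: predict token t+1 from tokens <= t.
model = PatchDecoder()
batch = torch.randn(8, 16, 96)                # 8 single-series sequences of 16 patches
pred = model(batch[:, :-1])
loss = nn.functional.mse_loss(pred, batch[:, 1:])
loss.backward()
```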

Implications and Future Directions

The implications of this paper are substantial for developing adaptable and efficient models within the time series domain. By emulating the training paradigms successful in LLMs, Timer can potentially serve a broad range of applications, from weather prediction to industrial process monitoring. The research prompts a reevaluation of existing practices in time series model development, particularly in the context of scalability and transferability.

Future research directions include exploring zero-shot generalization and developing domain-specific pre-trained models, further enhancing adaptability and reducing dependence on large labeled datasets. Another direction is to investigate the interplay between model capacity and dataset size, elaborating scaling laws for time series models analogous to those observed in LLM development.

In summary, "Timer: Generative Pre-trained Transformers Are Large Time Series Models" advances the discourse in time series analysis by presenting a scalable, generative model that aligns with the autoregressive strengths demonstrated in LLMs, paving the way for more robust and adaptable analytical tools in the field.

Authors (6)
  1. Yong Liu
  2. Haoran Zhang
  3. Chenyu Li
  4. Xiangdong Huang
  5. Jianmin Wang
  6. Mingsheng Long