Understanding Different Design Choices in Training Large Time Series Models
Introduction
Time series forecasting (TSF) remains a fundamental task in time series analysis, focusing on predicting future data points from historical values. Over the years, TSF methodologies have evolved from traditional statistical techniques to machine learning and, more recently, deep learning. The advent of transformers, with their strength in sequence modeling, has led to their application in TSF, especially for long-term forecasting.
Drawing inspiration from the capabilities of large language models (LLMs), researchers are now exploring Large Time Series Models (LTSMs): transformer-based architectures trained for TSF. However, training LTSMs presents unique challenges due to the heterogeneity of time series data, including variations in frequency, dimensionality, and patterns, which make it difficult to train models that generalize across diverse datasets.
This paper provides a comprehensive analysis of various design choices in training LTSMs, spanning pre-processing techniques, model configurations, and dataset configurations. Additionally, the authors propose a novel statistical prompting strategy called the "Time Series Prompt" and introduce an optimal combination of design choices termed the "LTSM-bundle". The empirical results demonstrate the superior performance of LTSM-bundle in zero-shot and few-shot settings compared to state-of-the-art LTSMs.
Methodology
Pre-processing: Instruction Prompts
The pre-processing step aims to enable LTSMs to better adapt to time series datasets. Two types of prompts are studied:
- Text Prompts: Task-specific information formatted into text.
- Time Series Prompts: A novel approach introduced in this paper. These prompts are generated by extracting statistical features from the training dataset, providing a robust statistical description of each dataset.
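To illustrate the idea, the snippet below is a minimal sketch of building a statistical prompt for a single variate, assuming the prompt is simply a vector of global statistics computed on the training split. The `time_series_prompt` helper and the exact feature set are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def time_series_prompt(series: np.ndarray) -> np.ndarray:
    """Build a statistical prompt vector for one variate of the training split.

    Minimal sketch: summarize the series with global statistics that are later
    prepended to the model input. The paper's exact feature set may differ.
    """
    diffs = np.diff(series)
    return np.array([
        series.mean(),               # central tendency
        series.std(),                # dispersion
        series.min(),
        series.max(),
        np.median(series),
        np.percentile(series, 25),
        np.percentile(series, 75),
        diffs.mean(),                # average trend of first differences
        diffs.std(),                 # volatility of changes
    ])

# Example: prompt for a toy sine-wave variate
prompt = time_series_prompt(np.sin(np.linspace(0, 10, 500)))
print(prompt.shape)  # (9,)
```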
Results
Empirical results indicate that time series prompts outperform text prompts, yielding up to 8% lower MAE scores. Additionally, the use of time series prompts results in up to 3% lower MSE scores when compared to scenarios without prompts.
Pre-processing: Tokenizations
This section evaluates linear tokenization and time series tokenization approaches:
- Linear Tokenization: Involves using a trainable linear layer to convert time series numbers into tokens.
- Time Series Tokenization: Converts continuous time series data into discrete tokens using a trainable function.
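As a rough illustration, the following PyTorch sketch shows what linear tokenization can look like: a trainable linear layer projects fixed-length windows of raw values into token embeddings. The `LinearTokenizer` class, patch length, and embedding size are assumptions for illustration rather than the paper's exact settings.

```python
import torch
import torch.nn as nn

class LinearTokenizer(nn.Module):
    """Minimal sketch of linear tokenization: a trainable linear layer maps
    fixed-length windows (patches) of raw values to token embeddings that the
    transformer backbone consumes."""

    def __init__(self, patch_len: int = 16, d_model: int = 768):
        super().__init__()
        self.patch_len = patch_len
        self.proj = nn.Linear(patch_len, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len), with seq_len divisible by patch_len
        batch, seq_len = x.shape
        patches = x.reshape(batch, seq_len // self.patch_len, self.patch_len)
        return self.proj(patches)  # (batch, num_tokens, d_model)

tokens = LinearTokenizer()(torch.randn(4, 256))
print(tokens.shape)  # torch.Size([4, 16, 768])
```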
Results
Linear tokenization proved more effective than time series tokenization in training LTSMs, leading to better forecasting performance across diverse datasets.
Model Configuration: Training Paradigm
Three distinct training paradigms are compared:
- Full Fine-tuning: Fine-tuning all model parameters, starting from pre-trained weights.
- Training from Scratch: Initializing all model parameters randomly, without pre-trained weights.
- LoRA Fine-tuning: Using low-rank adapters to fine-tune a limited number of parameters.
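The snippet below sketches how the three paradigms could be set up on a GPT-2 backbone using Hugging Face transformers and peft; the hyperparameters (e.g., the LoRA rank and target modules) are illustrative assumptions, not the paper's configuration.

```python
from transformers import GPT2Model, GPT2Config
from peft import LoraConfig, get_peft_model

# 1) Full fine-tuning: start from pre-trained weights, update every parameter.
full_ft = GPT2Model.from_pretrained("gpt2")

# 2) Training from scratch: same architecture, randomly initialized weights.
scratch = GPT2Model(GPT2Config())

# 3) LoRA fine-tuning: freeze the backbone, train low-rank adapters only.
lora_cfg = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                      target_modules=["c_attn"])
lora_ft = get_peft_model(GPT2Model.from_pretrained("gpt2"), lora_cfg)
lora_ft.print_trainable_parameters()  # small fraction of total parameters
```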
Results
Full fine-tuning emerged as the most effective strategy, yielding significantly lower MSE and MAE than training from scratch or LoRA fine-tuning.
Model Configuration: Base Model Selection
Four pre-trained models were evaluated as potential backbones for LTSMs:
- GPT-2-Small
- GPT-2-Medium
- GPT-2-Large
- Phi-2
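For orientation, the sketch below shows how such backbones can be swapped via the Hugging Face Hub; the `load_backbone` helper is hypothetical, and only the checkpoint identifiers and rough parameter counts are taken from the public hub.

```python
from transformers import AutoModel

BACKBONES = {
    "GPT-2-Small": "gpt2",           # ~124M parameters
    "GPT-2-Medium": "gpt2-medium",   # ~355M parameters
    "GPT-2-Large": "gpt2-large",     # ~774M parameters
    "Phi-2": "microsoft/phi-2",      # ~2.7B parameters
}

def load_backbone(name: str):
    # Phi-2 may require trust_remote_code=True on older transformers versions.
    return AutoModel.from_pretrained(BACKBONES[name])

backbone = load_backbone("GPT-2-Small")
print(backbone.config.hidden_size)  # 768 for GPT-2-Small
```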
Results
GPT-2-Medium and GPT-2-Small outperformed GPT-2-Large, in short-term and long-term forecasting scenarios respectively, suggesting that the smaller backbones are less prone to overfitting.
Dataset Configuration: Quantity and Diversity
The impact of data quantity and diversity on model performance was also examined:
- Data Quantity: Various down-sampling rates of the training data (10%, 5%, 2.5%) were compared (a sketch of this step follows the list).
- Diversity: Models were trained on an increasing number of datasets to evaluate performance improvements.
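The following sketch illustrates the data-quantity ablation, assuming down-sampling means keeping a random fraction of the training windows; the paper may subsample differently (e.g., periodically), so treat this as a minimal example under that assumption.

```python
import numpy as np

def downsample_training_windows(windows: np.ndarray, rate: float, seed: int = 0):
    """Keep a random fraction of the training windows (e.g., rate=0.10, 0.05,
    or 0.025). Random subsampling is an assumption for illustration."""
    rng = np.random.default_rng(seed)
    n_keep = max(1, int(len(windows) * rate))
    idx = rng.choice(len(windows), size=n_keep, replace=False)
    return windows[np.sort(idx)]

windows = np.random.randn(10_000, 336)   # 10k training windows of length 336
subset = downsample_training_windows(windows, rate=0.05)
print(subset.shape)  # (500, 336)
```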
Results
Down-sampling to 5% of the training data generally offered the best trade-off between data quantity and forecasting performance. Increasing dataset diversity consistently improved the model's generalizability.
Comparison with State-of-the-Art Methods
The LTSM-bundle demonstrated superior performance across various benchmarks in both zero-shot and few-shot settings, outperforming numerous state-of-the-art models such as PatchTST and DLinear.
Conclusion and Future Directions
This paper provides an in-depth analysis of critical design choices in training LTSMs, yielding insights that culminate in the LTSM-bundle. This framework exhibits strong performance with enhanced generalizability and efficiency.
Future work might involve developing more nuanced prompting strategies and exploring synthetic datasets to further enhance LTSMs. Additionally, investigating variate-specific prompts and the integration of more complex statistical descriptions could yield further improvements.
Overall, this work lays substantial groundwork for advancing the field of time series forecasting using large-scale, transformer-based models.