Insightful Overview of "Unified Training of Universal Time Series Forecasting Transformers"
The paper "Unified Training of Universal Time Series Forecasting Transformers" presents a novel approach to address the challenges inherent in constructing a Large Time Series Model (LTM), specifically designed for universal time series forecasting. The authors propose Moirai, a Masked Encoder-based Transformer architecture, to adeptly handle time series data's unique characteristics, which include cross-frequency learning, managing multivariate time series with varying numbers of variates, and the task of capturing diverse distributional properties.
Contributions and Proposed Solutions
- Novel Transformer Architecture: The authors introduce several architectural enhancements to the conventional Transformer to meet the requirements of universal time series forecasting. These include multi-patch-size projection layers for frequency-specific learning, Any-variate Attention for handling an arbitrary number of variates, and a flexible mixture distribution for modeling diverse predictive distributions; the first sketch following this list illustrates the first two components.
- Large-Scale Pre-training Dataset: A significant contribution of this work is the introduction of the Large-scale Open Time Series Archive (LOTSA). This dataset encompasses over 27 billion observations across diverse domains, such as energy, transport, climate, and economics, providing a robust foundation for training LTMs.
- Unified Pre-training and Flexible Task Handling: Moirai is trained over a task distribution that randomly samples varied context and prediction lengths, enabling the model to generalize to diverse forecasting scenarios; the second sketch following this list illustrates such sampling. This approach diverges from conventional pipelines, which train a model on a fixed dataset with fixed context and horizon settings, and it improves the model's adaptability.
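To make the architectural ideas concrete, here is a minimal, hypothetical sketch (not the authors' implementation) of how the first two components could look in PyTorch, under simplifying assumptions: non-overlapping patches, a single attention head, no positional encoding, and an illustrative frequency-to-patch-size mapping left to the caller.

```python
# Hypothetical sketch (not the authors' code) of frequency-specific patch
# embedding and Any-variate Attention, under simplifying assumptions:
# non-overlapping patches, a single attention head, no positional encoding.
import torch
import torch.nn as nn


class MultiPatchSizeEmbedding(nn.Module):
    """One input projection per supported patch size; the caller picks the
    patch size from the series' sampling frequency (that mapping is assumed)."""

    def __init__(self, d_model: int, patch_sizes=(8, 16, 32, 64)):
        super().__init__()
        self.proj = nn.ModuleDict({str(p): nn.Linear(p, d_model) for p in patch_sizes})

    def forward(self, series: torch.Tensor, patch_size: int) -> torch.Tensor:
        # series: (batch, length), with length divisible by patch_size
        patches = series.reshape(series.size(0), -1, patch_size)
        return self.proj[str(patch_size)](patches)  # (batch, n_patches, d_model)


class AnyVariateAttention(nn.Module):
    """Self-attention over a flattened multivariate sequence, with a learned
    binary bias that distinguishes same-variate from cross-variate token pairs,
    so any number of variates can share one sequence."""

    def __init__(self, d_model: int):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        self.bias_same = nn.Parameter(torch.zeros(()))
        self.bias_diff = nn.Parameter(torch.zeros(()))

    def forward(self, x: torch.Tensor, variate_id: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, d_model); variate_id: (batch, tokens) integer ids
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k.transpose(-2, -1) / (x.size(-1) ** 0.5)
        same = variate_id.unsqueeze(-1) == variate_id.unsqueeze(-2)
        scores = scores + torch.where(same, self.bias_same, self.bias_diff)
        return self.out(scores.softmax(dim=-1) @ v)
```

The design point this sketch tries to capture is that a bias depending only on whether two tokens share a variate id does not depend on how variates are ordered, which is what lets the model flatten an arbitrary set of variates into one sequence.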
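The second sketch illustrates the unified pre-training task distribution: each training window draws its own context and prediction lengths. The ranges, the uniform sampling, and the function name below are illustrative assumptions rather than the paper's exact configuration.

```python
# Hypothetical sketch of unified pre-training task sampling; the ranges and the
# uniform distributions below are illustrative, not the paper's exact settings.
import random


def sample_task(series_length: int, min_context: int = 32,
                max_window: int = 512, max_pred_fraction: float = 0.5):
    """Draw a (context_length, prediction_length) pair for one training window,
    so the model is exposed to varied forecasting setups during pre-training."""
    window = random.randint(min_context + 1, min(max_window, series_length))
    prediction = random.randint(1, max(1, int(window * max_pred_fraction)))
    return window - prediction, prediction


# Example: a few task configurations for a series with 10,000 observations.
for _ in range(3):
    context, prediction = sample_task(series_length=10_000)
    print(f"context={context}, prediction={prediction}")
```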
Experimental Insights
The paper demonstrates Moirai's efficacy through extensive experiments covering both in-distribution and out-of-distribution evaluations. In the in-distribution setting, Moirai outperforms several baselines on the Monash Time Series Forecasting Benchmark, achieving the best or second-best normalized MAE across a wide range of datasets.
For out-of-distribution tasks, Moirai exhibits strong zero-shot capabilities, often outperforming state-of-the-art methods trained on individual datasets. This is particularly evident in long-sequence forecasting tasks and probabilistic forecasting across different domains.
Theoretical and Practical Implications
The proposed solutions have notable theoretical implications. By adapting the architecture to the heterogeneity of time series data and adopting a flexible mixture distribution for the predictive output (sketched below), Moirai sets a new standard for handling diverse datasets with a single model. Practically, one pre-trained model can serve many forecasting tasks, reducing per-task training time and computational cost and broadening the applicability of forecasting models across industries and domains.
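As an illustration of what such a flexible mixture output might look like, the following hypothetical sketch scores targets under a two-component mixture via its negative log-likelihood. The component choice (Student's t and Normal), the parameterization, and the class name are simplifying assumptions; the paper combines several distribution families.

```python
# Hypothetical sketch of a flexible mixture output head; the two components
# (Student's t and Normal) and the parameterization are simplifying assumptions,
# since the paper combines several distribution families.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Normal, StudentT


class MixtureHead(nn.Module):
    """Maps a representation to mixture weights and component parameters, and
    scores targets with the mixture's negative log-likelihood."""

    def __init__(self, d_model: int):
        super().__init__()
        # 2 mixture logits + (df, loc, scale) for Student's t + (loc, scale) for Normal
        self.params = nn.Linear(d_model, 7)

    def nll(self, h: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        p = self.params(h)
        log_w = torch.log_softmax(p[..., :2], dim=-1)
        student = StudentT(df=F.softplus(p[..., 2]) + 2.0,
                           loc=p[..., 3],
                           scale=F.softplus(p[..., 4]) + 1e-3)
        normal = Normal(loc=p[..., 5], scale=F.softplus(p[..., 6]) + 1e-3)
        log_probs = torch.stack([student.log_prob(target),
                                 normal.log_prob(target)], dim=-1)
        return -torch.logsumexp(log_w + log_probs, dim=-1).mean()
```

Training against this kind of mixture likelihood is what allows a single output layer to accommodate heavy-tailed, skewed, or near-deterministic series without per-dataset distributional assumptions.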
Future Developments
While the paper presents compelling results, the authors acknowledge the need for further work on hyperparameter tuning, scaling behavior, and extending the architecture to support very high-dimensional datasets. Future work may also integrate multimodal inputs, such as text or tabular data, to increase the model's utility in real-world applications.
In conclusion, this paper's introduction of Moirai and LOTSA represents a significant step towards the realization of universal forecasting models. These advancements provide a foundation for further research and development in creating versatile, general-purpose models that can handle the ever-increasing complexity and volume of time series data across myriad applications.