Deep Transformer Models for Time Series Forecasting: A Study on Influenza Prevalence
This paper investigates the application of Transformer-based models to time series forecasting, focusing on predicting influenza-like illness (ILI) prevalence. The authors present a method built on the self-attention mechanism of the Transformer architecture, arguing that it captures complex temporal dependencies better than traditional sequence models such as RNNs and LSTMs.
Overview of Methodology
The proposed model operates within a generic framework that accommodates both univariate and multivariate time series forecasting. Its main innovation is the use of the Transformer's self-attention mechanism, which processes an entire sequence at once rather than step by step. This makes it well suited to modeling long-term dependencies without the limitations of sequence-aligned models, such as the vanishing-gradient problems often encountered in RNNs.
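To make this concrete, the sketch below shows a minimal Transformer-encoder forecaster in PyTorch. It is an illustration of the general idea rather than the authors' exact architecture; all layer sizes and hyperparameters here are assumptions.

```python
# Minimal sketch of a Transformer encoder for univariate forecasting
# (illustrative only; hyperparameters are assumed, not taken from the paper).
import math
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    """Adds sinusoidal position information, since self-attention is order-agnostic."""
    def __init__(self, d_model: int, max_len: int = 512):
        super().__init__()
        position = torch.arange(max_len).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe)

    def forward(self, x):                         # x: (batch, seq_len, d_model)
        return x + self.pe[: x.size(1)]

class TimeSeriesTransformer(nn.Module):
    def __init__(self, d_model: int = 64, nhead: int = 4, num_layers: int = 2):
        super().__init__()
        self.input_proj = nn.Linear(1, d_model)   # embed each scalar observation
        self.pos_enc = PositionalEncoding(d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward=128,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, 1)         # predict the next value

    def forward(self, x):                         # x: (batch, seq_len, 1)
        h = self.encoder(self.pos_enc(self.input_proj(x)))
        return self.head(h[:, -1])                # forecast from the last position
```

Because every position attends to every other position in a single pass, distant weeks influence the forecast as directly as recent ones, which is the property the paper emphasizes over recurrent models.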
In the ILI case study, the model is trained on historical ILI reports published by the CDC, which document weekly ILI ratios derived from symptomatic patient data across U.S. states. The forecasting task is to predict the ILI ratio one or more weeks into the future, with results compared against baselines including ARIMA, LSTM, and a Seq2Seq model with attention.
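A common way to frame such a task is to slide a fixed-length window over the weekly series and pair each window with a value several weeks ahead. The snippet below is a generic illustration of that framing; the window length, horizon, and toy data are assumptions, not the paper's settings or the CDC data.

```python
import numpy as np

def make_windows(series: np.ndarray, lookback: int = 10, horizon: int = 4):
    """Turn a weekly ILI series into (input window, future target) pairs.

    Each input is `lookback` consecutive weeks; the target is the value
    `horizon` weeks after the end of the window.
    """
    xs, ys = [], []
    for t in range(len(series) - lookback - horizon + 1):
        xs.append(series[t : t + lookback])
        ys.append(series[t + lookback + horizon - 1])
    return np.stack(xs), np.array(ys)

# Example with a synthetic weekly series (not CDC data):
weekly_ili = np.sin(np.linspace(0, 6 * np.pi, 150)) * 2 + 3
X, y = make_windows(weekly_ili, lookback=10, horizon=4)
print(X.shape, y.shape)   # (137, 10) (137,)
```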
Results and Discussion
The Transformer-based model performed favorably against existing methods. On the Pearson correlation coefficient and root-mean-square error (RMSE) metrics, it achieved the best overall results, with higher mean correlation than ARIMA and lower RMSE than both the LSTM and Seq2Seq baselines. Notably, the model was also effective at the national level, suggesting that a model trained on state-level data can generalize to broader geographic forecasting.
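For reference, both evaluation metrics are standard and straightforward to compute; the sketch below uses synthetic values, not results from the paper.

```python
import numpy as np

def pearson_corr(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Pearson correlation coefficient between observed values and forecasts."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Root-mean-square error of the forecasts."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Toy example (synthetic numbers for illustration only):
obs = np.array([1.2, 1.5, 2.1, 2.8, 2.4])
fc  = np.array([1.1, 1.6, 2.0, 2.6, 2.5])
print(pearson_corr(obs, fc), rmse(obs, fc))
```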
Further experiments enhanced the model with multivariate inputs, adding features such as the week number. These features yielded only a slight improvement, suggesting that multivariate inputs have potential but that self-attention already captures much of this dependency structure from the single-variable series.
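As an illustration of such a multivariate input, one can stack the ILI ratio with a week-of-year feature per time step. The construction and scaling below are illustrative choices, not the paper's preprocessing.

```python
import numpy as np

def add_week_feature(ili: np.ndarray, start_week: int = 1) -> np.ndarray:
    """Stack the ILI ratio with a week-of-year feature, giving a
    multivariate series of shape (T, 2); scaling to [0, 1] is assumed."""
    weeks = (np.arange(start_week, start_week + len(ili)) - 1) % 52 + 1
    return np.column_stack([ili, weeks / 52.0])

# Example: a two-feature input spanning a year boundary (synthetic values):
ili = np.array([2.1, 2.4, 2.9, 3.3])
print(add_week_feature(ili, start_week=50))
```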
Strategic Insights and Future Directions
The implications of this work are notable both practically and theoretically. Practically, the model's effectiveness in ILI forecasting could extend to other epidemiological forecasts, offering a tool for real-time disease monitoring and strategic resource allocation. Theoretically, using self-attention to model not only observed sequences but also the underlying dynamical systems opens a path toward understanding complex systems through techniques such as time delay embedding.
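Time delay embedding reconstructs a proxy for the system's state from lagged copies of a single observable; the sketch below shows the standard construction, with the embedding dimension and lag chosen arbitrarily for illustration.

```python
import numpy as np

def delay_embed(x: np.ndarray, dim: int = 3, lag: int = 2) -> np.ndarray:
    """Takens-style time delay embedding: each row is
    [x(t), x(t - lag), ..., x(t - (dim-1)*lag)], approximating the
    underlying system state from one observed variable."""
    n = len(x) - (dim - 1) * lag
    return np.column_stack([x[(dim - 1 - k) * lag : (dim - 1 - k) * lag + n]
                            for k in range(dim)])

# Each embedded vector can then serve as a state-like input for forecasting:
x = np.sin(np.linspace(0, 4 * np.pi, 60))
states = delay_embed(x, dim=3, lag=2)
print(states.shape)   # (56, 3)
```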
There is also an opportunity to explore applications to spatio-temporal data. Because self-attention can, in principle, be extended to capture spatial dependencies, future work could adapt the approach to domains where geographic information plays a critical role (e.g., climate modeling).
In conclusion, this research makes a compelling case for adopting Transformer-based models in fields traditionally dominated by time-series-specific architectures. By exploiting the advantages of self-attention, the approach points toward future AI developments in epidemiology, dynamical systems modeling, and beyond.