Stacked Bidirectional and Unidirectional LSTM Recurrent Neural Network for Forecasting Network-wide Traffic State with Missing Values (2005.11627v1)

Published 24 May 2020 in cs.LG, eess.SP, and stat.ML

Abstract: Short-term traffic forecasting based on deep learning methods, especially recurrent neural networks (RNN), has received much attention in recent years. However, the potential of RNN-based models in traffic forecasting has not yet been fully exploited in terms of the predictive power of spatial-temporal data and the capability of handling missing data. In this paper, we focus on RNN-based models and attempt to reformulate the way to incorporate RNN and its variants into traffic prediction models. A stacked bidirectional and unidirectional LSTM network architecture (SBU-LSTM) is proposed to assist the design of neural network structures for traffic state forecasting. As a key component of the architecture, the bidirectional LSTM (BDLSTM) is exploited to capture the forward and backward temporal dependencies in spatiotemporal data. To deal with missing values in spatial-temporal data, we also propose a data imputation mechanism in the LSTM structure (LSTM-I) by designing an imputation unit to infer missing values and assist traffic prediction. The bidirectional version of LSTM-I is incorporated in the SBU-LSTM architecture. Two real-world network-wide traffic state datasets are used to conduct experiments and published to facilitate further traffic prediction research. The prediction performance of multiple types of multi-layer LSTM or BDLSTM models is evaluated. Experimental results indicate that the proposed SBU-LSTM architecture, especially the two-layer BDLSTM network, can achieve superior performance for the network-wide traffic prediction in both accuracy and robustness. Further, comprehensive comparison results show that the proposed data imputation mechanism in the RNN-based models can achieve outstanding prediction performance when the model's input data contains different patterns of missing values.

Authors (4)

Zhiyong Cui (34 papers)
Ruimin Ke (16 papers)
Ziyuan Pu (27 papers)
Yinhai Wang (45 papers)

Citations (298)

Summary

Overview of Advanced LSTM Architectures for Traffic Forecasting

The paper, by Zhiyong Cui, Ruimin Ke, Ziyuan Pu, and Yinhai Wang, addresses two weaknesses of Recurrent Neural Network (RNN) models in short-term traffic forecasting: they do not fully exploit spatial-temporal data, and they handle missing data poorly. It introduces a Stacked Bidirectional and Unidirectional LSTM (SBU-LSTM) architecture, built from Long Short-Term Memory (LSTM) layers, for forecasting network-wide traffic states.

Stacked Bidirectional and Unidirectional LSTM Network

The paper proposes the SBU-LSTM framework, whose core component is a Bidirectional LSTM (BDLSTM) that captures both forward and backward temporal dependencies. The underlying hypothesis is that backward dependencies improve the model's ability to interpret the spatial-temporal relationships in traffic data, which are often periodic and influenced by both upstream and downstream traffic conditions.

The paper suggests that, while RNNs have already been adapted effectively for traffic forecasting, stacking bidirectional and unidirectional LSTM layers is a meaningful step forward. The proposed architecture is reported to outperform conventional unidirectional LSTMs on complex spatial-temporal data and to be more robust to missing values.
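To make the stacking concrete, the following is a minimal sketch of the data flow only: a bidirectional layer whose per-step forward and backward states feed a unidirectional layer. It uses a standard LSTM cell with a single hidden unit and random, untrained weights; the helper names (`make_weights`, `bidirectional`, `sbu_lstm`), the layer sizes, and the sample speeds are all assumptions for illustration, not the authors' implementation.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def make_weights(n_in, seed):
    """Random weights for a cell with n_in inputs and one hidden unit (illustrative only)."""
    rng = random.Random(seed)
    return {gate: ([rng.uniform(-1, 1) for _ in range(n_in)],  # input weights
                   rng.uniform(-1, 1),                          # recurrent weight
                   rng.uniform(-1, 1))                          # bias
            for gate in "ifog"}

def lstm_step(x, h, c, w):
    """One step of a standard LSTM cell with a single hidden unit."""
    def pre(gate):
        w_x, w_h, b = w[gate]
        return sum(a * v for a, v in zip(w_x, x)) + w_h * h + b
    i, f, o = sigmoid(pre("i")), sigmoid(pre("f")), sigmoid(pre("o"))
    g = math.tanh(pre("g"))        # candidate cell state
    c_new = f * c + i * g          # keep part of the old memory, add new input
    h_new = o * math.tanh(c_new)   # gated output
    return h_new, c_new

def run_lstm(xs, w):
    """Run the cell over a sequence and return the hidden state at every step."""
    h, c, hs = 0.0, 0.0, []
    for x in xs:
        h, c = lstm_step(x, h, c, w)
        hs.append(h)
    return hs

def bidirectional(xs, w_fwd, w_bwd):
    """Pair each step's forward state with its backward state."""
    fwd = run_lstm(xs, w_fwd)
    bwd = run_lstm(xs[::-1], w_bwd)[::-1]  # process reversed, then re-align in time
    return [[f, b] for f, b in zip(fwd, bwd)]

def sbu_lstm(xs, w_fwd, w_bwd, w_top):
    """Bidirectional bottom layer, unidirectional top layer; return the last top state."""
    return run_lstm(bidirectional(xs, w_fwd, w_bwd), w_top)[-1]

# Four time steps of normalized speeds from two hypothetical sensors.
speeds = [[0.6, 0.5], [0.4, 0.45], [0.3, 0.2], [0.5, 0.6]]
state = sbu_lstm(speeds, make_weights(2, 1), make_weights(2, 2), make_weights(2, 3))
print(state)
```

Because the backward pass starts from the end of the window, the backward state at the first time step already reflects the last observation, which a forward-only layer cannot see at that point. A trained model would add an output layer that maps the final state to the predicted speeds.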

Imputation Mechanism within LSTM Framework

A significant contribution of this work is an imputation mechanism integrated into the LSTM structure (LSTM-I). It uses an imputation unit to infer missing values from the learned dependencies. The paper also explores a bidirectional variant (BDLSTM-I) and argues that imputation can be performed within the prediction model itself rather than as a separate pre-processing step. Combining the two tasks is particularly useful for real-time applications, where complete input data cannot be assumed.
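The idea of imputing inside the recurrence can be sketched as follows. At each step, a mask marks which inputs were observed; missing entries are replaced by an estimate derived from the previous hidden state before the recurrent update runs. This is a sketch under stated assumptions: the linear estimate from the hidden state, the toy tanh recurrence standing in for the LSTM update, and all names and weights here are hypothetical, and the paper's imputation unit may be formulated differently.

```python
import math

def impute(x, mask, h_prev, w_imp, b_imp):
    """Keep observed entries (mask 1); fill missing ones (mask 0) from the previous state."""
    x_hat = [w * h_prev + b for w, b in zip(w_imp, b_imp)]  # assumed linear estimate
    return [xi if m else xh for xi, m, xh in zip(x, mask, x_hat)]

def rnn_step(x, h_prev, w_x, w_h):
    """Toy tanh recurrence standing in for the LSTM update."""
    return math.tanh(sum(w * v for w, v in zip(w_x, x)) + w_h * h_prev)

def run_with_imputation(xs, masks, w_x, w_h, w_imp, b_imp):
    """Impute, then update the state, at every time step."""
    h, filled = 0.0, []
    for x, mask in zip(xs, masks):
        x_full = impute(x, mask, h, w_imp, b_imp)
        h = rnn_step(x_full, h, w_x, w_h)
        filled.append(x_full)
    return filled, h

# Two hypothetical sensors over three steps; None marks a missing reading.
readings = [[0.5, 0.6], [None, 0.7], [0.4, None]]
observed = [[1, 1], [0, 1], [1, 0]]
filled, last_state = run_with_imputation(
    readings, observed, w_x=[0.3, -0.2], w_h=0.5, w_imp=[0.8, 0.6], b_imp=[0.5, 0.5]
)
print(filled)
```

Since the filled-in values feed the same state that produces the forecast, no separate pre-processing pass over the data is needed, which is the property the paper emphasizes for real-time use.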

Experimental Evaluation

The paper supports its claims with experiments on two large-scale traffic datasets: the LOOP-SEA dataset and the PEMS-BAY dataset. The results indicate improved prediction accuracy and robust performance under varying data loss scenarios. Specifically, the SBU-LSTM architecture achieves lower Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Root Mean Square Error (RMSE) across different configurations and levels of missing data, outperforming existing models such as GRU-D. The architecture also maintains its prediction accuracy at high missing rates, an essential property for real-world use.
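For reference, the three error measures named above have standard definitions, sketched here in plain Python. The sample speeds are made up for illustration and are not results from the paper; MAPE is expressed in percent and assumes no observed value is zero.

```python
import math

def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent; assumes no actual value is zero."""
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Square Error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical observed and forecast speeds for three sensors.
observed = [50.0, 60.0, 40.0]
forecast = [45.0, 66.0, 40.0]
print(mae(observed, forecast), mape(observed, forecast), rmse(observed, forecast))
```

RMSE penalizes large individual errors more heavily than MAE, while MAPE scales each error by the observed value, so the three measures can rank models differently.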

Implications and Future Directions

The paper's implications extend beyond immediate traffic state predictions. By addressing both prediction and imputation within a unified framework, the proposed models suggest pathways for improving other applications that rely on incomplete temporal datasets. The paper also opens avenues for enhancing model structures in RNN-based predictive tasks involving complex temporal and spatial interdependencies.

Notably, the capacity-performance trade-offs highlighted in the evaluation provide valuable insights into the potential customization of model structures for specific applications or datasets. Future research might explore hybrid models that incorporate LSTM variants with other deep learning paradigms like Graph Neural Networks (GNNs) to further exploit spatial data continuity and network topology in traffic forecasting.

In conclusion, the paper presents a substantive improvement to traffic prediction models, combining a refined treatment of spatial-temporal data with built-in handling of missing values. It is a useful contribution to intelligent transportation systems research and encourages further work on deep learning model design and its practical application in transport analytics.
