Recurrent Neural Networks for Multivariate Time Series with Missing Values (1606.01865v2)

Published 6 Jun 2016 in cs.LG, cs.NE, and stat.ML

Abstract: Multivariate time series data in practical applications, such as health care, geoscience, and biology, are characterized by a variety of missing values. In time series prediction and other related tasks, it has been noted that missing values and their missing patterns are often correlated with the target labels, a.k.a., informative missingness. There is very limited work on exploiting the missing patterns for effective imputation and improving prediction performance. In this paper, we develop novel deep learning models, namely GRU-D, as one of the early attempts. GRU-D is based on Gated Recurrent Unit (GRU), a state-of-the-art recurrent neural network. It takes two representations of missing patterns, i.e., masking and time interval, and effectively incorporates them into a deep model architecture so that it not only captures the long-term temporal dependencies in time series, but also utilizes the missing patterns to achieve better prediction results. Experiments on time series classification tasks with real-world clinical datasets (MIMIC-III, PhysioNet) and synthetic datasets demonstrate that our models achieve state-of-the-art performance and provide useful insights for better understanding and utilization of missing values in time series analysis.

Authors (5)

Zhengping Che
Sanjay Purushotham
Kyunghyun Cho
David Sontag
Yan Liu

Summary

Introduction

The paper Recurrent Neural Networks for Multivariate Time Series with Missing Values introduces an approach for handling missing values in multivariate time series, with a focus on domains such as healthcare. Its central observation is that missing values and their patterns are often correlated with the target labels in supervised learning tasks, a property referred to as informative missingness.

The authors propose GRU-D, a deep learning model built on the Gated Recurrent Unit (GRU) architecture. GRU-D represents missing patterns with two inputs, masking vectors and time intervals, and incorporates both into the recurrent architecture. The paper reports that GRU-D captures long-term temporal dependencies and also uses the missing patterns to improve predictive performance.

Methodology

Model Architecture:

GRU-D extends the GRU, a gated variant of the Recurrent Neural Network (RNN). The key change is that the model receives two explicit representations of the missing patterns alongside the data:

Masking Vectors: Binary indicators of whether each variable is observed or missing at each time step.
Time Intervals: For each variable, the time elapsed since that variable was last observed.
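The two representations above can be derived directly from a series in which missing entries are stored as NaN. The following is a minimal NumPy sketch, not the authors' code: the function name `masks_and_intervals` and the toy data are hypothetical, and the interval recursion (reset after an observed step, accumulate after a missing one) is an assumption based on the usual GRU-D formulation.

```python
import numpy as np

def masks_and_intervals(x, timestamps):
    """Build masking vectors and time intervals for one series.

    x          : (T, D) array, NaN marks a missing value
    timestamps : (T,) increasing observation times
    """
    x = np.asarray(x, dtype=float)
    s = np.asarray(timestamps, dtype=float)
    mask = (~np.isnan(x)).astype(float)   # 1 = observed, 0 = missing
    delta = np.zeros_like(x)              # interval is 0 at the first step
    for t in range(1, x.shape[0]):
        gap = s[t] - s[t - 1]
        # If the variable was observed at t-1 the interval restarts at `gap`;
        # otherwise the gap is added to the interval accumulated so far.
        delta[t] = gap + (1.0 - mask[t - 1]) * delta[t - 1]
    return mask, delta

# Toy series: 3 time steps, 2 variables, irregular timestamps.
x = np.array([[1.0, np.nan],
              [np.nan, np.nan],
              [3.0, 5.0]])
mask, delta = masks_and_intervals(x, [0.0, 1.0, 3.0])
print(mask)
print(delta)
```

In this toy case the first variable is observed at time 0 and again at time 3, so its interval at the last step is 3.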

Trainable Decay Mechanism:

GRU-D applies trainable decay rates in two places:

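Input Decay: Replaces a missing value with a blend of the variable's last observation and its empirical mean, shifting toward the mean as the time since the last observation grows. This reflects an assumption that is reasonable in many domains, namely that biological or natural processes return to a baseline over time.
Hidden State Decay: Scales down the previous hidden state before each GRU update, by a factor that depends on how long the input variables have gone unobserved, so that stale information carries less weight.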
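The two decays can be written compactly. The block below is a sketch in notation commonly used for GRU-D, and the symbols are assumptions not defined in this summary: $m_t^d$ is the mask, $\delta_t$ the time interval, $x_{t'}^d$ the last observed value of variable $d$, $\tilde{x}^d$ its empirical mean, and $W_\gamma$, $b_\gamma$ trainable parameters. The original paper remains the reference for the exact form.

```latex
\begin{align}
\gamma_t &= \exp\{-\max(0,\; W_\gamma \delta_t + b_\gamma)\} \\
\hat{x}_t^d &= m_t^d\, x_t^d + (1 - m_t^d)\left(\gamma_{x_t}^d\, x_{t'}^d + (1 - \gamma_{x_t}^d)\,\tilde{x}^d\right) \\
\hat{h}_{t-1} &= \gamma_{h_t} \odot h_{t-1}
\end{align}
```

Because of the exponential of a non-positive quantity, each decay rate lies in $(0, 1]$: a rate of 1 keeps the last observation (or hidden state) unchanged, and a rate near 0 falls back to the mean (or discards the state).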

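Both decay rates are learned jointly with the rest of the network as functions of the time interval, so the model can decay a variable's last value and the hidden state more strongly the longer the inputs have gone unobserved.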
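As a numerical illustration of the decay mechanism, the sketch below computes a decay rate from the time interval and applies it to an input and to a hidden state. It is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the helper names, the weights (`w_x`, `W_h`), and the toy values are all hypothetical, and in a real model the weights would be trained. Using a per-variable (diagonal) weight for the input decay is also an assumption.

```python
import numpy as np

def decay_rate(z):
    """gamma = exp(-max(0, z)), elementwise; always in (0, 1]."""
    return np.exp(-np.maximum(0.0, z))

def decay_input(x, mask, x_last, x_mean, gamma_x):
    """Keep observed values; blend last observation and mean for missing ones."""
    x_obs = np.where(mask == 1, x, 0.0)   # replace NaN so it cannot propagate
    blend = gamma_x * x_last + (1.0 - gamma_x) * x_mean
    return mask * x_obs + (1.0 - mask) * blend

# Toy step: both variables are missing, last seen 1 and 4 time units ago.
delta = np.array([1.0, 4.0])
x = np.array([np.nan, np.nan])
mask = np.array([0.0, 0.0])
x_last = np.array([4.0, 4.0])   # last observed values
x_mean = np.array([2.0, 2.0])   # empirical means (assumed known)

# Input decay: one hypothetical weight per variable.
w_x, b_x = np.array([0.5, 0.5]), np.zeros(2)
gamma_x = decay_rate(w_x * delta + b_x)
x_hat = decay_input(x, mask, x_last, x_mean, gamma_x)

# Hidden state decay: hypothetical full weight matrix, 3 hidden units.
W_h, b_h = np.full((3, 2), 0.1), np.zeros(3)
gamma_h = decay_rate(W_h @ delta + b_h)
h_decayed = gamma_h * np.ones(3)   # previous hidden state of all ones

print(x_hat, h_decayed)
```

The variable that has been missing longer ends up closer to its mean, which is the behavior the input decay is meant to capture.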

Comparative Models:

The paper compares GRU-D against several GRU baselines that handle missing values in simpler ways:

GRU-Mean: Replaces each missing value with the mean of that variable.
GRU-Forward: Fills each missing value with the most recent observation of that variable.
GRU-Simple: Concatenates the input with the masking vectors and time intervals, without any decay mechanism.
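The input preparation behind the three baselines above can be sketched as follows. This is an illustrative NumPy sketch, not the authors' code: the function names are hypothetical, the means are assumed to come from training data, falling back to the mean before a variable's first observation in forward filling is an assumption, and so is using mean imputation for the value part of the GRU-Simple input.

```python
import numpy as np

def impute_mean(x, mask, x_mean):
    """GRU-Mean style input: missing entries take the variable's mean."""
    return np.where(mask == 1, x, x_mean)

def impute_forward(x, mask, x_mean):
    """GRU-Forward style input: carry the last observation forward."""
    out = np.empty_like(x)
    last = np.array(x_mean, dtype=float)   # assumed fallback before first observation
    for t in range(x.shape[0]):
        last = np.where(mask[t] == 1, x[t], last)
        out[t] = last
    return out

def simple_input(x, mask, delta, x_mean):
    """GRU-Simple style input: [imputed values, mask, time interval] per step."""
    return np.concatenate([impute_mean(x, mask, x_mean), mask, delta], axis=1)

x = np.array([[1.0, np.nan],
              [np.nan, 5.0],
              [np.nan, np.nan]])
mask = (~np.isnan(x)).astype(float)
delta = np.array([[0.0, 0.0],      # intervals for unit-spaced time steps
                  [1.0, 1.0],
                  [2.0, 1.0]])
x_mean = np.array([2.0, 4.0])      # hypothetical training-set means

x_mean_imp = impute_mean(x, mask, x_mean)
x_fwd_imp = impute_forward(x, mask, x_mean)
x_simple = simple_input(x, mask, delta, x_mean)
print(x_mean_imp, x_fwd_imp, x_simple.shape, sep="\n")
```

Each function returns an array that a standard GRU can consume directly; only GRU-Simple exposes the missing pattern to the network, and none of the three learns how quickly a stale value should be discounted.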

Experimental Results

The empirical evaluation covers one synthetic dataset and two real-world healthcare datasets (MIMIC-III and PhysioNet). GRU-D is reported to outperform the baselines in these settings:

Synthetic Data: Shows that the model can exploit informative missingness across varying degrees of correlation between the missing patterns and the labels.
Healthcare Data: GRU-D achieves the highest area under the ROC curve (AUC) scores in mortality prediction tasks. It also performs well in multi-task classification.
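For readers unfamiliar with the metric, AUC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it by pairwise comparison; the function name `auc_score` and the toy labels and scores are hypothetical and unrelated to the paper's results.

```python
import numpy as np

def auc_score(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count half)."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # Difference between every positive score and every negative score.
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (len(pos) * len(neg)))

# Toy example: 3 of the 4 positive/negative pairs are ranked correctly.
auc = auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)  # 0.75
```

The pairwise form is quadratic in the number of samples, which is fine for illustration; a library routine would be the usual choice on full datasets.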

Implications and Future Directions

The findings bear on both the theory and the practice of time series analysis:

Theoretical Implications: Trainable decay mechanisms offer a way to handle missing data inside the model itself, rather than through a separate imputation step.
Practical Applications: GRU-D is particularly relevant to healthcare, where early and accurate predictions matter for patient care.
Future Research: The paper lays groundwork for deep learning frameworks tailored to data that are not missing at random (NMAR).

Future work could also include more extensive theoretical analysis and experiments in domains beyond healthcare, which would test how broadly the proposed model applies.

Conclusion

Recurrent Neural Networks for Multivariate Time Series with Missing Values addresses the problem of missing data in time series analysis by building the missing patterns into the recurrent architecture. With GRU-D, the authors report improved predictive performance over GRU baselines on synthetic and clinical datasets. The approach is most relevant to fields where timely and accurate predictions are important and where the pattern of missingness itself carries information.
