
Multivariate Time Series Classification: A Deep Learning Approach

Published 5 Jul 2023 in cs.LG | (2307.02253v1)

Abstract: This paper investigates different methods and various neural network architectures applicable in the time series classification domain. The data is obtained from a fleet of gas sensors that measure and track quantities such as oxygen and sound. With the help of this data, we can detect events such as occupancy in a specific environment. At first, we analyze the time series data to understand the effect of different parameters, such as the sequence length, when training our models. These models employ Fully Convolutional Networks (FCN) and Long Short-Term Memory (LSTM) for supervised learning and Recurrent Autoencoders for semi-supervised learning. Throughout this study, we spot the differences between these methods based on metrics such as precision and recall, identifying which technique best suits this problem.


Summary

  • The paper demonstrates the effective use of deep learning models (FCN, LSTM, and recurrent autoencoders) for classifying multivariate time series from gas sensor data.
  • The paper employs FCN for extracting local and global features and LSTM for capturing long-term dependencies, achieving high precision and recall in event detection.
  • The paper leverages a semi-supervised approach with recurrent autoencoders, optimizing performance through hyperparameter tuning and reducing reliance on labeled data.

Multivariate Time Series Classification via Deep Learning

This paper explores the application of deep learning techniques for multivariate time series classification, focusing on data obtained from gas sensors. The study investigates Fully Convolutional Networks (FCN), Long Short-Term Memory (LSTM) networks, and Recurrent Autoencoders for detecting events such as occupancy and window openings in specific environments. The research emphasizes the impact of various parameters, including sequence length, on model training and performance.

Deep Learning Architectures for Time Series Analysis

The paper employs several deep learning architectures tailored for time series data.

Fully Convolutional Networks

FCNs are utilized to extract both local and global features from each input channel of the multivariate time series data. The architecture consists of multiple convolutional blocks, each including a convolutional layer, batch normalization, and ReLU activation. Global Average Pooling (GAP) is applied after the last convolutional block to reduce parameters. A key advantage of FCNs is their ability to handle variable-length time sequences. Figure 1

Figure 1: Fully Convolutional Network (FCN).
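
As a rough illustration of this architecture, here is a minimal PyTorch sketch of an FCN classifier; the channel counts and kernel sizes are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class FCN(nn.Module):
    """Minimal FCN for multivariate time series: conv blocks + global average pooling."""
    def __init__(self, in_channels: int, num_classes: int):
        super().__init__()
        def block(c_in, c_out, k):
            # Each block: 1D convolution, batch normalization, ReLU (padding keeps the length).
            return nn.Sequential(
                nn.Conv1d(c_in, c_out, kernel_size=k, padding=k // 2),
                nn.BatchNorm1d(c_out),
                nn.ReLU(),
            )
        self.features = nn.Sequential(
            block(in_channels, 128, 7),
            block(128, 256, 5),
            block(256, 128, 3),
        )
        self.head = nn.Linear(128, num_classes)

    def forward(self, x):              # x: (batch, channels, time)
        z = self.features(x)           # (batch, 128, time)
        z = z.mean(dim=-1)             # global average pooling over the time axis
        return self.head(z)            # class logits

model = FCN(in_channels=9, num_classes=2)
logits = model(torch.randn(4, 9, 60))  # e.g. 4 sequences, 9 features, 60 time steps
```

Because classification relies on global average pooling over the time axis rather than a fixed-size flatten, the same model accepts sequences of different lengths.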

InceptionTime

InceptionTime, a state-of-the-art architecture known for high accuracy in time series classification, is also explored. This architecture is an ensemble of Inception Networks built from residual blocks. Each residual block contains three Inception modules, and Global Average Pooling (GAP) is applied after the second block. At the core of each Inception module, a bottleneck layer reduces dimensionality and model complexity before multiple convolutional filters are applied in parallel with a stride of 1. Figure 2

Figure 2: Top: InceptionTime Network, Bottom: Single Inception Module.
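
The following is a minimal PyTorch sketch of a single Inception module along these lines; the kernel lengths and filter counts are illustrative assumptions, and the published InceptionTime implementation differs in detail.

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Sketch of one Inception module: bottleneck, parallel convolutions, max-pool branch."""
    def __init__(self, in_channels: int, bottleneck: int = 32, filters: int = 32):
        super().__init__()
        # 1x1 bottleneck reduces the channel dimension before the wide convolutions.
        self.bottleneck = nn.Conv1d(in_channels, bottleneck, kernel_size=1, bias=False)
        # Parallel convolutions with different receptive fields, all with stride 1.
        self.convs = nn.ModuleList([
            nn.Conv1d(bottleneck, filters, kernel_size=k, padding=k // 2, bias=False)
            for k in (9, 19, 39)
        ])
        # Max-pooling branch followed by a 1x1 convolution.
        self.pool_branch = nn.Sequential(
            nn.MaxPool1d(kernel_size=3, stride=1, padding=1),
            nn.Conv1d(in_channels, filters, kernel_size=1, bias=False),
        )
        self.bn = nn.BatchNorm1d(filters * 4)
        self.act = nn.ReLU()

    def forward(self, x):                       # x: (batch, channels, time)
        z = self.bottleneck(x)
        branches = [conv(z) for conv in self.convs] + [self.pool_branch(x)]
        return self.act(self.bn(torch.cat(branches, dim=1)))
```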

Long Short-Term Memory Networks

LSTMs are employed to capture long-term dependencies in the time series data. Unlike standard RNNs, LSTMs use a "memory" or "context" state to address the vanishing gradient problem. The LSTM cell includes input, forget, and output gates to control dependencies. The study uses a one-layer LSTM network in a supervised learning setup. Figure 3

Figure 3: Top: LSTM Cell, Bottom: LSTM Network.
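
A minimal one-layer LSTM classifier of this kind might look as follows in PyTorch; the hidden size and feature count are placeholders, not the paper's tuned values.

```python
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    """One-layer LSTM over the time dimension; the final hidden state feeds a linear head."""
    def __init__(self, n_features: int, hidden_size: int, num_classes: int):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_classes)

    def forward(self, x):              # x: (batch, time, features)
        _, (h_n, _) = self.lstm(x)     # h_n: (num_layers, batch, hidden)
        return self.head(h_n[-1])      # logits from the final hidden state

model = LSTMClassifier(n_features=9, hidden_size=64, num_classes=2)
logits = model(torch.randn(4, 60, 9))  # 4 sequences, 60 time steps, 9 features
```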

Recurrent Autoencoders for Semi-Supervised Learning

To leverage unlabeled data, Recurrent Autoencoders are used in a semi-supervised learning approach. The autoencoder is trained to minimize reconstruction error using Mean Squared Error (MSE) on unlabeled data. The encoder component is then used with a shallow classifier, trained on labeled data, to reduce the number of trainable parameters. Figure 4

Figure 4: Semi-supervised Learning using a Recurrent Autoencoder and a Shallow Classifier.
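
The two-stage procedure could be sketched as follows in PyTorch; the RecurrentAutoencoder class, feature count, and latent size are illustrative assumptions, not the paper's exact model.

```python
import torch
import torch.nn as nn

class RecurrentAutoencoder(nn.Module):
    """LSTM encoder compresses a sequence to a latent vector; LSTM decoder reconstructs it."""
    def __init__(self, n_features: int, latent_size: int):
        super().__init__()
        self.encoder = nn.LSTM(n_features, latent_size, batch_first=True)
        self.decoder = nn.LSTM(latent_size, n_features, batch_first=True)

    def encode(self, x):                               # x: (batch, time, features)
        _, (h_n, _) = self.encoder(x)
        return h_n[-1]                                 # (batch, latent_size)

    def forward(self, x):
        z = self.encode(x)
        # Repeat the latent vector along the time axis and decode it back to the input space.
        z_seq = z.unsqueeze(1).repeat(1, x.size(1), 1)
        recon, _ = self.decoder(z_seq)
        return recon

# Stage 1: train the autoencoder on unlabeled sequences with an MSE reconstruction loss.
ae = RecurrentAutoencoder(n_features=9, latent_size=10)
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
x_unlabeled = torch.randn(32, 60, 9)                   # placeholder batch of unlabeled data
loss = nn.functional.mse_loss(ae(x_unlabeled), x_unlabeled)
loss.backward(); opt.step()

# Stage 2: reuse the trained encoder with a shallow classifier on the (small) labeled set.
classifier = nn.Linear(10, 2)
with torch.no_grad():
    emb = ae.encode(torch.randn(16, 60, 9))            # placeholder labeled batch
logits = classifier(emb)
```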

Experimental Setup and Data Analysis

The dataset comprises 17 features, including pressure, temperature, sound, and gas concentrations, along with two classes: 'person' and 'window_open'. Data was collected from gas sensors, with labeled data from one device and unlabeled data from 740 sensors.

Data Cleaning and Preprocessing

The initial steps involve visualizing the data and cleaning it to handle missing values. Missing values are interpolated to maintain the time series frequency. The distribution of labels is analyzed, and labels are merged to create binary classes. Figure 5

Figure 5: Visualization of the labeled data.

Figure 6

Figure 6: Distribution of original labels in the labeled data.

Figure 7

Figure 7: Distribution of missing values in the labeled data.
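
A minimal pandas sketch of this kind of cleaning step is shown below; the file names, column names, original label values, and resampling frequency are hypothetical.

```python
import pandas as pd

# Hypothetical frame with a datetime index, sensor columns, and a raw label file.
df = pd.read_csv("sensor_data.csv", parse_dates=["timestamp"], index_col="timestamp")

# Resample to a regular frequency and interpolate gaps so the series keeps a fixed step.
df = df.resample("1min").mean(numeric_only=True).interpolate(method="time")

# Merge the original labels into binary classes, e.g. anything person-related -> 'person'.
raw_labels = pd.read_csv("labels.csv", parse_dates=["timestamp"], index_col="timestamp")
df["person"] = raw_labels["label"].reindex(df.index, method="ffill").isin(
    ["person_sitting", "person_standing"]              # hypothetical original label names
).astype(int)
```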

Feature Reduction and Under-Sampling

Pearson correlation coefficient is used to reduce the number of features, selecting the most relevant ones for classification. Under-sampling is applied to address the imbalance in the dataset, ensuring more accurate metrics during model comparison. Figure 8

Figure 8: Correlation Matrix of features and classes.

Figure 9

Figure 9: Left: Distribution of unbalanced data set, Right: Distribution of unbalanced labels.

Figure 10

Figure 10: Left: Distribution of data set, Right: Distribution of labels (after applying under-sampling).
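
A rough pandas sketch of both steps follows; the correlation threshold, column names, and file name are hypothetical.

```python
import pandas as pd

# df: preprocessed frame with feature columns plus binary 'person' and 'window_open' columns.
df = pd.read_csv("preprocessed.csv")

# Pearson correlation of every feature against a target; keep the most correlated features.
corr = df.corr(method="pearson", numeric_only=True)["person"].abs().drop(["person", "window_open"])
selected = corr[corr > 0.2].index.tolist()             # 0.2 is an illustrative threshold

# Random under-sampling: shrink the majority class to the size of the minority class.
minority = df[df["person"] == 1]
majority = df[df["person"] == 0].sample(n=len(minority), random_state=0)
balanced = pd.concat([minority, majority]).sample(frac=1, random_state=0)
print(selected, balanced["person"].value_counts())
```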

Sequence Labeling and Normalization

The data is segmented into sequences, and each sequence is assigned a label using one of several methods (the first label, the mean label, or the last label of the window). The performance of FCN is compared across these sequence labeling methods. Standard and min-max scalers are also compared for data normalization.
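
A minimal sketch of the windowing and labeling step, using scikit-learn scalers and placeholder data; the sequence length and array shapes are illustrative.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler  # or MinMaxScaler

def make_sequences(X, y, seq_len, label_mode="last"):
    """Slice a (time, features) array into fixed-length windows with one label per window."""
    xs, ys = [], []
    for start in range(0, len(X) - seq_len + 1, seq_len):
        window = X[start:start + seq_len]
        labels = y[start:start + seq_len]
        if label_mode == "first":
            ys.append(labels[0])
        elif label_mode == "mean":
            ys.append(int(round(labels.mean())))   # majority-style label from the window mean
        else:                                      # "last"
            ys.append(labels[-1])
        xs.append(window)
    return np.stack(xs), np.array(ys)

X = np.random.randn(1000, 9)                       # placeholder (time, features) data
y = np.random.randint(0, 2, size=1000)
X = StandardScaler().fit_transform(X)              # z-score normalization (MinMaxScaler is the alternative)
X_seq, y_seq = make_sequences(X, y, seq_len=60, label_mode="mean")
```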

Results and Discussion

The paper presents a comprehensive analysis of the performance of different architectures and techniques.

Under-Sampling and Feature Selection

The results indicate that training on the unbalanced data takes significantly longer than training on the under-sampled data while achieving similar F1 scores, making under-sampling an effective way to reduce training time. Reducing the feature set likewise does not degrade the results.

Benchmarking and Hyperparameter Optimization

FCN, LSTM, and InceptionTime are benchmarked, revealing that while InceptionTime yields good results, its high parameter count and longer training time make it less suitable for small datasets. FCN and LSTM are then selected as the primary models for further experiments. Hyperparameter optimization using Optuna leads to refined FCN and LSTM models with optimized filter counts, hidden sizes, and dropout rates. The optimized FCN achieved a precision of 0.91, recall of 1.0, and F1 score of 0.95 for person detection, and a precision of 1.0, recall of 0.97, and F1 score of 0.98 for window detection.
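
A hedged sketch of how such an Optuna study might be set up; the search space and the train_and_evaluate helper are hypothetical placeholders, not the paper's code.

```python
import optuna

def objective(trial):
    # Hypothetical search space for the FCN: filter count, dropout rate, learning rate.
    n_filters = trial.suggest_categorical("n_filters", [64, 128, 256])
    dropout = trial.suggest_float("dropout", 0.0, 0.5)
    lr = trial.suggest_float("lr", 1e-4, 1e-2, log=True)
    # train_and_evaluate is a placeholder: it should train the model with these
    # hyperparameters and return the validation F1 score.
    return train_and_evaluate(n_filters=n_filters, dropout=dropout, lr=lr)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```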

Predictions Distribution and Feature Visualization

The distribution of predictions over time is visualized for FCN and LSTM. PCA is used to visualize the feature space for both models on a separate labeled test set. Figure 11

Figure 11: Top: A separate training set, Bottom: A separate test set.

Figure 12

Figure 12: Confusion matrices of FCN.

Figure 13

Figure 13: Confusion matrices of LSTM.

Figure 14

Figure 14: Distribution of predictions for FCN.

Figure 15

Figure 15: Distribution of predictions for LSTM.

Figure 16

Figure 16: PCA for FCN with labeled data.

Figure 17

Figure 17: PCA for LSTM with labeled data.

Figure 18

Figure 18: PCA for FCN with labeled and unlabeled data.
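
A minimal scikit-learn/matplotlib sketch of the PCA projection behind plots like Figures 16-18; the placeholder array stands in for the activations taken before the models' classification heads.

```python
import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# embeddings: placeholder values in place of the model's penultimate-layer activations.
embeddings = np.random.randn(500, 128)
labels = np.random.randint(0, 2, size=500)

# Project the high-dimensional feature space onto its first two principal components.
coords = PCA(n_components=2).fit_transform(embeddings)
plt.scatter(coords[:, 0], coords[:, 1], c=labels, cmap="coolwarm", s=8)
plt.xlabel("PC 1"); plt.ylabel("PC 2")
plt.title("Feature space projected with PCA")
plt.show()
```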

Encoder Classifier Analysis

The recurrent autoencoder is trained on a large amount of unlabeled data, and the trained encoder is then used with a shallow classifier. Different latent space sizes are tested, and the results indicate that an embedding size of 10 provides a good balance between performance and parameter count. When applied to the same unlabeled test set, the encoder classifier's PCA visualization follows the distribution of the feature space, in contrast to FCN. Figure 19

Figure 19: Distribution of predictions for encoder classifier with latent_size = 2.

Figure 20

Figure 20: Distribution of predictions for encoder classifier with latent_size = 10.

Figure 21

Figure 21: Distribution of predictions for encoder classifier with latent_size = 16.

Figure 22

Figure 22: Confusion matrices of encoder classifier with latent_size = 10.

Figure 23

Figure 23: PCA for encoder classifier with latent_size = 10 with labeled data.

Figure 24

Figure 24: PCA for encoder classifier with latent_size = 10 with labeled and unlabeled data.

Figure 25

Figure 25: Smoothed distribution of predictions for encoder classifier with latent_size = 10.
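
As a rough way to see the latent-size trade-off discussed above, the sketch below counts the trainable parameters of a hypothetical LSTM encoder plus linear classifier for a few latent sizes; the feature and class counts are placeholders.

```python
import torch.nn as nn

# Compare parameter counts of encoder + shallow classifier for different latent sizes.
n_features, num_classes = 9, 2
for latent_size in (2, 10, 16):
    encoder = nn.LSTM(n_features, latent_size, batch_first=True)
    classifier = nn.Linear(latent_size, num_classes)
    n_params = sum(p.numel() for p in encoder.parameters()) + \
               sum(p.numel() for p in classifier.parameters())
    print(f"latent_size={latent_size}: {n_params} trainable parameters")
```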

Conclusion

The paper demonstrates the efficacy of deep learning approaches for time series classification using data from gas sensors. Both supervised and semi-supervised learning techniques are explored, with FCN and LSTM architectures showing strong performance. The semi-supervised approach, utilizing a recurrent autoencoder, enables the use of less labeled data by pre-training the encoder on unlabeled data. Key considerations for time series data, such as handling missing values, sequence length selection, and normalization, are discussed. The study also highlights the importance of analyzing the feature space and visualizing prediction distributions for better insights. Future research could explore self-supervised learning techniques using Transformers for potentially more robust results.
