
Multivariate Time Series Classification: A Deep Learning Approach

Published 5 Jul 2023 in cs.LG | (2307.02253v1)

Abstract: This paper investigates different methods and various neural network architectures applicable in the time series classification domain. The data is obtained from a fleet of gas sensors that measure and track quantities such as oxygen and sound. With the help of this data, we can detect events such as occupancy in a specific environment. At first, we analyze the time series data to understand the effect of different parameters, such as the sequence length, when training our models. These models employ Fully Convolutional Networks (FCN) and Long Short-Term Memory (LSTM) for supervised learning and Recurrent Autoencoders for semi-supervised learning. Throughout this study, we spot the differences between these methods based on metrics such as precision and recall, identifying which technique best suits this problem.


Summary

  • The paper demonstrates the effective use of deep learning models (FCN, LSTM, and recurrent autoencoders) for classifying multivariate time series from gas sensor data.
  • The paper employs FCN for extracting local and global features and LSTM for capturing long-term dependencies, achieving high precision and recall in event detection.
  • The paper leverages a semi-supervised approach with recurrent autoencoders, optimizing performance through hyperparameter tuning and reducing reliance on labeled data.

Multivariate Time Series Classification via Deep Learning

This paper explores the application of deep learning techniques for multivariate time series classification, focusing on data obtained from gas sensors. The study investigates Fully Convolutional Networks (FCN), Long Short-Term Memory (LSTM) networks, and Recurrent Autoencoders for detecting events such as occupancy and window openings in specific environments. The research emphasizes the impact of various parameters, including sequence length, on model training and performance.

Deep Learning Architectures for Time Series Analysis

The paper employs several deep learning architectures tailored for time series data.

Fully Convolutional Networks

FCNs are utilized to extract both local and global features from each input channel of the multivariate time series data. The architecture consists of multiple convolutional blocks, each including a convolutional layer, batch normalization, and ReLU activation. Global Average Pooling (GAP) is applied after the last convolutional block to reduce parameters. A key advantage of FCNs is their ability to handle variable-length time sequences. Figure 1

Figure 1: Fully Convolutional Network (FCN).
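
As a rough illustration of this architecture, here is a minimal PyTorch sketch of an FCN classifier; the channel counts and kernel sizes are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class FCN(nn.Module):
    """Minimal FCN for multivariate time series: conv blocks + global average pooling."""
    def __init__(self, in_channels: int, num_classes: int):
        super().__init__()
        def block(c_in, c_out, k):
            # Each block: 1D convolution, batch normalization, ReLU (padding keeps the length).
            return nn.Sequential(
                nn.Conv1d(c_in, c_out, kernel_size=k, padding=k // 2),
                nn.BatchNorm1d(c_out),
                nn.ReLU(),
            )
        self.features = nn.Sequential(
            block(in_channels, 128, 7),
            block(128, 256, 5),
            block(256, 128, 3),
        )
        self.head = nn.Linear(128, num_classes)

    def forward(self, x):              # x: (batch, channels, time)
        z = self.features(x)           # (batch, 128, time)
        z = z.mean(dim=-1)             # global average pooling over the time axis
        return self.head(z)            # class logits

model = FCN(in_channels=9, num_classes=2)
logits = model(torch.randn(4, 9, 60))  # e.g. 4 sequences, 9 features, 60 time steps
```

Because classification relies on global average pooling over the time axis rather than a fixed-size flatten, the same model accepts sequences of different lengths.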

InceptionTime

InceptionTime, a state-of-the-art architecture known for high accuracy in time series classification, is also explored. This architecture is an ensemble of Inception Networks built from residual blocks. Each residual block contains three Inception modules, and Global Average Pooling (GAP) is applied after the second block. At the core of each Inception module, a bottleneck layer reduces dimensionality and model complexity before multiple convolutional filters are applied in parallel with a stride of 1. Figure 2

Figure 2: Top: InceptionTime Network, Bottom: Single Inception Module.
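
The following is a minimal PyTorch sketch of a single Inception module along these lines; the kernel lengths and filter counts are illustrative assumptions, and the published InceptionTime implementation differs in detail.

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Sketch of one Inception module: bottleneck, parallel convolutions, max-pool branch."""
    def __init__(self, in_channels: int, bottleneck: int = 32, filters: int = 32):
        super().__init__()
        # 1x1 bottleneck reduces the channel dimension before the wide convolutions.
        self.bottleneck = nn.Conv1d(in_channels, bottleneck, kernel_size=1, bias=False)
        # Parallel convolutions with different receptive fields, all with stride 1.
        self.convs = nn.ModuleList([
            nn.Conv1d(bottleneck, filters, kernel_size=k, padding=k // 2, bias=False)
            for k in (9, 19, 39)
        ])
        # Max-pooling branch followed by a 1x1 convolution.
        self.pool_branch = nn.Sequential(
            nn.MaxPool1d(kernel_size=3, stride=1, padding=1),
            nn.Conv1d(in_channels, filters, kernel_size=1, bias=False),
        )
        self.bn = nn.BatchNorm1d(filters * 4)
        self.act = nn.ReLU()

    def forward(self, x):                       # x: (batch, channels, time)
        z = self.bottleneck(x)
        branches = [conv(z) for conv in self.convs] + [self.pool_branch(x)]
        return self.act(self.bn(torch.cat(branches, dim=1)))
```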

Long Short-Term Memory Networks

LSTMs are employed to capture long-term dependencies in the time series data. Unlike standard RNNs, LSTMs use a "memory" or "context" state to address the vanishing gradient problem. The LSTM cell includes input, forget, and output gates to control dependencies. The study uses a one-layer LSTM network in a supervised learning setup. Figure 3

Figure 3: Top: LSTM Cell, Bottom: LSTM Network.
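
A minimal one-layer LSTM classifier of this kind might look as follows in PyTorch; the hidden size and feature count are placeholders, not the paper's tuned values.

```python
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    """One-layer LSTM over the time dimension; the final hidden state feeds a linear head."""
    def __init__(self, n_features: int, hidden_size: int, num_classes: int):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_classes)

    def forward(self, x):              # x: (batch, time, features)
        _, (h_n, _) = self.lstm(x)     # h_n: (num_layers, batch, hidden)
        return self.head(h_n[-1])      # logits from the final hidden state

model = LSTMClassifier(n_features=9, hidden_size=64, num_classes=2)
logits = model(torch.randn(4, 60, 9))  # 4 sequences, 60 time steps, 9 features
```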

Recurrent Autoencoders for Semi-Supervised Learning

To leverage unlabeled data, Recurrent Autoencoders are used in a semi-supervised learning approach. The autoencoder is trained to minimize reconstruction error using Mean Squared Error (MSE) on unlabeled data. The encoder component is then used with a shallow classifier, trained on labeled data, to reduce the number of trainable parameters. Figure 4

Figure 4: Semi-supervised Learning using a Recurrent Autoencoder and a Shallow Classifier.
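
The two-stage procedure could be sketched as follows in PyTorch; the RecurrentAutoencoder class, feature count, and latent size are illustrative assumptions, not the paper's exact model.

```python
import torch
import torch.nn as nn

class RecurrentAutoencoder(nn.Module):
    """LSTM encoder compresses a sequence to a latent vector; LSTM decoder reconstructs it."""
    def __init__(self, n_features: int, latent_size: int):
        super().__init__()
        self.encoder = nn.LSTM(n_features, latent_size, batch_first=True)
        self.decoder = nn.LSTM(latent_size, n_features, batch_first=True)

    def encode(self, x):                               # x: (batch, time, features)
        _, (h_n, _) = self.encoder(x)
        return h_n[-1]                                 # (batch, latent_size)

    def forward(self, x):
        z = self.encode(x)
        # Repeat the latent vector along the time axis and decode it back to the input space.
        z_seq = z.unsqueeze(1).repeat(1, x.size(1), 1)
        recon, _ = self.decoder(z_seq)
        return recon

# Stage 1: train the autoencoder on unlabeled sequences with an MSE reconstruction loss.
ae = RecurrentAutoencoder(n_features=9, latent_size=10)
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
x_unlabeled = torch.randn(32, 60, 9)                   # placeholder batch of unlabeled data
loss = nn.functional.mse_loss(ae(x_unlabeled), x_unlabeled)
loss.backward(); opt.step()

# Stage 2: reuse the trained encoder with a shallow classifier on the (small) labeled set.
classifier = nn.Linear(10, 2)
with torch.no_grad():
    emb = ae.encode(torch.randn(16, 60, 9))            # placeholder labeled batch
logits = classifier(emb)
```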

Experimental Setup and Data Analysis

The dataset comprises 17 features, including pressure, temperature, sound, and gas concentrations, along with two classes: 'person' and 'window_open'. Data was collected from gas sensors, with labeled data from one device and unlabeled data from 740 sensors.

Data Cleaning and Preprocessing

The initial steps involve visualizing the data and cleaning it to handle missing values. Missing values are interpolated to maintain the time series frequency. The distribution of labels is analyzed, and labels are merged to create binary classes. Figure 5

Figure 5: Visualization of the labeled data.

Figure 6

Figure 6: Distribution of original labels in the labeled data.

Figure 7

Figure 7: Distribution of missing values in the labeled data.
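
A minimal pandas sketch of this kind of cleaning step is shown below; the file names, column names, original label values, and resampling frequency are hypothetical.

```python
import pandas as pd

# Hypothetical frame with a datetime index, sensor columns, and a raw label file.
df = pd.read_csv("sensor_data.csv", parse_dates=["timestamp"], index_col="timestamp")

# Resample to a regular frequency and interpolate gaps so the series keeps a fixed step.
df = df.resample("1min").mean(numeric_only=True).interpolate(method="time")

# Merge the original labels into binary classes, e.g. anything person-related -> 'person'.
raw_labels = pd.read_csv("labels.csv", parse_dates=["timestamp"], index_col="timestamp")
df["person"] = raw_labels["label"].reindex(df.index, method="ffill").isin(
    ["person_sitting", "person_standing"]              # hypothetical original label names
).astype(int)
```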

Feature Reduction and Under-Sampling

Pearson correlation coefficient is used to reduce the number of features, selecting the most relevant ones for classification. Under-sampling is applied to address the imbalance in the dataset, ensuring more accurate metrics during model comparison. Figure 8

Figure 8: Correlation Matrix of features and classes.

Figure 9

Figure 9: Left: Distribution of unbalanced data set, Right: Distribution of unbalanced labels.

Figure 10

Figure 10: Left: Distribution of data set, Right: Distribution of labels (after applying under-sampling).
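
A rough pandas sketch of both steps follows; the correlation threshold, column names, and file name are hypothetical.

```python
import pandas as pd

# df: preprocessed frame with feature columns plus binary 'person' and 'window_open' columns.
df = pd.read_csv("preprocessed.csv")

# Pearson correlation of every feature against a target; keep the most correlated features.
corr = df.corr(method="pearson", numeric_only=True)["person"].abs().drop(["person", "window_open"])
selected = corr[corr > 0.2].index.tolist()             # 0.2 is an illustrative threshold

# Random under-sampling: shrink the majority class to the size of the minority class.
minority = df[df["person"] == 1]
majority = df[df["person"] == 0].sample(n=len(minority), random_state=0)
balanced = pd.concat([minority, majority]).sample(frac=1, random_state=0)
print(selected, balanced["person"].value_counts())
```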

Sequence Labeling and Normalization

The data is segmented into sequences, and each sequence is assigned a label using one of several methods (the first label, the mean label, or the last label of the window). The performance of FCN is compared across these sequence labeling methods. Standard and min-max scalers are also compared for data normalization.
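
A minimal sketch of the windowing and labeling step, using scikit-learn scalers and placeholder data; the sequence length and array shapes are illustrative.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler  # or MinMaxScaler

def make_sequences(X, y, seq_len, label_mode="last"):
    """Slice a (time, features) array into fixed-length windows with one label per window."""
    xs, ys = [], []
    for start in range(0, len(X) - seq_len + 1, seq_len):
        window = X[start:start + seq_len]
        labels = y[start:start + seq_len]
        if label_mode == "first":
            ys.append(labels[0])
        elif label_mode == "mean":
            ys.append(int(round(labels.mean())))   # majority-style label from the window mean
        else:                                      # "last"
            ys.append(labels[-1])
        xs.append(window)
    return np.stack(xs), np.array(ys)

X = np.random.randn(1000, 9)                       # placeholder (time, features) data
y = np.random.randint(0, 2, size=1000)
X = StandardScaler().fit_transform(X)              # z-score normalization (MinMaxScaler is the alternative)
X_seq, y_seq = make_sequences(X, y, seq_len=60, label_mode="mean")
```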

Results and Discussion

The paper presents a comprehensive analysis of the performance of different architectures and techniques.

Under-Sampling and Feature Selection

The results indicate that training on the unbalanced data takes significantly longer than training on the under-sampled data while achieving similar F1 scores, making under-sampling an effective way to reduce training time. Reducing the feature set likewise does not degrade the results.

Benchmarking and Hyperparameter Optimization

FCN, LSTM, and InceptionTime are benchmarked, revealing that while InceptionTime yields good results, its high parameter count and longer training time make it less suitable for small datasets. FCN and LSTM are then selected as the primary models for further experiments. Hyperparameter optimization using Optuna leads to refined FCN and LSTM models with optimized filter counts, hidden sizes, and dropout rates. The optimized FCN achieved a precision of 0.91, recall of 1.0, and F1 score of 0.95 for person detection, and a precision of 1.0, recall of 0.97, and F1 score of 0.98 for window detection.
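
A hedged sketch of how such an Optuna study might be set up; the search space and the train_and_evaluate helper are hypothetical placeholders, not the paper's code.

```python
import optuna

def objective(trial):
    # Hypothetical search space for the FCN: filter count, dropout rate, learning rate.
    n_filters = trial.suggest_categorical("n_filters", [64, 128, 256])
    dropout = trial.suggest_float("dropout", 0.0, 0.5)
    lr = trial.suggest_float("lr", 1e-4, 1e-2, log=True)
    # train_and_evaluate is a placeholder: it should train the model with these
    # hyperparameters and return the validation F1 score.
    return train_and_evaluate(n_filters=n_filters, dropout=dropout, lr=lr)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```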

Predictions Distribution and Feature Visualization

The distribution of predictions over time is visualized for FCN and LSTM. PCA is used to visualize the feature space for both models on a separate labeled test set. Figure 11

Figure 11: Top: A separate training set, Bottom: A separate test set.

Figure 12

Figure 12: Confusion matrices of FCN.

Figure 13

Figure 13: Confusion matrices of LSTM.

Figure 14

Figure 14: Distribution of predictions for FCN.

Figure 15

Figure 15: Distribution of predictions for LSTM.

Figure 16

Figure 16: PCA for FCN with labeled data.

Figure 17

Figure 17: PCA for LSTM with labeled data.

Figure 18

Figure 18: PCA for FCN with labeled and unlabeled data.
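
A minimal scikit-learn/matplotlib sketch of the PCA projection behind plots like Figures 16-18; the placeholder array stands in for the activations taken before the models' classification heads.

```python
import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# embeddings: placeholder values in place of the model's penultimate-layer activations.
embeddings = np.random.randn(500, 128)
labels = np.random.randint(0, 2, size=500)

# Project the high-dimensional feature space onto its first two principal components.
coords = PCA(n_components=2).fit_transform(embeddings)
plt.scatter(coords[:, 0], coords[:, 1], c=labels, cmap="coolwarm", s=8)
plt.xlabel("PC 1"); plt.ylabel("PC 2")
plt.title("Feature space projected with PCA")
plt.show()
```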

Encoder Classifier Analysis

The recurrent autoencoder is trained on a large amount of unlabeled data, and the trained encoder is then used with a shallow classifier. Different latent space sizes are tested, and the results indicate that an embedding size of 10 provides a good balance between performance and parameter count. When applied to the same unlabeled test set, the encoder classifier's PCA visualization follows the distribution of the feature space, in contrast to FCN. Figure 19

Figure 19: Distribution of predictions for encoder classifier with latent_size = 2.

Figure 20

Figure 20: Distribution of predictions for encoder classifier with latent_size = 10.

Figure 21

Figure 21: Distribution of predictions for encoder classifier with latent_size = 16.

Figure 22

Figure 22: Confusion matrices of encoder classifier with latent_size = 10.

Figure 23

Figure 23: PCA for encoder classifier with latent_size = 10 with labeled data.

Figure 24

Figure 24: PCA for encoder classifier with latent_size = 10 with labeled and unlabeled data.

Figure 25

Figure 25: Smoothed distribution of predictions for encoder classifier with latent_size = 10.
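
As a rough way to see the latent-size trade-off discussed above, the sketch below counts the trainable parameters of a hypothetical LSTM encoder plus linear classifier for a few latent sizes; the feature and class counts are placeholders.

```python
import torch.nn as nn

# Compare parameter counts of encoder + shallow classifier for different latent sizes.
n_features, num_classes = 9, 2
for latent_size in (2, 10, 16):
    encoder = nn.LSTM(n_features, latent_size, batch_first=True)
    classifier = nn.Linear(latent_size, num_classes)
    n_params = sum(p.numel() for p in encoder.parameters()) + \
               sum(p.numel() for p in classifier.parameters())
    print(f"latent_size={latent_size}: {n_params} trainable parameters")
```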

Conclusion

The paper demonstrates the efficacy of deep learning approaches for time series classification using data from gas sensors. Both supervised and semi-supervised learning techniques are explored, with FCN and LSTM architectures showing strong performance. The semi-supervised approach, utilizing a recurrent autoencoder, enables the use of less labeled data by pre-training the encoder on unlabeled data. Key considerations for time series data, such as handling missing values, sequence length selection, and normalization, are discussed. The study also highlights the importance of analyzing the feature space and visualizing prediction distributions for better insights. Future research could explore self-supervised learning techniques using Transformers for potentially more robust results.
