- The paper introduces a deep learning model that combines CNNs, LSTMs, and self-attention to extract spatio-temporal features from wearable sensor data.
- The proposed architecture pairs a 1D CNN embedding and an LSTM encoder with a self-attention layer to decode complex sensor signals, and is validated on six benchmark datasets with statistically significant performance gains.
- Future research may explore alternative attention mechanisms and hyperparameter optimization to further advance human activity recognition in healthcare and lifestyle monitoring.
Deep ConvLSTM with Self-attention for Human Activity Decoding Using Wearable Sensors
The paper "Deep ConvLSTM with Self-attention for Human Activity Decoding Using Wearable Sensors" presents a novel deep learning architecture aimed at improving human activity recognition using wearable sensors. The authors propose a model integrating convolutional neural networks (CNNs), long short-term memory (LSTM) networks, and self-attention mechanisms to effectively decode spatio-temporal features from time-series data collected by multiple sensors.
Technical Overview
The core of the proposed architecture consists of three major components:
- Embedding Layer: Utilizes 1-dimensional convolution filters to learn local contextual features from sensor inputs.
- LSTM Encoder: Captures dependencies among time points, extracting temporal dynamics from sensor data.
- Self-Attention Layer: Facilitates the discovery of latent relationships between sensor data at different time points, enhancing the feature representation learned by the CNN and LSTM layers.
The model concludes with a softmax layer for classification, which assigns the activity label corresponding to the output neuron with the highest probability.
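To make the pipeline concrete, below is a minimal PyTorch sketch of such a CNN + LSTM + self-attention model. The layer widths, kernel size, single attention head, and mean-pooling readout are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch of a ConvLSTM-with-self-attention classifier; sizes are assumptions.
import torch
import torch.nn as nn


class ConvLSTMSelfAttention(nn.Module):
    def __init__(self, n_channels: int, n_classes: int,
                 conv_filters: int = 64, lstm_hidden: int = 128):
        super().__init__()
        # Embedding layer: 1D convolution over the time axis learns
        # local contextual features from the raw sensor channels.
        self.embed = nn.Conv1d(n_channels, conv_filters, kernel_size=5, padding=2)
        # LSTM encoder: captures temporal dependencies among time points.
        self.lstm = nn.LSTM(conv_filters, lstm_hidden, batch_first=True)
        # Self-attention: scores every time step against every other,
        # exposing latent relationships between distant time points.
        self.attn = nn.MultiheadAttention(lstm_hidden, num_heads=1, batch_first=True)
        self.classifier = nn.Linear(lstm_hidden, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels); Conv1d expects (batch, channels, time).
        h = torch.relu(self.embed(x.transpose(1, 2))).transpose(1, 2)
        h, _ = self.lstm(h)                      # (batch, time, lstm_hidden)
        h, _ = self.attn(h, h, h)                # self-attention over time steps
        logits = self.classifier(h.mean(dim=1))  # pool over time, then classify
        return logits                            # softmax is applied by the loss


# Example: a batch of 8 windows, 128 time steps, 9 sensor channels, 12 activities.
model = ConvLSTMSelfAttention(n_channels=9, n_classes=12)
scores = model(torch.randn(8, 128, 9))
print(scores.shape)  # torch.Size([8, 12])
```

In practice the softmax itself is usually folded into the training loss (e.g., nn.CrossEntropyLoss consumes the raw logits), which is why the sketch returns unnormalized scores.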
Experimental Results
The efficacy of the model was demonstrated across six benchmark datasets: MHEALTH, USC-HAD, UTD-MHAD1, UTD-MHAD2, WHARF, and WISDM. The authors employed different sample generation methods: semi non-overlapping windows (SNOW), fully non-overlapping windows (FNOW), and a leave-one-trial-out (LOTO) strategy. Of these, LOTO guarantees no overlap between training and test samples, and it yielded high accuracy with low variance, so it formed the basis for the principal comparisons.
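To illustrate the difference between the windowing schemes, here is a short NumPy sketch of SNOW and FNOW sample generation from a single recorded trial; the window width of 128 steps and the 50% overlap for SNOW are illustrative assumptions rather than the paper's exact settings.

```python
# Illustrative sketch of SNOW vs. FNOW windowing; width and overlap are assumptions.
import numpy as np


def sliding_windows(signal: np.ndarray, width: int, step: int) -> np.ndarray:
    """Cut a (time, channels) recording into fixed-width windows."""
    starts = range(0, len(signal) - width + 1, step)
    return np.stack([signal[s:s + width] for s in starts])


trial = np.random.randn(1000, 9)                      # one trial: 1000 steps, 9 channels
snow = sliding_windows(trial, width=128, step=64)     # semi non-overlapping: 50% overlap
fnow = sliding_windows(trial, width=128, step=128)    # fully non-overlapping: disjoint
print(snow.shape, fnow.shape)                         # (14, 128, 9) (7, 128, 9)
```

LOTO, by contrast, operates at the trial level: entire trials are held out for testing, so no window cut from a test trial can leak into the training set.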
Statistically significant improvements were observed over the baseline ConvLSTM approach. On most datasets, the proposed model with self-attention showed notable gains in accuracy, recall, and F1-score. The improvement was particularly prominent on UTD-MHAD1, with a 16% increase in accuracy and substantial gains in recall and F1-score.
Comparative Analysis
The paper also compares the proposed architecture with previous state-of-the-art methods, including approaches using handcrafted features with ensemble learning and automatic feature extraction with CNNs and ConvLSTMs. The proposed method consistently outperformed these techniques, achieving statistically significant accuracy gains on all datasets except WISDM, where the differences were less pronounced.
Implications and Future Directions
This research has valuable implications for human activity recognition with wearable sensors. The self-attention mechanism enables more nuanced identification of relevant signals, which can benefit healthcare and lifestyle-monitoring applications where precise activity decoding is crucial. The model's ability to scale to larger numbers of sensors and time points opens further avenues for exploration.
Future work may involve experimenting with alternative attention mechanisms such as global and local attention, which could reveal additional insights into activity recognition. Further enhancements in architecture and hyperparameter optimization could continue refining the model's performance across varying sensor datasets and sampling strategies.
This paper makes a significant contribution to applying deep learning to wearable sensor data for accurate human activity recognition, paving the way for better understanding and monitoring of human behavior in diverse contexts.