Deep Learning for Sensor-based Activity Recognition: A Survey
The paper "Deep Learning for Sensor-based Activity Recognition: A Survey" provides a comprehensive review of methodologies and advances in applying deep learning to Human Activity Recognition (HAR). This summary explores the surveyed material, focusing on sensor modalities, deep learning models, applications, benchmark datasets, and open challenges.
Sensor-based activity recognition extracts meaningful high-level knowledge about human activities from low-level sensor readings. Traditional pattern recognition (PR) approaches, such as decision trees and support vector machines, require extensive hand-crafted feature extraction and are limited in unsupervised and incremental learning scenarios. Deep learning, by learning feature representations automatically from raw sensor data, removes much of this manual engineering and typically yields stronger recognition performance, motivating its adoption in HAR. The paper discusses the intersection of deep learning with HAR and addresses the critical aspects of sensor modalities, deep learning models, applications, and benchmarking.
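To make the contrast concrete, here is a minimal pure-Python sketch of the traditional PR pipeline the survey describes: segment the sensor stream into overlapping sliding windows, then compute hand-crafted statistical features per window. The function names and the toy accelerometer trace are illustrative, not taken from the paper.

```python
import statistics

def sliding_windows(signal, size, step):
    """Segment a 1-D sensor stream into fixed-length, overlapping windows."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]

def hand_crafted_features(window):
    """Classic statistical features a traditional PR pipeline might compute."""
    return {
        "mean": statistics.mean(window),
        "std": statistics.pstdev(window),
        "min": min(window),
        "max": max(window),
    }

# Toy accelerometer trace, segmented into 4-sample windows with 50% overlap.
trace = [0.0, 0.1, 0.2, 0.1, 0.0, -0.1, -0.2, -0.1]
windows = sliding_windows(trace, size=4, step=2)
features = [hand_crafted_features(w) for w in windows]
```

In a deep learning pipeline, the windowing step typically remains, but the feature dictionary is replaced by representations learned directly from the raw windows.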
Sensor Modalities
The survey categorizes sensors used for HAR into four primary modalities: body-worn sensors, object sensors, ambient sensors, and hybrid sensors.
- Body-worn Sensors: These sensors, which include accelerometers, gyroscopes, and magnetometers, are typically worn by users and are prevalent in HAR tasks. They capture body movements and are integrated into common devices like smartphones and smartwatches.
- Object Sensors: Placed on objects, these sensors, such as RFID tags, track object movements to infer human activities. They are notably used in smart environments and medical settings.
- Ambient Sensors: Embedded in the environment, these sensors reflect user interactions with their surroundings, encompassing technologies like sound sensors and Wi-Fi.
- Hybrid Sensors: Combining multiple sensor types, these setups capture complex interactions between users and their environment, enhancing recognition capabilities in smart environments.
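As a rough illustration of how hybrid setups combine modalities, the sketch below performs naive early fusion: synchronized windows from two sensor types are concatenated into a single input vector for a downstream model. This is one common fusion strategy, not a method prescribed by the survey; all names and values are made up.

```python
def fuse_modalities(*windows):
    """Naive early fusion: concatenate time-aligned windows from
    different sensor modalities into one flat input vector."""
    fused = []
    for w in windows:
        fused.extend(w)
    return fused

accel = [0.0, 0.1, 0.2]   # body-worn accelerometer window
ambient = [1, 0, 1]       # binary ambient-sensor events in the same interval
x = fuse_modalities(accel, ambient)
```

Late fusion (training one model per modality and merging their outputs) is the usual alternative when modalities have very different sampling rates.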
Deep Learning Models
The paper reviews various deep learning architectures employed in HAR tasks, identifying their strengths and application scenarios.
- Deep Neural Networks (DNN): DNNs, with multiple hidden layers, are effective in learning complex feature representations from large datasets. They are particularly suited to intricate activities described by multi-dimensional sensor data.
- Convolutional Neural Networks (CNN): Leveraging sparse interactions, parameter sharing, and equivariant representations, CNNs excel in extracting features from temporal multi-dimensional sensor data. They are adept at recognizing activities with local dependencies and scale invariance.
- Autoencoders: Autoencoders, including Stacked Autoencoders (SAE), perform unsupervised feature learning via an encoding-decoding mechanism. They are beneficial for scenarios with abundant unlabeled data.
- Restricted Boltzmann Machines (RBM)/Deep Belief Networks (DBN): These generative models enable unsupervised pre-training of features, enhancing HAR tasks involving multi-modal sensor inputs.
- Recurrent Neural Networks (RNN): Incorporating Long Short-Term Memory (LSTM) units, RNNs capture temporal correlations, making them suitable for sequential activity data.
- Hybrid Models: Combinations of models, such as CNN-RNN hybrids, exploit both spatial and temporal relationships in sensor data, yielding superior performance in HAR tasks.
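To ground the CNN discussion above, the following pure-Python sketch implements the core operation a 1-D convolutional layer applies to a sensor channel: a single shared kernel slides across the window (parameter sharing, sparse interactions), followed by a ReLU non-linearity. It is a didactic toy under assumed names and values, not an implementation from any surveyed work.

```python
def conv1d(signal, kernel, bias=0.0):
    """Valid-mode 1-D convolution (cross-correlation) over one sensor channel.
    Parameter sharing: the same kernel weights apply at every time step."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k)) + bias
        for i in range(len(signal) - k + 1)
    ]

def relu(xs):
    """Element-wise rectified linear activation."""
    return [max(0.0, x) for x in xs]

# A smoothing kernel applied to a toy accelerometer window.
window = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0]
feature_map = relu(conv1d(window, kernel=[0.25, 0.5, 0.25]))
```

A real HAR CNN stacks many such kernels across channels and layers, and an RNN or pooling stage then aggregates the resulting feature maps over time.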
Applications
Deep learning-based HAR primarily targets Activities of Daily Living (ADL), but extends to specialized areas like health monitoring and smart environments.
- ADL and Sports: Most surveyed works focus on recognizing common daily and sports activities using body-worn sensors.
- Health and Disease Monitoring: Emerging applications include monitoring Parkinson's disease, trauma resuscitation, and paroxysmal atrial fibrillation (PAF), leveraging deep learning to analyze specific movement patterns linked to these conditions.
- Context-aware and High-level Activity Recognition: Combining data from multiple sensor types, deep learning models can infer complex activities involving environmental and contextual interactions.
Benchmark Datasets
The paper lists several public datasets widely used in HAR research. These datasets vary in sensor types, activity types, and sample sizes, providing a robust foundation for benchmarking deep learning models.
Grand Challenges
The survey identifies the following grand challenges and outlines possible directions:
- Online and Mobile Deployment: Improving the feasibility of deep learning models for real-time, mobile applications through reduced communication costs and enhanced mobile computing capabilities.
- Unsupervised Learning: Leveraging techniques like crowd-sourcing and transfer learning to overcome the dependency on labeled data.
- High-level Activity Recognition: Enhancing models to recognize complex, context-aware activities through the integration of hybrid sensors and contextual information.
- Light-weight Models: Developing efficient deep learning models suitable for deployment on resource-constrained devices, possibly integrating human-crafted features and shallow models.
- Non-invasive Sensing: Employing non-invasive, opportunistic sensing approaches to reduce user disturbance during data collection.
- Beyond Recognition: Extending the application of HAR to areas like skill assessment and smart home assistance.
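One simple route toward the light-weight models the survey calls for is post-training weight quantization. The sketch below is a hedged illustration rather than any surveyed method: it maps float weights to 8-bit signed integers under a uniform scale, shrinking storage roughly 4x at a small, bounded accuracy cost.

```python
def quantize(weights, num_bits=8):
    """Uniform affine quantization of float weights to signed integers —
    one common route to lighter on-device models."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax or 1.0  # avoid zero scale
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the integer codes."""
    return [v * scale for v in q]

weights = [0.4, -1.0, 0.2, 0.0]
q, scale = quantize(weights)
approx = dequantize(q, scale)  # each entry within one scale step of the original
```

Pruning, knowledge distillation, and falling back to shallow models with hand-crafted features (as the survey suggests) are complementary options for resource-constrained deployment.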
Conclusion
This survey highlights the transformative potential of deep learning in sensor-based HAR. By addressing the limitations of traditional PR methods, deep learning facilitates more accurate, efficient, and versatile activity recognition. Future research should focus on overcoming current challenges, optimizing models for diverse applications, and leveraging the full capabilities of hybrid sensing and contextual data.