MARS: Mixed Virtual and Real Wearable Sensors for Human Activity Recognition with Multi-Domain Deep Learning Model

Published 20 Sep 2020 in cs.CV | (2009.09404v2)

Abstract: Together with the rapid development of the Internet of Things (IoT), human activity recognition (HAR) using wearable Inertial Measurement Units (IMUs) has become a promising technology for many research areas. Recently, deep learning-based methods have paved a new way of understanding and analyzing the complex data in HAR systems. However, the performance of these methods depends largely on the quality and quantity of the collected data. In this paper, we propose to build a large database based on virtual IMUs and then address technical issues by introducing a multiple-domain deep learning framework consisting of three technical parts. In the first part, we propose to learn single-frame human activity from noisy IMU data with hybrid convolutional neural networks (CNNs) in a semi-supervised form. In the second part, the extracted data features are fused according to the principle of uncertainty-aware consistency, which reduces the uncertainty by weighting the importance of the features. In the last part, transfer learning is performed based on the newly released Archive of Motion Capture as Surface Shapes (AMASS) dataset, which contains abundant synthetic human poses; this enhances the variety and diversity of the training dataset and benefits training and feature transfer in the proposed method. The efficiency and effectiveness of the proposed method have been demonstrated on the real Deep Inertial Poser (DIP) dataset. The experimental results show that the proposed methods can surprisingly converge within a few iterations and outperform all competing methods.

Citations (24)

Summary

  • The paper proposes a multi-domain deep learning framework that integrates virtual and real IMU data for enhanced human activity recognition.
  • It employs hybrid 2D and 1D CNNs combined with uncertainty-aware feature fusion to extract and merge spatio-temporal features.
  • The framework utilizes transfer learning from a large synthetic dataset to significantly improve accuracy and convergence on real-world tasks.

This paper addresses the challenges of human activity recognition (HAR) using wearable inertial measurement units (IMUs) by proposing a novel multi-domain deep learning framework. The method leverages a large database of virtual IMUs to overcome limitations in the quantity and quality of real-world data. The framework integrates hybrid convolutional neural networks (CNNs), uncertainty-aware feature fusion, and transfer learning to achieve robust and efficient HAR.

Data Synthesis and Preparation

The authors address the data scarcity issue by constructing an extensive database of virtual IMU data based on the Archive of Motion Capture as Surface Shapes (AMASS) dataset (Mahmood et al., 2019) and the Deep Inertial Poser (DIP) dataset. The AMASS motion sequences, which cover a diverse range of human poses, are represented with the SMPL body model, and virtual IMU data is generated by simulating sensor placement on the model's surface. This process uses forward kinematics to calculate the orientation and acceleration data of the virtual IMUs. Data labeling is performed through text description division, common motion category screening, and visual auxiliary annotation, resulting in a dataset with 12 HAR categories.

Figure 1: IMU output and SMPL body model.
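
The snippet below is a minimal sketch of the two quantities a virtual IMU needs, written in plain Python so it runs without dependencies. It assumes joint orientations come from chaining local rotations down the kinematic tree (parents listed before children, as in SMPL's joint ordering) and that accelerations are taken by central finite differences of a surface vertex's positions, with gravity along world -z. The function names (`global_rotations`, `finite_difference_acc`, `to_sensor_frame`) are hypothetical and not taken from the paper; noise models and the exact vertex choice are left out.

```python
from typing import List, Sequence, Tuple

Mat3 = List[List[float]]
Vec3 = Tuple[float, float, float]

# Assumption: world frame with z pointing up, gravity in m/s^2.
GRAVITY: Vec3 = (0.0, 0.0, -9.81)


def matmul(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> Mat3:
    """3x3 matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def global_rotations(local_rots: Sequence[Mat3], parents: Sequence[int]) -> List[Mat3]:
    """Forward kinematics for orientations: R_global[j] = R_global[parent[j]] @ R_local[j].

    Assumes every parent index is smaller than its child's index; -1 marks the root.
    """
    out: List[Mat3] = []
    for r_local, parent in zip(local_rots, parents):
        out.append([list(row) for row in r_local] if parent < 0 else matmul(out[parent], r_local))
    return out


def finite_difference_acc(positions: Sequence[Vec3], fps: float) -> List[Vec3]:
    """World-frame acceleration of a vertex via the central second difference."""
    acc: List[Vec3] = []
    for t in range(1, len(positions) - 1):
        prev, cur, nxt = positions[t - 1], positions[t], positions[t + 1]
        acc.append(tuple((nxt[i] - 2.0 * cur[i] + prev[i]) * fps * fps for i in range(3)))
    return acc


def to_sensor_frame(acc_world: Vec3, r_sensor_to_world: Sequence[Sequence[float]]) -> Vec3:
    """Specific force an accelerometer would report, R^T (a - g), in the sensor frame."""
    f = [acc_world[i] - GRAVITY[i] for i in range(3)]
    # Row j of R^T is column j of R.
    return tuple(sum(r_sensor_to_world[i][j] * f[i] for i in range(3)) for j in range(3))
```

For instance, a vertex moving at constant velocity gives (up to rounding) zero world-frame acceleration, and `to_sensor_frame((0.0, 0.0, 0.0), identity)` then returns `(0.0, 0.0, 9.81)`: a resting accelerometer reports only the reaction to gravity.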

Figure 2: Data labeling flowchart.
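
The first two labeling steps can be pictured as a small text filter, sketched below under loud assumptions: the keyword table is hypothetical (the paper's 12 category names and screening rules are not reproduced here), descriptions are split into lowercase word tokens, and any clip that matches zero or several categories is deferred to visual annotation instead of being labeled automatically.

```python
import re
from typing import Dict, List, Optional, Sequence

# Hypothetical keyword table for illustration only; not the paper's 12 categories.
CATEGORY_KEYWORDS: Dict[str, Sequence[str]] = {
    "walk": ("walk", "walking", "walks"),
    "run": ("run", "running", "runs", "jog", "jogging"),
    "jump": ("jump", "jumping", "jumps", "hop"),
    "sit": ("sit", "sitting", "sits"),
}


def split_description(text: str) -> List[str]:
    """Text description division: lowercase the description and keep alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def screen_category(text: str) -> Optional[str]:
    """Return the single matching category, or None if the clip needs visual annotation."""
    tokens = set(split_description(text))
    hits = [cat for cat, words in CATEGORY_KEYWORDS.items() if tokens & set(words)]
    return hits[0] if len(hits) == 1 else None
```

Returning `None` for ambiguous descriptions such as "run then jump" is a deliberate design choice in this sketch: it routes exactly the uncertain clips to the manual, visually assisted step.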

Figure 3: SMPL models visualized performing actions in Unity.

Multi-Domain Deep Learning Framework

The proposed framework consists of three key modules: feature extraction, feature fusion, and cross-domain knowledge transfer. The feature extraction module employs hybrid CNNs (both 2D and 1D) to learn spatio-temporal features from the IMU data: the 2D-CNNs capture connectivity patterns among multi-modal signals, while the 1D-CNNs capture the characteristics of individual channels.

Figure 4: Overview of the proposed framework.
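
To make the 2D/1D distinction concrete, the sketch below treats one IMU window as a channels × time grid (an assumption about the input layout). A 1D kernel slides along time within each channel separately, while a 2D kernel also spans neighbouring channels, so it can respond to patterns shared across signals. Both use "valid" cross-correlation, as deep-learning convolution layers do. The kernel values, the concatenation of the two branches, and the function names are illustrative, not the paper's architecture.

```python
from typing import List, Sequence

Signal = Sequence[Sequence[float]]  # indexed as [channel][time]


def conv1d_per_channel(x: Signal, kernel: Sequence[float]) -> List[List[float]]:
    """Valid 1D cross-correlation applied independently to each channel."""
    k = len(kernel)
    return [
        [sum(kernel[j] * ch[t + j] for j in range(k)) for t in range(len(ch) - k + 1)]
        for ch in x
    ]


def conv2d(x: Signal, kernel: Sequence[Sequence[float]]) -> List[List[float]]:
    """Valid 2D cross-correlation over the (channel, time) grid, mixing adjacent channels."""
    kh, kw = len(kernel), len(kernel[0])
    rows, cols = len(x), len(x[0])
    return [
        [
            sum(kernel[a][b] * x[r + a][c + b] for a in range(kh) for b in range(kw))
            for c in range(cols - kw + 1)
        ]
        for r in range(rows - kh + 1)
    ]


def hybrid_features(x: Signal, k1: Sequence[float], k2: Sequence[Sequence[float]]) -> List[float]:
    """Flatten both branches and concatenate them into one feature vector (assumed merge)."""
    f1 = [v for row in conv1d_per_channel(x, k1) for v in row]
    f2 = [v for row in conv2d(x, k2) for v in row]
    return f1 + f2
```

With 3 channels and 4 time steps, a length-2 temporal kernel yields a 3 × 3 per-channel map, whereas a 2 × 2 kernel yields a 2 × 3 map whose every entry mixes two adjacent channels.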

Uncertainty-Aware Feature Fusion

To effectively combine the features extracted from different domains, the authors propose an uncertainty-aware consistency approach. This method fuses the latent representations obtained from five single-domain feature extractors, weighting the importance of each feature according to its uncertainty. Three fusion strategies are explored: feature-level fusion, decision-level fusion, and decision-level fusion with fairness consideration. The decision-level fusion strategy uses a sigmoid function to compute an attention weight for each feature being merged. To enhance fairness, the authors additionally minimize the Kullback-Leibler (KL) divergence between the distributions of different features, encouraging them to follow similar distributions. Because the KL divergence is asymmetric, its symmetrized version is used:

$$SD_{KL}(P,Q) = D_{KL}(P,Q) + D_{KL}(Q,P)$$
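
A direct transcription of the symmetrized divergence for discrete distributions is sketched below (natural logarithm). Applying it to feature vectors requires first turning them into distributions; the softmax used here for that step is an assumption, as is the small `eps` guarding against zero probabilities.

```python
import math
from typing import List, Sequence


def softmax(v: Sequence[float]) -> List[float]:
    """Map a feature vector to a probability distribution (assumed normalization step)."""
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    s = sum(exps)
    return [e / s for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """D_KL(P, Q) = sum_i p_i * log(p_i / q_i); terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p: Sequence[float], q: Sequence[float]) -> float:
    """SD_KL(P, Q) = D_KL(P, Q) + D_KL(Q, P); unlike D_KL it is symmetric in P and Q."""
    return kl_divergence(p, q) + kl_divergence(q, p)
```

As a fairness penalty, a term such as `symmetric_kl(softmax(feat_a), softmax(feat_b))` (with `feat_a` and `feat_b` standing for two hypothetical domain features) would be added to the training loss; it is zero when the two distributions coincide and grows as they diverge.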

Figure 5: Illustration of the proposed feature fusion strategies.
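
The feature-level and decision-level strategies can be sketched as follows, with loud assumptions: feature-level fusion is shown as plain concatenation, and decision-level fusion combines each single-domain classifier's class probabilities using attention weights obtained by applying a sigmoid to a per-domain score. How the paper derives those scores from uncertainty is not reproduced here; the scores are simply passed in, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def feature_level_fusion(features: Sequence[Sequence[float]]) -> List[float]:
    """Feature-level fusion sketched as concatenation of per-domain feature vectors."""
    return [v for f in features for v in f]


def decision_level_fusion(
    probs: Sequence[Sequence[float]], scores: Sequence[float]
) -> List[float]:
    """Weight each domain's class distribution by sigmoid(score), then renormalize.

    probs[k][c] is domain k's probability for class c; scores[k] is an assumed
    scalar (e.g. from a learned gating layer) that sets domain k's attention weight.
    """
    weights = [sigmoid(s) for s in scores]
    total = sum(weights)
    n_classes = len(probs[0])
    return [sum(w * p[c] for w, p in zip(weights, probs)) / total for c in range(n_classes)]
```

With equal scores the result is the plain average of the domain predictions; a domain with a much higher score dominates the fused distribution, which is how a confident (low-uncertainty) domain would be emphasized.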

Cross-Domain Knowledge Transfer

To leverage the synthetic IMU data for real-world HAR, the framework incorporates transfer learning. The network is first pre-trained on the AMASS dataset and then fine-tuned on the real DIP dataset. This approach allows the model to transfer knowledge from the synthetic domain to the real domain, improving its performance and generalization ability.
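
The two-stage schedule (pre-train on synthetic data, then continue training the same weights on real data) is sketched below with a toy logistic-regression model instead of the paper's CNNs. The one-feature datasets are made up purely to illustrate a domain shift, and the learning rate and epoch counts are arbitrary; only the idea of initializing fine-tuning from the pre-trained weights carries over.

```python
import math
from typing import List, Sequence, Tuple

# Each sample: ((feature, bias term fixed at 1.0), binary label). Toy data, not from the paper.
Sample = Tuple[Tuple[float, float], int]


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Logistic-regression probability of the positive class."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(w: Sequence[float], data: Sequence[Sample]) -> float:
    """Mean binary cross-entropy; eps avoids log(0)."""
    eps = 1e-12
    total = 0.0
    for x, y in data:
        p = predict(w, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps)
    return total / len(data)


def sgd(w: Sequence[float], data: Sequence[Sample], lr: float, epochs: int) -> List[float]:
    """Plain per-sample gradient descent starting from the given weights."""
    w = list(w)
    for _ in range(epochs):
        for x, y in data:
            err = predict(w, x) - y  # gradient of the loss w.r.t. the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


# Hypothetical "synthetic" (source) and "real" (target) sets with a shifted decision boundary.
synthetic: List[Sample] = [((-2.0, 1.0), 0), ((-1.0, 1.0), 0), ((1.0, 1.0), 1), ((2.0, 1.0), 1)]
real: List[Sample] = [((0.0, 1.0), 0), ((0.5, 1.0), 0), ((1.5, 1.0), 1), ((2.0, 1.0), 1)]

# Stage 1: pre-train from scratch on the synthetic domain.
pretrained = sgd([0.0, 0.0], synthetic, lr=0.5, epochs=50)
# Stage 2: fine-tune on the real domain, initialized from the pre-trained weights.
finetuned = sgd(pretrained, real, lr=0.5, epochs=20)
```

Comparing `log_loss(pretrained, real)` with `log_loss(finetuned, real)` shows the real-domain loss dropping after the short fine-tuning stage, a toy analogue of the fast convergence the authors report when fine-tuning from the AMASS-pretrained model.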

Experimental Evaluation and Results

The authors evaluate the proposed method on the DIP dataset, comparing it against several competing HAR algorithms: Random Forest (RF), LSTM, DeepConvLSTM, and a deep convolutional autoencoder-based network. The proposed methods outperform all competing approaches, achieving high accuracy, precision, and F1-score. They also converge surprisingly fast: for example, MARS-v2 and MARS-v3 converge after only 4 iterations when fine-tuned from the model pre-trained on AMASS. Furthermore, AMASS pre-training improves the performance of all deep learning-based HAR algorithms. The confusion matrices show the classification performance of the proposed methods and how fine-tuning affects the recognition accuracy of individual activities.

Figure 6: Convergence performance comparison.
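
For reference, the reported metrics can all be read off a confusion matrix such as those in Figure 7. The sketch below assumes rows are true classes and columns are predictions, and uses macro averaging (the unweighted mean over classes); the paper's exact averaging scheme is an assumption here.

```python
from typing import Dict, Sequence


def metrics_from_confusion(cm: Sequence[Sequence[int]]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall, and F1 (rows = true, cols = predicted)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    precisions, recalls, f1s = [], [], []
    for c in range(n):
        tp = cm[c][c]
        predicted = sum(cm[r][c] for r in range(n))  # column sum: everything predicted as c
        actual = sum(cm[c])  # row sum: everything truly of class c
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

For the two-class matrix `[[5, 1], [2, 2]]`, accuracy is 7/10, while the macro F1 is (10/13 + 4/7) / 2, lower than accuracy because the minority class is recognized less reliably.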

Figure 7: Confusion matrices for the second case (6 IMUs).

Figure 8: HAR accuracy comparison using different numbers of IMUs.

Figure 9: Human activity recognition accuracy comparison histogram.

Impact of Sensor Number

The authors also investigate how the number of sensors affects HAR performance. Increasing the number of IMUs generally improves accuracy, but sensor placement also matters. Notably, with only three IMUs, all three proposed methods reach the HAR accuracy they achieve with six IMUs. The authors attribute this to the abundant action features in AMASS, which support effective feature extraction.

Conclusion

The authors present a comprehensive framework for HAR that addresses the challenges of data scarcity and domain adaptation. By combining virtual IMU data, hybrid CNNs, uncertainty-aware feature fusion, and transfer learning, the proposed method outperforms the competing methods on the real DIP dataset. The experimental results demonstrate the effectiveness of the framework and its potential for practical applications.
