- The paper presents a dual attention model that integrates channel and temporal attention within CNNs to improve human activity recognition.
- It achieves up to 98.85% accuracy on benchmark datasets like WISDM, outperforming traditional CNN and RNN methods.
- The study demonstrates that the dual attention mechanism refines feature extraction with minimal computational overhead, benefiting applications in health and sports monitoring.
An Examination of the Dual Attention Network (DanHAR) for Multimodal Human Activity Recognition
The paper "DanHAR: Dual Attention Network For Multimodal Human Activity Recognition Using Wearable Sensors" introduces a method that improves classification performance in Human Activity Recognition (HAR) through a dual attention mechanism integrated within Convolutional Neural Networks (CNNs). The paper details the design and evaluation of a system that effectively leverages multimodal sensor data, addressing both the spatial and temporal dependencies typical of HAR tasks.
Overview and Methodology
To improve HAR accuracy, DanHAR introduces two key enhancements over existing models. First, it incorporates channel attention, a mechanism that helps the network decide which sensor modalities carry significant information. Second, it applies temporal attention, which concentrates on the important segments of the sensor data over time. Together, these attention modules sharpen the CNN's focus on pertinent data and thus refine its ability to classify activities. The dual attention mechanism in DanHAR is built on a residual network backbone, further strengthening feature extraction without incurring substantial computational overhead.
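The interplay of the two modules can be sketched in plain Python on a toy channels × time array. This is a minimal illustration, not the paper's exact formulation: in DanHAR the gates are learned by small neural layers, whereas here fixed pooling and gating functions (sigmoid over a channel mean, softmax over time) stand in, and the input values are invented.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def channel_attention(x):
    """Scale each sensor channel by a gate derived from its
    global-average-pooled descriptor (squeeze-and-excite style)."""
    gates = [sigmoid(sum(row) / len(row)) for row in x]
    return [[g * v for v in row] for g, row in zip(gates, x)]

def temporal_attention(x):
    """Scale each time step by a softmax weight over channel-pooled scores,
    so informative segments of the window dominate."""
    n_ch, n_t = len(x), len(x[0])
    scores = [sum(x[c][t] for c in range(n_ch)) / n_ch for t in range(n_t)]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [[w * v for w, v in zip(weights, row)] for row in x]

# toy input: 3 sensor channels (rows) x 4 time steps (columns)
x = [[0.5, 1.0, 0.2, 0.1],
     [2.0, 2.5, 1.8, 2.2],
     [0.0, 0.1, 0.0, 0.2]]

# DanHAR applies the two modules sequentially on CNN feature maps;
# here we chain the toy versions directly on the raw window.
refined = temporal_attention(channel_attention(x))
```

The key design point the sketch preserves is the division of labor: channel attention reweights *which* modality matters, temporal attention reweights *when* the signal matters, and both are cheap elementwise rescalings of the feature map, which is why the overhead stays small.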
Experimental Validation and Results
The robustness and accuracy of DanHAR are empirically validated on four public datasets: WISDM, PAMAP2, UniMiB SHAR, and OPPORTUNITY. A weakly labeled dataset is additionally used to validate performance in scenarios where ground-truth annotations are sparse. The experimental results consistently show DanHAR outperforming traditional CNN and recurrent neural network-based models, such as those built on LSTMs or GRUs. On the WISDM dataset, for instance, DanHAR achieved an accuracy of 98.85%, surpassing previous benchmarks by a margin of 0.62%. Notably, these gains come with minimal parameter overhead, underscoring the efficiency of the attention mechanism integration.
Theoretical and Practical Implications
This paper extends the understanding of how attention mechanisms can be effectively applied within deep learning architectures to process sensor data from multiple modalities. The inclusion of both channel and temporal attention underscores the importance of comprehensive spatial-temporal analysis, which is vital for accurate HAR. Practically, the improved recognition performance has direct implications for applications such as health monitoring, sports tracking, and interactive entertainment systems. By enabling the model to determine data importance autonomously, DanHAR also reduces the manual effort required for data labeling, a particular benefit in weakly supervised learning settings.
Future Developments
While DanHAR sets a new benchmark for multimodal HAR, it also opens avenues for future research. One direction is extending the dual attention mechanism to other domains with high-dimensional temporal data, such as video and audio processing. Another is exploring mixed attention configurations, which could yield further performance gains or reduce computational load. As HAR systems grow more complex and more integral to various technology ecosystems, the impact of methodologies like DanHAR is poised to expand, driving further innovation in sensor-based recognition systems.
In conclusion, this paper makes a substantive contribution to the field of HAR by presenting a dual attention-based approach that balances performance and computational efficiency, setting a precedent for subsequent advancements in sensor data processing and interpretation using deep neural networks.