Multi-task Self-Supervised Learning for Human Activity Detection
The paper by Saeed et al. presents a novel self-supervised learning approach for improving human activity recognition (HAR) using sensor data from smartphones. The central contribution is a multi-task temporal convolutional neural network (CNN) that learns generalizable features by recognizing transformations applied to the input signals, without requiring any labeled data. This drastically reduces the dependence on large, curated labeled datasets, a significant hurdle in HAR applications.
Methodology
The authors propose a two-step learning process. First, a temporal CNN is pre-trained on self-supervised tasks that involve recognizing transformations applied to the raw sensor data. Eight transformations are employed: noise addition, scaling, rotation, negation, horizontal flipping, permutation, time-warping, and channel shuffling. Recognizing whether each transformation has been applied serves as a surrogate (pretext) task, providing a robust supervisory signal for the network. Unlike traditional autoencoders, which reconstruct their input, this approach extracts more meaningful features by focusing on transformation recognition rather than data reconstruction.
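To make the transformation set concrete, the following is a minimal NumPy sketch of several of the eight transformations, assuming each input window is an array of shape (time_steps, channels); the function names and parameter defaults are illustrative, not the paper's exact implementation.

```python
import numpy as np

# Each window x has shape (time_steps, channels), e.g. (400, 3) for a
# 3-axis accelerometer. All parameter defaults are illustrative.

def add_noise(x, sigma=0.05):
    """Noise addition (jitter): add zero-mean Gaussian noise."""
    return x + np.random.normal(0.0, sigma, x.shape)

def scale(x, sigma=0.1):
    """Scaling: multiply each channel by a random factor close to 1."""
    return x * np.random.normal(1.0, sigma, (1, x.shape[1]))

def negate(x):
    """Negation: flip the sign of the signal."""
    return -x

def horizontal_flip(x):
    """Horizontal flip: reverse the window along the time axis."""
    return x[::-1, :]

def permute(x, n_segments=4):
    """Permutation: split the window into segments and shuffle them."""
    segments = np.array_split(x, n_segments, axis=0)
    np.random.shuffle(segments)
    return np.concatenate(segments, axis=0)

def channel_shuffle(x):
    """Channel shuffling: randomly reorder the sensor axes."""
    return x[:, np.random.permutation(x.shape[1])]
```

Each pretext task is then a binary classification problem: given a window, predict whether that particular transformation was applied.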
In the second step, the pre-trained features are used to train a HAR model. Evaluated on several publicly available datasets, the self-supervised features outperform those learned by autoencoders and yield results on par with fully supervised training. In semi-supervised and transfer learning settings, the self-supervised features offer significant improvements, narrowing the performance gap between unsupervised and fully supervised learning.
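The two-step procedure can be sketched in PyTorch as follows; the layer widths, kernel sizes, and head structure are assumptions for illustration, not necessarily the paper's exact architecture.

```python
import torch.nn as nn

N_TASKS = 8  # one binary head per transformation

class Encoder(nn.Module):
    """Shared temporal convolutional trunk (sizes are illustrative)."""
    def __init__(self, in_channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=24), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=16), nn.ReLU(),
            nn.Conv1d(64, 96, kernel_size=8), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),   # global max pooling over time
        )

    def forward(self, x):               # x: (batch, channels, time)
        return self.net(x).squeeze(-1)  # -> (batch, 96)

class MultiTaskModel(nn.Module):
    """Encoder plus one 'was this transformation applied?' head per task."""
    def __init__(self):
        super().__init__()
        self.encoder = Encoder()
        self.heads = nn.ModuleList([nn.Linear(96, 1) for _ in range(N_TASKS)])

    def forward(self, x):
        z = self.encoder(x)
        return [head(z) for head in self.heads]  # one logit per task

def build_har_model(pretrained_encoder, n_activities):
    """Step 2: freeze the self-supervised encoder, add an activity classifier."""
    for p in pretrained_encoder.parameters():
        p.requires_grad = False
    return nn.Sequential(pretrained_encoder, nn.Linear(96, n_activities))
```

During pre-training, each head contributes a binary cross-entropy loss and the per-task losses are combined (e.g., summed) into a single multi-task objective; only the linear classifier is updated in the second step.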
Evaluation and Results
The authors perform a comprehensive evaluation on six publicly available smartphone-based HAR datasets, demonstrating the efficacy of the method across unsupervised, semi-supervised, and transfer learning paradigms. Notably, with as few as 2-10 labeled samples per class, the self-supervised models substantially outperform models trained from scratch; the few-label protocol is sketched below. This is a practical advantage in real-world scenarios where labeled data is scarce.
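The few-label protocol can be emulated by drawing a fixed number of examples per class before training the downstream classifier; a minimal sketch, assuming integer class labels in `y`:

```python
import numpy as np

def sample_per_class(X, y, k, seed=0):
    """Keep only k labeled examples per class (semi-supervised setup)."""
    rng = np.random.default_rng(seed)
    keep = []
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        keep.extend(rng.choice(idx, size=min(k, len(idx)), replace=False))
    keep = np.asarray(keep)
    return X[keep], y[keep]

# e.g. X_small, y_small = sample_per_class(X_train, y_train, k=10)
```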
Analysis and visualization techniques such as SVCCA (singular vector canonical correlation analysis), saliency maps, and t-SNE are used to show that the representations learned through self-supervision are remarkably similar to those obtained with full supervision. These analyses bolster the claim that the proposed method captures high-level features useful for downstream tasks.
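For the t-SNE part of that analysis, a minimal scikit-learn sketch, assuming `features` is an (n_windows, n_dims) array of embeddings from the frozen encoder and `labels` holds activity IDs (both hypothetical here):

```python
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# features, labels: encoder embeddings and activity IDs (assumed to exist).
embedded = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)

plt.scatter(embedded[:, 0], embedded[:, 1], c=labels, s=5, cmap="tab10")
plt.title("t-SNE of self-supervised features, colored by activity")
plt.show()
```

Tight, well-separated clusters per activity would indicate that the self-supervised features already encode class structure before any labels are seen.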
Implications and Future Directions
The significance of this research lies in its applicability beyond HAR to domains where unlabeled data is abundant but labeled data is hard to obtain. The method's ability to learn robust features for transfer and semi-supervised learning suggests opportunities in other fields that rely on time-series interpretation, such as health monitoring, industrial IoT, and smart home systems.
Future research could explore architecture designs that further exploit self-supervised pre-training, potentially incorporating domain-specific auxiliary tasks. Automating the selection of the most effective transformation tasks could yield further improvements. Real-world deployments and in-the-wild evaluations would also clarify the computational and energy costs that must be optimized for practical applications.
Overall, the paper offers a compelling strategy for reducing reliance on labeled datasets while maintaining competitive performance in HAR, opening a path for more scalable and adaptable machine learning solutions.