Self-supervised ECG Representation Learning for Emotion Recognition
In "Self-supervised ECG Representation Learning for Emotion Recognition," Sarkar and Etemad explore self-supervised learning (SSL) as a way to extract useful features from electrocardiogram (ECG) signals for emotion recognition. They propose a deep multitask learning framework that encodes these physiological signals and improves emotion classification accuracy without relying on large annotated datasets, addressing a key limitation of fully supervised approaches.
Methodological Framework
The methodology is divided into two stages. In the first stage, a signal transformation recognition network learns abstract ECG representations from unlabeled data. Six transformations serve as the source of self-supervision: noise addition, scaling, temporal inversion, negation, permutation, and time-warping. The network is trained to recognize which transformations were applied to a signal, a pretext task that forces it to learn generalizable representations without any emotion labels. Notably, each transformation has a parameter range that significantly affects the quality of the learned representations, and the paper analyzes these effects in detail.
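To make the pretext task concrete, the sketch below implements the six transformations in NumPy for a single 1-D ECG segment and pairs each example with the index of the transformation applied, which serves as the self-supervised label. The specific parameter values (noise level, scaling factor, segment counts, stretch ratio) are illustrative placeholders, not the tuned ranges reported in the paper.

```python
import numpy as np

def add_noise(x, sigma=0.05):
    # Additive Gaussian noise; sigma is an assumed, illustrative value.
    return x + np.random.normal(0.0, sigma, size=x.shape)

def scale(x, factor=1.5):
    # Amplitude scaling by a fixed factor.
    return x * factor

def temporal_inverse(x):
    # Reverse the signal along the time axis.
    return x[::-1]

def negate(x):
    # Flip the sign of the signal.
    return -x

def permute(x, n_segments=10):
    # Split into segments and shuffle their order.
    segments = np.array_split(x, n_segments)
    np.random.shuffle(segments)
    return np.concatenate(segments)

def time_warp(x, n_segments=4, stretch=1.25):
    # Alternately stretch and squeeze segments, then resample to the original length.
    segments = np.array_split(x, n_segments)
    warped = []
    for i, seg in enumerate(segments):
        factor = stretch if i % 2 == 0 else 1.0 / stretch
        new_len = max(1, int(len(seg) * factor))
        idx = np.linspace(0, len(seg) - 1, new_len)
        warped.append(np.interp(idx, np.arange(len(seg)), seg))
    warped = np.concatenate(warped)
    idx = np.linspace(0, len(warped) - 1, len(x))
    return np.interp(idx, np.arange(len(warped)), warped)

# Index 0 is the untransformed signal; indices 1-6 are the six transformations.
TRANSFORMS = [lambda x: x, add_noise, scale, temporal_inverse, negate, permute, time_warp]

def make_pretext_example(x):
    # Apply a randomly chosen transformation; its index is the pretext label.
    label = np.random.randint(len(TRANSFORMS))
    return TRANSFORMS[label](x), label
```

A training set for the pretext network can then be built by repeatedly calling make_pretext_example on windows of unlabeled ECG, with no emotion annotations required.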
In the second stage, the authors freeze the convolutional layers trained during the pretext task and fine-tune newly added dense layers on labeled ECG data for the emotion recognition task. Freezing these layers is justified by the observation that the convolutional weights capture general features that transfer across tasks and datasets. A minimal sketch of this two-stage procedure follows.
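The Keras sketch below illustrates the transfer step: a small 1-D CNN backbone is first trained on the transformation-recognition pretext task, then frozen while new dense layers are trained on emotion labels. The layer widths, kernel sizes, input length, and the single multi-class pretext head are assumptions made for brevity; the paper's actual architecture is a multitask network with its own configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_backbone(input_len=2560):
    # 1-D convolutional feature extractor; sizes are illustrative, not the paper's.
    inputs = layers.Input(shape=(input_len, 1))
    x = layers.Conv1D(32, 32, activation='relu', padding='same')(inputs)
    x = layers.MaxPooling1D(8)(x)
    x = layers.Conv1D(64, 16, activation='relu', padding='same')(x)
    x = layers.GlobalMaxPooling1D()(x)
    return models.Model(inputs, x, name='ecg_backbone')

# Stage 1: train the backbone on transformation recognition (7 classes: original + 6).
backbone = build_backbone()
pretext_out = layers.Dense(7, activation='softmax')(backbone.output)
pretext_model = models.Model(backbone.input, pretext_out)
pretext_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
# pretext_model.fit(unlabeled_ecg, transform_labels, ...)

# Stage 2: freeze the convolutional backbone, then train only new dense layers
# on labeled ECG for emotion recognition (here, a hypothetical 3-class arousal task).
backbone.trainable = False
x = layers.Dense(128, activation='relu')(backbone.output)
emotion_out = layers.Dense(3, activation='softmax')(x)
emotion_model = models.Model(backbone.input, emotion_out)
emotion_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
                      metrics=['accuracy'])
# emotion_model.fit(labeled_ecg, emotion_labels, ...)
```

Because only the dense head is updated in stage two, the labeled emotion data can be small relative to the unlabeled corpus used for pretext training, which is the practical appeal of the approach.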
Key Insights and Results
The proposed architecture is evaluated on four well-known affective computing datasets: AMIGOS, DREAMER, WESAD, and SWELL. The results indicate that the self-supervised framework consistently outperforms traditional supervised models. In cross-validation experiments, it achieves notable gains in both accuracy and F1-score over an equivalent fully supervised CNN across all datasets and emotion recognition tasks, including arousal, valence, stress, and multi-class affect classification.
Furthermore, the framework achieves state-of-the-art performance against previous benchmarks in comparative analyses with existing techniques. In particular, multi-class classification of arousal and valence scores, attempted for the first time on some of these datasets, yields strong accuracies, demonstrating the versatility and effectiveness of the model.
Implications and Future Work
The implications of this paper are twofold. Practically, it introduces a scalable approach to emotion recognition that minimizes reliance on costly labeled data and makes full use of readily available unlabeled recordings. Theoretically, it sheds light on how recognizing signal transformations can yield robust feature spaces suitable for downstream classification tasks.
Future work could fuse signals from other modalities, such as EEG, which shares time-series characteristics with ECG, potentially improving the overall accuracy of emotion recognition systems. Moreover, expanding cross-subject and cross-corpus evaluation could further establish the framework's applicability across diverse populations and settings.
Ultimately, Sarkar and Etemad's work sets a precedent for exploring self-supervised learning within affective computing, opening new possibilities for enhancing intelligent human-machine interaction through emotionally-aware systems.