Speech based Depression Severity Level Classification Using a Multi-Stage Dilated CNN-LSTM Model (2104.04195v1)

Published 9 Apr 2021 in eess.AS and cs.LG

Abstract: Speech-based depression classification has gained immense popularity in recent years. However, most classification studies have focused on binary classification to distinguish depressed subjects from non-depressed subjects. In this paper, we formulate the depression classification task as a severity level classification problem to provide more granularity in the classification outcomes. We use articulatory coordination features (ACFs) developed to capture the changes in neuromotor coordination that happen as a result of psychomotor slowing, a necessary feature of Major Depressive Disorder. The ACFs derived from the vocal tract variables (TVs) are used to train a dilated Convolutional Neural Network based depression classification model to obtain segment-level predictions. Then, we propose a Recurrent Neural Network based approach to obtain session-level predictions from segment-level predictions. We show that the strengths of the segment-wise classifier are amplified when a session-wise classifier is trained on embeddings obtained from it. The model trained on ACFs derived from TVs shows a relative improvement of 27.47% in Unweighted Average Recall (UAR) on the session-level classification task, compared to the model trained on ACFs derived from Mel Frequency Cepstral Coefficients (MFCCs).
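
The abstract reports its result as a relative improvement in Unweighted Average Recall (UAR). As a minimal sketch of what that metric measures, the snippet below computes UAR as the plain mean of per-class recall, along with a relative improvement between two scores. The function names, the integer labels standing in for severity levels, and the toy data are assumptions for illustration only; they are not taken from the paper, and this is not the authors' evaluation code.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally
    regardless of how many samples it has."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


def relative_improvement(baseline, improved):
    """Relative change from baseline to improved, in percent."""
    return 100.0 * (improved - baseline) / baseline


# Hypothetical, imbalanced toy labels (0, 1, 2 stand in for severity levels).
y_true = [0, 0, 0, 0, 0, 0, 1, 2]
y_pred = [0, 0, 0, 0, 0, 0, 0, 2]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.3f}, accuracy = {accuracy:.3f}")
```

On imbalanced data such as this toy example, accuracy is dominated by the majority class, whereas UAR penalizes a classifier that misses a minority class entirely. That property is a common reason to prefer UAR when class sizes differ widely.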


Authors (2)

Nadee Seneviratne (4 papers)
Carol Espy-Wilson (34 papers)

Citations (7)


