DepressionEmo: A novel dataset for multilabel classification of depression emotions (2401.04655v1)

Published 9 Jan 2024 in cs.CL

Abstract: Emotions are integral to human social interactions, with diverse responses elicited by various situational contexts. Particularly, the prevalence of negative emotional states has been correlated with negative outcomes for mental health, necessitating a comprehensive analysis of their occurrence and impact on individuals. In this paper, we introduce a novel dataset named DepressionEmo designed to detect 8 emotions associated with depression by 6037 examples of long Reddit user posts. This dataset was created through a majority vote over inputs by zero-shot classifications from pre-trained models and validating the quality by annotators and ChatGPT, exhibiting an acceptable level of interrater reliability between annotators. The correlation between emotions, their distribution over time, and linguistic analysis are conducted on DepressionEmo. Besides, we provide several text classification methods classified into two groups: machine learning methods such as SVM, XGBoost, and LightGBM; and deep learning methods such as BERT, GAN-BERT, and BART. The pretrained BART model, bart-base, allows us to obtain the highest F1-Macro of 0.76, showing its outperformance compared to other methods evaluated in our analysis. Across all emotions, the highest F1-Macro value is achieved by suicide intent, indicating a certain value of our dataset in identifying emotions in individuals with depression symptoms through text analysis. The curated dataset is publicly available at: https://github.com/abuBakarSiddiqurRahman/DepressionEmo.

Summary

The paper introduces a dataset for classifying eight depression-related emotions, labeled by majority vote over zero-shot classifications from pre-trained models.
It reports that the pretrained BART model (bart-base) achieves the highest F1-Macro score, 0.76, outperforming the other ML and DL methods evaluated.
The study highlights the potential for improved automated depression detection through enhanced text classification of social media content.

Introduction

Emotions play a crucial role in human experiences and social interactions and often serve as indicators of mental health issues such as depression. Because depression affects a large number of people worldwide, understanding the emotional states associated with it is important. Advances in natural language processing (NLP) have made it possible to apply text classification methods, particularly to social media content, to automate the identification of depressive symptoms. This paper presents DepressionEmo, a new dataset for the multilabel classification of emotions associated with depression, built from 6037 long-form Reddit user posts.

Dataset Overview and Analysis

DepressionEmo labels each Reddit post with one or more of eight depression-related emotions. Candidate labels were produced by zero-shot classification with pre-trained models, and a majority vote over those outputs determined the final emotion labels. Annotation quality was then validated by human annotators and ChatGPT, with an acceptable level of interrater reliability between annotators.
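The majority-vote labeling step can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `majority_vote`, the strict-majority threshold, and the three emotion names (a subset of the eight, taken from this overview) are assumptions made for illustration, and the zero-shot classifier outputs are hard-coded rather than produced by real models.

```python
from collections import Counter


def majority_vote(predictions, min_votes=None):
    """Keep every label proposed by at least `min_votes` classifiers.

    `predictions` is a list of label sets, one per (hypothetical) zero-shot
    classifier. By default a strict majority of classifiers is required.
    """
    if min_votes is None:
        min_votes = len(predictions) // 2 + 1
    # Count each label at most once per classifier.
    counts = Counter(label for labels in predictions for label in set(labels))
    return sorted(label for label, n in counts.items() if n >= min_votes)


# Hard-coded stand-ins for the outputs of three zero-shot classifiers
# on a single post (illustrative values only).
votes = [
    {"sadness", "hopelessness"},
    {"sadness", "suicide intent"},
    {"sadness", "hopelessness"},
]

print(majority_vote(votes))  # ['hopelessness', 'sadness']
```

Because the task is multilabel, the vote is taken per emotion rather than per post, so a post can keep several labels or none.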

The authors analyze the dataset to examine correlations between emotions, their distribution over time, and their linguistic features. Emotions such as sadness and hopelessness are prominent in the dataset, and suicide intent is the emotion on which the classifiers score highest.
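One common way to measure how strongly two emotion labels co-occur is the phi coefficient, which is the Pearson correlation between two binary indicators. The sketch below assumes that measure; this overview does not state which correlation the paper uses, and the function name `phi` and the toy posts are hypothetical.

```python
from math import sqrt


def phi(posts, label_a, label_b):
    """Pearson correlation between the 0/1 indicators of two labels."""
    xs = [1 if label_a in labels else 0 for labels in posts]
    ys = [1 if label_b in labels else 0 for labels in posts]
    n = len(posts)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a label that never varies has no defined correlation
    return cov / sqrt(var_x * var_y)


# Toy label sets for four posts (illustrative values only).
posts = [
    {"sadness", "hopelessness"},
    {"sadness", "hopelessness"},
    {"suicide intent"},
    set(),
]

print(phi(posts, "sadness", "hopelessness"))    # labels always co-occur here
print(phi(posts, "sadness", "suicide intent"))  # labels never co-occur here
```

A value near 1 means the two emotions tend to be assigned to the same posts, and a negative value means they rarely appear together.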

Methodological Approach and Classification Results

DepressionEmo was evaluated with machine learning (ML) methods (SVM, XGBoost, and LightGBM) and deep learning (DL) methods (BERT, GAN-BERT, and BART). The pretrained BART model, bart-base, achieved the highest F1-Macro score, 0.76, outperforming the other ML and DL methods evaluated.
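For readers unfamiliar with the metric, F1-Macro in a multilabel setting is the unweighted mean of the per-label F1 scores, so rare emotions count as much as frequent ones. The following Python sketch computes it from scratch on toy data. The function name `f1_macro`, the three labels, and the example predictions are assumptions for illustration and do not reproduce the paper's evaluation code or results.

```python
def f1_macro(y_true, y_pred, labels):
    """Unweighted mean of per-label F1 over multilabel predictions.

    `y_true` and `y_pred` are parallel lists of label sets, one per post.
    """
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if label in t and label in p)
        fp = sum(1 for t, p in pairs if label not in t and label in p)
        fn = sum(1 for t, p in pairs if label in t and label not in p)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


labels = ["sadness", "hopelessness", "suicide intent"]
y_true = [{"sadness", "hopelessness"}, {"suicide intent"}, {"sadness"}]
y_pred = [{"sadness"}, {"suicide intent"}, {"sadness", "hopelessness"}]

print(round(f1_macro(y_true, y_pred, labels), 4))  # 0.6667
```

In this toy case sadness and suicide intent are predicted perfectly (F1 of 1.0 each) while hopelessness is always wrong (F1 of 0.0), giving a macro average of 2/3.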

Summary of Conclusions and Future Work

DepressionEmo is a resource for applying text classification methods to detect signs of depression in social media text. The results suggest that ML and DL models, in particular the pretrained BART model, can perform multilabel emotion classification on this data with reasonable accuracy. Future work could expand the dataset, improve annotation precision, and explore other classification strategies. Resources such as DepressionEmo may contribute to the automated detection of depressive states through text analysis.
