
Automatic Depression Detection: An Emotional Audio-Textual Corpus and a GRU/BiLSTM-based Model (2202.08210v1)

Published 15 Feb 2022 in eess.AS, cs.AI, cs.SD, and q-bio.QM

Abstract: Depression is a global mental health problem, the worst case of which can lead to suicide. An automatic depression detection system provides great help in facilitating depression self-assessment and improving diagnostic accuracy. In this work, we propose a novel depression detection approach utilizing speech characteristics and linguistic contents from participants' interviews. In addition, we establish an Emotional Audio-Textual Depression Corpus (EATD-Corpus) which contains audios and extracted transcripts of responses from depressed and non-depressed volunteers. To the best of our knowledge, EATD-Corpus is the first and only public depression dataset that contains audio and text data in Chinese. Evaluated on two depression datasets, the proposed method achieves the state-of-the-art performances. The outperforming results demonstrate the effectiveness and generalization ability of the proposed method. The source code and EATD-Corpus are available at https://github.com/speechandlanguageprocessing/ICASSP2022-Depression.

Citations (72)

Summary

  • The paper introduces the EATD-Corpus, the first Chinese multimedia dataset combining audio and text for depression detection.
  • It proposes a multi-modal GRU/BiLSTM model enhanced by an attention mechanism to effectively fuse audio and textual features.
  • The method outperforms previous approaches with an F1 score of 0.85 on DAIC-WoZ and 0.71 on the EATD-Corpus, highlighting its diagnostic potential.

Analysis of "Automatic Depression Detection: An Emotional Audio-Textual Corpus and a GRU/BiLSTM-based Model"

The paper "Automatic Depression Detection: An Emotional Audio-Textual Corpus and a GRU/BiLSTM-based Model" addresses the significant challenge of accurately detecting depression using an innovative approach combining speech and text analysis. The work is based on the premise that depression, a prevalent mental health disorder, can be automatically detected using computational techniques, thereby facilitating self-assessment and improving diagnostic accuracy.

Key Contributions

The paper presents two primary contributions to the field of automated depression detection:

  1. EATD-Corpus Development: The introduction of the Emotional Audio-Textual Depression Corpus (EATD-Corpus) represents a noteworthy advancement. This dataset comprises audio recordings and corresponding transcripts of interviews conducted with 162 volunteers, including both depressed and non-depressed participants. It is notable as the first publicly available dataset of its kind in Chinese, addressing the scarcity of comprehensive, public multimedia depression datasets. The corpus is intended to support further research and development in the field of depression detection.
  2. Multi-modal Detection Model: Leveraging both audio and text features, the authors propose a model utilizing Gated Recurrent Unit (GRU) and Bidirectional Long Short-Term Memory (BiLSTM) networks. These networks, enhanced by an attention mechanism, summarize the audio and textual sequences into fixed-length representations, which are then fused in a multi-modal network to predict depression. This approach demonstrates strong detection capability, as evidenced by the evaluation metrics reported in the paper.
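The two-branch architecture described above can be sketched in PyTorch. This is a minimal illustration, not the authors' implementation: the class names, feature dimensions (`audio_dim`, `text_dim`), hidden sizes, and the simple additive attention are all assumptions made for the example; only the overall shape (GRU for audio, BiLSTM for text, attention pooling, late fusion into a classifier) follows the paper's description.

```python
import torch
import torch.nn as nn

class AttentionPool(nn.Module):
    """Additive attention that pools a sequence of hidden states into one vector."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, h):                        # h: (batch, time, dim)
        w = torch.softmax(self.score(h), dim=1)  # attention weights over time
        return (w * h).sum(dim=1)                # weighted sum: (batch, dim)

class DepressionDetector(nn.Module):
    """Sketch: GRU branch for audio, BiLSTM branch for text, late fusion."""
    def __init__(self, audio_dim=80, text_dim=300, hidden=128):
        super().__init__()
        self.audio_gru = nn.GRU(audio_dim, hidden, batch_first=True)
        self.text_lstm = nn.LSTM(text_dim, hidden, batch_first=True,
                                 bidirectional=True)
        self.audio_att = AttentionPool(hidden)
        self.text_att = AttentionPool(2 * hidden)   # BiLSTM doubles the width
        self.classifier = nn.Linear(hidden + 2 * hidden, 2)  # depressed / not

    def forward(self, audio, text):
        a, _ = self.audio_gru(audio)   # (batch, T_audio, hidden)
        t, _ = self.text_lstm(text)    # (batch, T_text, 2*hidden)
        fused = torch.cat([self.audio_att(a), self.text_att(t)], dim=-1)
        return self.classifier(fused)  # logits: (batch, 2)

model = DepressionDetector()
logits = model(torch.randn(4, 50, 80), torch.randn(4, 20, 300))
```

The key design point is that each modality is reduced to a single vector by attention before fusion, so the classifier sees a fixed-size joint representation regardless of sequence lengths.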

Evaluation and Results

The proposed method is evaluated on both the EATD-Corpus and the DAIC-WoZ dataset, showcasing state-of-the-art performance. On the DAIC-WoZ dataset, the multi-modal model achieved an F1 score of 0.85, surpassing existing methods that rely solely on either audio or text data. On the EATD-Corpus, it achieved an F1 score of 0.71. Notably, resampling techniques were employed to address class imbalance in both datasets, improving the reliability of the results.
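The class-imbalance handling mentioned above can be illustrated with a simple random-oversampling routine. The paper does not specify this exact procedure; the function below is a generic sketch of one common resampling strategy, duplicating minority-class examples until each class matches the majority-class count.

```python
import random

def oversample(samples, labels, seed=0):
    """Randomly duplicate minority-class samples until all classes are balanced."""
    rng = random.Random(seed)
    by_label = {}
    for s, y in zip(samples, labels):
        by_label.setdefault(y, []).append(s)
    target = max(len(group) for group in by_label.values())  # majority count
    out = []
    for y, group in by_label.items():
        # Keep every original sample, then pad with random duplicates.
        picked = group + [rng.choice(group) for _ in range(target - len(group))]
        out.extend((s, y) for s in picked)
    rng.shuffle(out)
    return out

# Toy example: 4 non-depressed (label 0) vs. 1 depressed (label 1) interview.
balanced = oversample(["a1", "a2", "a3", "a4", "b1"], [0, 0, 0, 0, 1])
```

In a depression corpus the depressed class is typically the minority, so without such resampling a classifier can reach high accuracy by always predicting "non-depressed", which is why F1 is the reported metric.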

Implications and Future Directions

The implications of this research are twofold. Practically, the development of an automatic depression detection system could aid individuals in privately assessing their mental health, potentially increasing their willingness to engage with mental health professionals. Theoretically, it provides a foundation for further exploration in fusing multi-modal data for mental health diagnostics. The EATD-Corpus is expected to facilitate ongoing research in this area, offering a valuable resource for scientists aiming to develop more sophisticated diagnostic tools.

Looking forward, this paper paves the way for further advancements in multi-modal machine learning applications in mental health. Future research could explore the integration of additional data modalities, such as physiological signals or visual cues, to enhance detection accuracy. Moreover, developing systems optimized for real-world usage, such as mobile applications for self-diagnosis, could bridge the gap between theoretical research and practical application.

The work presented is a significant step towards enhancing depression diagnostics through computational means. It demonstrates not only the potential of machine learning in health diagnostics but also the importance of publicly available datasets in driving innovation in this vital field.