Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
110 tokens/sec
GPT-4o
56 tokens/sec
Gemini 2.5 Pro Pro
44 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

A knowledge-driven vowel-based approach of depression classification from speech using data augmentation (2210.15261v1)

Published 27 Oct 2022 in cs.SD, cs.LG, and eess.AS

Abstract: We propose a novel explainable ML model that identifies depression from speech, by modeling the temporal dependencies across utterances and utilizing the spectrotemporal information at the vowel level. Our method first models the variable-length utterances at the local-level into a fixed-size vowel-based embedding using a convolutional neural network with a spatial pyramid pooling layer ("vowel CNN"). Following that, the depression is classified at the global-level from a group of vowel CNN embeddings that serve as the input of another 1D CNN ("depression CNN"). Different data augmentation methods are designed for both the training of vowel CNN and depression CNN. We investigate the performance of the proposed system at various temporal granularities when modeling short, medium, and long analysis windows, corresponding to 10, 21, and 42 utterances, respectively. The proposed method reaches comparable performance with previous state-of-the-art approaches and depicts explainable properties with respect to the depression outcome. The findings from this work may benefit clinicians by providing additional intuitions during joint human-ML decision-making tasks.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (2)
  1. Kexin Feng (13 papers)
  2. Theodora Chaspari (10 papers)
Citations (5)

Summary

We haven't generated a summary for this paper yet.