
Speech Emotion Recognition with Dual-Sequence LSTM Architecture (1910.08874v4)

Published 20 Oct 2019 in eess.AS, cs.LG, and cs.SD

Abstract: Speech Emotion Recognition (SER) has emerged as a critical component of the next generation human-machine interfacing technologies. In this work, we propose a new dual-level model that predicts emotions based on both MFCC features and mel-spectrograms produced from raw audio signals. Each utterance is preprocessed into MFCC features and two mel-spectrograms at different time-frequency resolutions. A standard LSTM processes the MFCC features, while a novel LSTM architecture, denoted as Dual-Sequence LSTM (DS-LSTM), processes the two mel-spectrograms simultaneously. The outputs are later averaged to produce a final classification of the utterance. Our proposed model achieves, on average, a weighted accuracy of 72.7% and an unweighted accuracy of 73.3%---a 6% improvement over current state-of-the-art unimodal models---and is comparable with multimodal models that leverage textual information as well as audio signals.
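
The abstract describes a late-fusion design: the MFCC-LSTM branch and the DS-LSTM branch each produce a class-probability vector, and the two are averaged to yield the final label. The sketch below illustrates only that fusion step with placeholder branch outputs (the real model would compute them from MFCC features and mel-spectrograms); the four-way emotion set is an assumption for illustration, not taken from the abstract.

```python
# Illustrative sketch of the late-fusion step: each branch emits class
# probabilities, which are averaged before the final argmax. The branch
# outputs below are hypothetical placeholders.

EMOTIONS = ["angry", "happy", "neutral", "sad"]  # assumed 4-class SER setup

def fuse_predictions(p_mfcc, p_dslstm):
    """Average the two branches' probability vectors and pick the argmax."""
    assert len(p_mfcc) == len(p_dslstm) == len(EMOTIONS)
    avg = [(a + b) / 2.0 for a, b in zip(p_mfcc, p_dslstm)]
    return EMOTIONS[max(range(len(avg)), key=avg.__getitem__)], avg

# Hypothetical branch outputs for one utterance:
p_mfcc = [0.10, 0.55, 0.25, 0.10]    # MFCC-LSTM branch
p_dslstm = [0.05, 0.60, 0.20, 0.15]  # DS-LSTM branch
label, avg = fuse_predictions(p_mfcc, p_dslstm)
print(label)  # "happy"
```

Averaging probabilities (rather than, say, concatenating hidden states) keeps the two branches independently trainable and makes the fusion step trivially cheap at inference time.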

Authors (6)
  1. Jianyou Wang (9 papers)
  2. Michael Xue (1 paper)
  3. Ryan Culhane (1 paper)
  4. Enmao Diao (25 papers)
  5. Jie Ding (123 papers)
  6. Vahid Tarokh (144 papers)
Citations (101)
