
The Role of Phonetic Units in Speech Emotion Recognition (2108.01132v1)

Published 2 Aug 2021 in cs.CL

Abstract: We propose a method for emotion recognition through emotion-dependent speech recognition using Wav2vec 2.0. Our method achieved a significant improvement over most previously reported results on IEMOCAP, a benchmark emotion dataset. Different types of phonetic units are employed and compared in terms of accuracy and robustness of emotion recognition within and across datasets and languages. Models of phonemes, broad phonetic classes, and syllables all significantly outperform the utterance model, demonstrating that phonetic units are helpful and should be incorporated in speech emotion recognition. The best performance is from using broad phonetic classes. Further research is needed to investigate the optimal set of broad phonetic classes for the task of emotion recognition. Finally, we found that Wav2vec 2.0 can be fine-tuned to recognize coarser-grained or larger phonetic units than phonemes, such as broad phonetic classes and syllables.

Authors (5)
  1. Jiahong Yuan (12 papers)
  2. Xingyu Cai (10 papers)
  3. Renjie Zheng (29 papers)
  4. Liang Huang (108 papers)
  5. Kenneth Church (21 papers)
Citations (15)
