Probing Speech Emotion Recognition Transformers for Linguistic Knowledge (2204.00400v2)

Published 1 Apr 2022 in cs.CL and cs.LG

Abstract: Large, pre-trained neural networks consisting of self-attention layers (transformers) have recently achieved state-of-the-art results on several speech emotion recognition (SER) datasets. These models are typically pre-trained in a self-supervised manner with the goal of improving automatic speech recognition performance, and thus of capturing linguistic information. In this work, we investigate the extent to which this information is exploited during SER fine-tuning. Using a reproducible methodology based on open-source tools, we synthesise prosodically neutral speech utterances while varying the sentiment of the text. The transformer model's valence predictions are highly reactive to positive and negative sentiment content, as well as to negations, but not to intensifiers or reducers, while none of these linguistic features impact arousal or dominance. These findings show that transformers can successfully leverage linguistic information to improve their valence predictions, and that linguistic analysis should be included in their testing.
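The probing recipe described in the abstract is straightforward to sketch: render sentiment-manipulated transcripts with a prosodically neutral TTS voice, run each rendering through the fine-tuned SER model, and check which emotion dimension moves. The sketch below is illustrative only; `synthesise`, `predict_adv`, and the example sentences are hypothetical stand-ins, not the paper's actual tooling or stimuli.

```python
# Hypothetical stand-ins (not from the paper):
#   synthesise(text) -> waveform of prosodically neutral synthetic speech
#   predict_adv(waveform) -> (arousal, dominance, valence), each in [0, 1]

# Example transcripts covering the linguistic manipulations probed in the
# paper: sentiment polarity, negation, intensifiers, and reducers.
TEMPLATES = {
    "neutral":     "The package arrived today.",
    "positive":    "The wonderful package arrived today.",
    "negative":    "The awful package arrived today.",
    "negation":    "The package did not arrive today.",
    "intensifier": "The very wonderful package arrived today.",
    "reducer":     "The slightly wonderful package arrived today.",
}

def probe(synthesise, predict_adv):
    """Report per-condition predictions and valence shifts vs. neutral."""
    _, _, base_valence = predict_adv(synthesise(TEMPLATES["neutral"]))
    for condition, text in TEMPLATES.items():
        arousal, dominance, valence = predict_adv(synthesise(text))
        print(f"{condition:11s} valence={valence:.2f} "
              f"delta={valence - base_valence:+.2f} "
              f"arousal={arousal:.2f} dominance={dominance:.2f}")
```

Under the paper's findings, such a probe should show a clear valence delta for the positive, negative, and negation conditions, near-zero deltas for intensifiers and reducers, and stable arousal and dominance throughout.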

Authors (8)
  1. Andreas Triantafyllopoulos (42 papers)
  2. Johannes Wagner (6 papers)
  3. Hagen Wierstorf (8 papers)
  4. Maximilian Schmitt (13 papers)
  5. Uwe Reichel (3 papers)
  6. Florian Eyben (14 papers)
  7. Felix Burkhardt (11 papers)
  8. Björn W. Schuller (153 papers)
Citations (22)
