Adapting WavLM for Speech Emotion Recognition (2405.04485v1)

Published 7 May 2024 in cs.LG, cs.SD, and eess.AS

Abstract: Recently, the use of self-supervised learning (SSL) speech models for downstream tasks has been drawing a lot of attention. While large pre-trained models commonly outperform smaller models trained from scratch, questions about the optimal fine-tuning strategies remain open. In this paper, we explore fine-tuning strategies for the WavLM Large model on the speech emotion recognition task using the MSP Podcast Corpus. More specifically, we perform a series of experiments focusing on the use of gender and semantic information from utterances. We then summarize our findings and describe the final model we submitted to the Speech Emotion Recognition Challenge 2024.

Authors (4)
  1. Daria Diatlova (3 papers)
  2. Anton Udalov (1 paper)
  3. Vitalii Shutov (2 papers)
  4. Egor Spirin (7 papers)
Citations (4)
