
LanSER: Language-Model Supported Speech Emotion Recognition (2309.03978v1)

Published 7 Sep 2023 in cs.CL, cs.LG, cs.SD, and eess.AS

Abstract: Speech emotion recognition (SER) models typically rely on costly human-labeled data for training, making scaling methods to large speech datasets and nuanced emotion taxonomies difficult. We present LanSER, a method that enables the use of unlabeled data by inferring weak emotion labels via pre-trained LLMs through weakly-supervised learning. For inferring weak labels constrained to a taxonomy, we use a textual entailment approach that selects an emotion label with the highest entailment score for a speech transcript extracted via automatic speech recognition. Our experimental results show that models pre-trained on large datasets with this weak supervision outperform other baseline models on standard SER datasets when fine-tuned, and show improved label efficiency. Despite being pre-trained on labels derived only from text, we show that the resulting representations appear to model the prosodic content of speech.
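The weak-labeling step described in the abstract can be illustrated with a minimal sketch. It assumes some scoring function that rates how strongly a transcript entails a hypothesis sentence. Everything named here (`infer_weak_label`, the hypothesis template, `TOY_CUES`, and `toy_entail_score`) is hypothetical and not taken from the paper; in particular, the keyword-overlap scorer is only a stand-in for a pre-trained entailment model so that the example runs without external dependencies.

```python
import re
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical cue lexicon used only by the toy scorer below.
TOY_CUES: Dict[str, set] = {
    "joy": {"great", "wonderful", "love"},
    "anger": {"hate", "furious", "unfair"},
    "sadness": {"miss", "lonely", "cry"},
}


def toy_entail_score(premise: str, hypothesis: str) -> float:
    """Stand-in for an entailment model: counts cue words in the premise
    that belong to the emotion named in the hypothesis."""
    words = set(re.findall(r"[a-z']+", premise.lower()))
    hyp = hypothesis.lower()
    score = 0.0
    for label, cues in TOY_CUES.items():
        if label in hyp:
            score += len(words & cues)
    return score


def infer_weak_label(
    transcript: str,
    taxonomy: Sequence[str],
    entail_score: Callable[[str, str], float],
    template: str = "The emotion of this speaker is {}.",  # assumed wording
) -> Tuple[str, Dict[str, float]]:
    """Pick the taxonomy label whose hypothesis gets the highest entailment score."""
    # Build one hypothesis sentence per candidate emotion label.
    scores = {
        label: entail_score(transcript, template.format(label))
        for label in taxonomy
    }
    # The weak label is the arg-max over the constrained taxonomy.
    best = max(scores, key=scores.get)
    return best, scores
```

Calling `infer_weak_label("I hate this, it is so unfair!", ["joy", "anger", "sadness"], toy_entail_score)` returns the selected label together with the per-label scores. In a realistic setting the transcript would come from an automatic speech recognition system and `entail_score` would wrap a pre-trained entailment model; the selected label would then serve as the weak training target for the speech model.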

Authors (6)
  1. Taesik Gong
  2. Josh Belanich
  3. Krishna Somandepalli
  4. Arsha Nagrani
  5. Brian Eoff
  6. Brendan Jou
Citations (9)