
What Does it Take to Generalize SER Model Across Datasets? A Comprehensive Benchmark (2406.09933v1)

Published 14 Jun 2024 in cs.SD, cs.AI, cs.HC, and cs.LG

Abstract: Speech emotion recognition (SER) is essential for enhancing human-computer interaction in speech-based applications. Despite improvements in specific emotional datasets, there is still a research gap in SER's capability to generalize across real-world situations. In this paper, we investigate approaches to generalize the SER system across different emotion datasets. In particular, we incorporate 11 emotional speech datasets and illustrate a comprehensive benchmark on the SER task. We also address the challenge of imbalanced data distribution using over-sampling methods when combining SER datasets for training. Furthermore, we explore various evaluation protocols for adeptness in the generalization of SER. Building on this, we explore the potential of Whisper for SER, emphasizing the importance of thorough evaluation. Our approach is designed to advance SER technology by integrating speaker-independent methods.
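The abstract mentions over-sampling to handle class imbalance when several SER datasets are merged for training, but does not say which over-sampling method is used. As a rough illustration only, the sketch below shows plain random over-sampling with replacement, which is one common choice and may not match the authors' approach. The function name `random_oversample`, the file names, and the emotion labels are hypothetical placeholders.

```python
import random
from collections import Counter, defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class items at random until every class
    has as many items as the largest class.

    Illustrative assumption: this is generic random over-sampling,
    not necessarily the method used in the paper.
    """
    rng = random.Random(seed)

    # Group samples by their emotion label.
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)

    # Every class is grown to the size of the largest class.
    target = max(len(items) for items in by_label.values())

    out_samples, out_labels = [], []
    for label, items in by_label.items():
        # Draw the missing items with replacement from the same class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical merged corpus: file names and labels are placeholders.
samples = ["a1.wav", "a2.wav", "a3.wav", "h1.wav", "s1.wav", "s2.wav"]
labels = ["angry", "angry", "angry", "happy", "sad", "sad"]

balanced_samples, balanced_labels = random_oversample(samples, labels)
print(Counter(balanced_labels))
```

In a speaker-independent setup, a step like this would normally be applied to the training split only, so that duplicated utterances cannot leak into validation or test data.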

Authors (5)
  1. Adham Ibrahim (3 papers)
  2. Shady Shehata (17 papers)
  3. Ajinkya Kulkarni (18 papers)
  4. Mukhtar Mohamed (4 papers)
  5. Muhammad Abdul-Mageed (102 papers)
Citations (1)