
Multilingual Speech Emotion Recognition With Multi-Gating Mechanism and Neural Architecture Search (2211.08237v2)

Published 31 Oct 2022 in cs.SD, cs.CL, and eess.AS

Abstract: Speech emotion recognition (SER) classifies audio into emotion categories such as Happy, Angry, Fear, Disgust and Neutral. While SER is a common application for popular languages, it continues to be a problem for low-resourced languages, i.e., languages with no pretrained speech-to-text recognition models. This paper first proposes a language-specific model that extracts emotional information from multiple pre-trained speech models, and then designs a multi-domain model that simultaneously performs SER for various languages. Our multi-domain model employs a multi-gating mechanism to generate a unique weighted feature combination for each language, and also searches for a specific neural network structure for each language through a neural architecture search module. In addition, we introduce a contrastive auxiliary loss to build more separable representations for audio data. Our experiments show that our model raises the state-of-the-art accuracy by 3% for German and 14.3% for French.
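
The per-language weighted feature combination mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each pretrained speech model yields a fixed-length feature vector and that each language owns one gate logit per model, which a softmax turns into mixing weights. The names `softmax`, `gated_combination`, `combine_for_language`, and `GATE_LOGITS`, as well as the logit values, are hypothetical and chosen only for illustration. In a trained model the logits would be learned, and might also depend on the input.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gated_combination(features, gate_logits):
    """Weighted sum of feature vectors, one vector per pretrained model.

    features: list of equal-length lists of floats.
    gate_logits: one logit per feature vector (assumed learnable).
    """
    if len(features) != len(gate_logits):
        raise ValueError("need exactly one gate logit per feature vector")
    weights = softmax(gate_logits)
    dim = len(features[0])
    return [
        sum(w * f[i] for w, f in zip(weights, features))
        for i in range(dim)
    ]


# Hypothetical per-language gate logits (illustrative values, not from the paper).
GATE_LOGITS = {
    "german": [2.0, 0.0, 0.0],
    "french": [0.0, 0.0, 2.0],
}


def combine_for_language(features, language):
    """Apply the gate that belongs to the given language."""
    return gated_combination(features, GATE_LOGITS[language])
```

Because every language has its own gate, the same set of pretrained features produces a different combined representation per language, which is the behavior the abstract attributes to the multi-gating mechanism.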

Authors (6)
  1. Zihan Wang (181 papers)
  2. Qi Meng (50 papers)
  3. HaiFeng Lan (1 paper)
  4. KeHao Guo (2 papers)
  5. Akshat Gupta (41 papers)
  6. Xinrui Zhang (13 papers)
Citations (3)
