Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
97 tokens/sec
GPT-4o
53 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Advancing Multiple Instance Learning with Attention Modeling for Categorical Speech Emotion Recognition (2008.06667v1)

Published 15 Aug 2020 in eess.AS and cs.SD

Abstract: Categorical speech emotion recognition is typically performed as a sequence-to-label problem, i.e., to determine the discrete emotion label of the input utterance as a whole. One of the main challenges in practice is that most of the existing emotion corpora do not give ground truth labels for each segment; instead, we only have labels for whole utterances. To extract segment-level emotional information from such weakly labeled emotion corpora, we propose using multiple instance learning (MIL) to learn segment embeddings in a weakly supervised manner. Also, for a sufficiently long utterance, not all of the segments contain relevant emotional information. In this regard, three attention-based neural network models are then applied to the learned segment embeddings to attend the most salient part of a speech utterance. Experiments on the CASIA corpus and the IEMOCAP database show better or highly competitive results than other state-of-the-art approaches.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (4)
  1. Shuiyang Mao (4 papers)
  2. P. C. Ching (16 papers)
  3. C. -C. Jay Kuo (177 papers)
  4. Tan Lee (70 papers)
Citations (11)

Summary

We haven't generated a summary for this paper yet.