
MMER: Multimodal Multi-task Learning for Speech Emotion Recognition (2203.16794v5)

Published 31 Mar 2022 in cs.CL, cs.SD, and eess.AS

Abstract: In this paper, we propose MMER, a novel Multimodal Multi-task learning approach for Speech Emotion Recognition. MMER leverages a novel multimodal network based on early fusion and cross-modal self-attention between the text and acoustic modalities, and solves three novel auxiliary tasks for learning emotion recognition from spoken utterances. In practice, MMER outperforms all our baselines and achieves state-of-the-art performance on the IEMOCAP benchmark. Additionally, we conduct extensive ablation studies and result analysis to demonstrate the effectiveness of our proposed approach.

Authors (5)
  1. Sreyan Ghosh (46 papers)
  2. Utkarsh Tyagi (18 papers)
  3. Harshvardhan Srivastava (8 papers)
  4. Dinesh Manocha (366 papers)
  5. S Ramaneswaran (6 papers)
Citations (15)