MMER: Multimodal Multi-task Learning for Speech Emotion Recognition (2203.16794v5)
Published 31 Mar 2022 in cs.CL, cs.SD, and eess.AS
Abstract: In this paper, we propose MMER, a novel Multimodal Multi-task learning approach for Speech Emotion Recognition. MMER leverages a multimodal network based on early fusion and cross-modal self-attention between the text and acoustic modalities, and is trained jointly with three auxiliary tasks for learning emotion recognition from spoken utterances. In practice, MMER outperforms all our baselines and achieves state-of-the-art performance on the IEMOCAP benchmark. Additionally, we conduct extensive ablation studies and result analyses to demonstrate the effectiveness of our proposed approach.
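The abstract describes cross-modal self-attention between text and acoustic features followed by fusion, but gives no implementation details. The following is a minimal NumPy sketch of the general pattern (scaled dot-product attention where text tokens attend over acoustic frames, with the attended context concatenated to the text features); the function names, dimensions, and the concatenation-based fusion are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(text_feats, audio_feats):
    """Text tokens (queries) attend over acoustic frames (keys/values).

    Hypothetical sketch: a single scaled dot-product attention head,
    without the learned Q/K/V projections a real model would use.
    """
    d_k = text_feats.shape[-1]
    scores = text_feats @ audio_feats.T / np.sqrt(d_k)  # (T_text, T_audio)
    weights = softmax(scores, axis=-1)                   # rows sum to 1
    return weights @ audio_feats  # acoustic context aligned to text tokens

rng = np.random.default_rng(0)
text = rng.standard_normal((5, 16))    # 5 text tokens, dim 16
audio = rng.standard_normal((20, 16))  # 20 acoustic frames, dim 16

attended = cross_modal_attention(text, audio)
fused = np.concatenate([text, attended], axis=-1)  # simple fusion by concat
print(fused.shape)  # (5, 32)
```

In a trained model, the fused representation would feed an emotion classifier alongside the auxiliary-task heads; here the sketch only shows how attention aligns variable-length acoustic frames to text tokens.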
- Sreyan Ghosh (46 papers)
- Utkarsh Tyagi (18 papers)
- Harshvardhan Srivastava (8 papers)
- Dinesh Manocha (366 papers)
- S Ramaneswaran (6 papers)