Papers
Topics
Authors
Recent
Detailed Answer
Quick Answer
Concise responses based on abstracts only
Detailed Answer
Well-researched responses based on abstracts and relevant paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses
Gemini 2.5 Flash
Gemini 2.5 Flash 87 tok/s
Gemini 2.5 Pro 45 tok/s Pro
GPT-5 Medium 32 tok/s Pro
GPT-5 High 29 tok/s Pro
GPT-4o 105 tok/s Pro
Kimi K2 202 tok/s Pro
GPT OSS 120B 461 tok/s Pro
Claude Sonnet 4 39 tok/s Pro
2000 character limit reached

WavFusion: Towards wav2vec 2.0 Multimodal Speech Emotion Recognition (2412.05558v1)

Published 7 Dec 2024 in cs.SD, cs.AI, cs.CV, cs.MM, and eess.AS

Abstract: Speech emotion recognition (SER) remains a challenging yet crucial task due to the inherent complexity and diversity of human emotions. To address this problem, researchers attempt to fuse information from other modalities via multimodal learning. However, existing multimodal fusion techniques often overlook the intricacies of cross-modal interactions, resulting in suboptimal feature representations. In this paper, we propose WavFusion, a multimodal speech emotion recognition framework that addresses critical research problems in effective multimodal fusion, heterogeneity among modalities, and discriminative representation learning. By leveraging a gated cross-modal attention mechanism and multimodal homogeneous feature discrepancy learning, WavFusion demonstrates improved performance over existing state-of-the-art methods on benchmark datasets. Our work highlights the importance of capturing nuanced cross-modal interactions and learning discriminative representations for accurate multimodal SER. Experimental results on two benchmark datasets (IEMOCAP and MELD) demonstrate that WavFusion succeeds over the state-of-the-art strategies on emotion recognition.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-Up Questions

We haven't generated follow-up questions for this paper yet.

Don't miss out on important new AI/ML research

See which papers are being discussed right now on X, Reddit, and more:

“Emergent Mind helps me see which AI papers have caught fire online.”

Philip

Philip

Creator, AI Explained on YouTube