GEmo-CLAP: Gender-Attribute-Enhanced Contrastive Language-Audio Pretraining for Accurate Speech Emotion Recognition (2306.07848v10)

Published 13 Jun 2023 in cs.CL, cs.MM, cs.SD, and eess.AS

Abstract: Contrastive cross-modality pretraining has recently exhibited impressive success in diverse fields, yet there is limited research on its merits in speech emotion recognition (SER). In this paper, we propose GEmo-CLAP, a gender-attribute-enhanced contrastive language-audio pretraining (CLAP) method for SER. Specifically, we first construct an effective emotion CLAP (Emo-CLAP) for SER using pre-trained text and audio encoders. Second, given the significance of gender information in SER, we further propose two novel variants, a multi-task learning based GEmo-CLAP (ML-GEmo-CLAP) and a soft label based GEmo-CLAP (SL-GEmo-CLAP), which incorporate the gender information of speech signals to form more reasonable training objectives. Experiments on IEMOCAP indicate that both proposed GEmo-CLAP models consistently outperform Emo-CLAP across different pre-trained encoders. Remarkably, the proposed WavLM-based SL-GEmo-CLAP obtains the best weighted average recall (WAR) of 83.16%, outperforming state-of-the-art SER methods.
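
The soft-label idea can be illustrated with a short sketch. The snippet below shows one plausible way to build a CLAP-style symmetric contrastive loss whose batch-level targets blend emotion agreement with gender agreement. The embedding shapes, the alpha mixing weight, the temperature value, and the function names are illustrative assumptions, not the paper's exact formulation.

# Minimal sketch of a CLAP-style contrastive objective with gender-augmented
# soft labels, assuming pre-extracted audio/text embeddings. The alpha weight,
# temperature, and shapes are assumptions for illustration only.
import torch
import torch.nn.functional as F

def clap_soft_label_loss(audio_emb, text_emb, emotion_ids, gender_ids,
                         temperature=0.07, alpha=0.1):
    """Symmetric contrastive loss where targets are soft labels built from
    emotion agreement, blended with gender agreement."""
    # L2-normalize embeddings and compute the pairwise similarity matrix.
    a = F.normalize(audio_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = a @ t.T / temperature                      # (B, B)

    # Pairwise agreement matrices: 1 where two items share the same label.
    emo_match = (emotion_ids[:, None] == emotion_ids[None, :]).float()
    gen_match = (gender_ids[:, None] == gender_ids[None, :]).float()

    # Soft target matrix: mostly emotion agreement, softened by gender
    # agreement, then row-normalized into a valid target distribution.
    target = (1 - alpha) * emo_match + alpha * gen_match
    target = target / target.sum(dim=-1, keepdim=True)

    # Symmetric cross-entropy over both retrieval directions
    # (audio-to-text and text-to-audio).
    loss_a2t = torch.sum(-target * F.log_softmax(logits, dim=-1), dim=-1).mean()
    loss_t2a = torch.sum(-target * F.log_softmax(logits.T, dim=-1), dim=-1).mean()
    return 0.5 * (loss_a2t + loss_t2a)

# Example with random embeddings for a batch of 4 utterances.
if __name__ == "__main__":
    B, D = 4, 512
    audio_emb = torch.randn(B, D)
    text_emb = torch.randn(B, D)
    emotion_ids = torch.tensor([0, 1, 0, 2])   # e.g. angry/happy/angry/sad
    gender_ids = torch.tensor([0, 0, 1, 1])    # e.g. female/female/male/male
    print(clap_soft_label_loss(audio_emb, text_emb, emotion_ids, gender_ids))

Row-normalizing the blended match matrix keeps each row a valid target distribution, so with alpha set to 0 and a batch of distinct emotions the targets reduce to the usual one-hot CLIP/CLAP objective.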

Authors (8)
  1. Yu Pan
  2. Yanni Hu
  3. Yuguang Yang
  4. Wen Fei
  5. Jixun Yao
  6. Heng Lu
  7. Lei Ma
  8. Jianjun Zhao