Multi-Level Knowledge Distillation for Speech Emotion Recognition in Noisy Conditions (2312.13556v1)

Published 21 Dec 2023 in cs.SD and eess.AS

Abstract: Speech emotion recognition (SER) performance deteriorates significantly in the presence of noise, making it challenging to achieve competitive performance in noisy conditions. To this end, we propose a multi-level knowledge distillation (MLKD) method, which aims to transfer the knowledge from a teacher model trained on clean speech to a simpler student model trained on noisy speech. Specifically, we use clean speech features extracted by the wav2vec-2.0 as the learning goal and train the distil wav2vec-2.0 to approximate the feature extraction ability of the original wav2vec-2.0 under noisy conditions. Furthermore, we leverage the multi-level knowledge of the original wav2vec-2.0 to supervise the single-level output of the distil wav2vec-2.0. We evaluate the effectiveness of our proposed method by conducting extensive experiments using five types of noise-contaminated speech on the IEMOCAP dataset, which show promising results compared to state-of-the-art models.
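To make the training objective concrete, below is a minimal, hypothetical sketch of the multi-level distillation loss described in the abstract: a full wav2vec 2.0 teacher sees the clean waveform, a smaller "distil" student sees the noisy waveform, and the student's single-level output is matched against several intermediate teacher layers. The checkpoint name, the choice of teacher layers, the student depth, and the L1 distance are all illustrative assumptions, not the paper's exact configuration.

```python
# Hypothetical sketch of multi-level knowledge distillation (MLKD) for noisy SER.
# Assumptions: Hugging Face wav2vec 2.0 checkpoints/APIs, L1 feature matching,
# and teacher layers (4, 8, 12); the paper's actual setup may differ.
import torch
import torch.nn.functional as F
from transformers import Wav2Vec2Config, Wav2Vec2Model

# Teacher: full wav2vec 2.0, assumed trained on clean speech and kept frozen.
teacher = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
teacher.eval()

# Student: same architecture family with fewer transformer layers (assumed depth).
student = Wav2Vec2Model(Wav2Vec2Config(num_hidden_layers=4))

def mlkd_loss(clean_wave, noisy_wave, teacher_layers=(4, 8, 12)):
    """Distill multi-level teacher features (clean input) into the
    student's single-level output (noisy input)."""
    with torch.no_grad():
        t_out = teacher(clean_wave, output_hidden_states=True)
    s_feat = student(noisy_wave).last_hidden_state   # single-level student output
    loss = 0.0
    for layer in teacher_layers:                      # multi-level teacher supervision
        loss = loss + F.l1_loss(s_feat, t_out.hidden_states[layer])
    return loss / len(teacher_layers)

# Example: a batch of 1-second clean/noisy waveform pairs at 16 kHz.
clean = torch.randn(2, 16000)
noisy = clean + 0.1 * torch.randn(2, 16000)
print(mlkd_loss(clean, noisy))
```

In this reading, averaging the per-layer distances lets one student representation absorb supervision from several depths of the teacher at once; the emotion classifier would then be trained on top of the distilled student features.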

Citations (4)
