Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
167 tokens/sec
GPT-4o
7 tokens/sec
Gemini 2.5 Pro Pro
42 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Improved Remixing Process for Domain Adaptation-Based Speech Enhancement by Mitigating Data Imbalance in Signal-to-Noise Ratio (2406.13982v1)

Published 20 Jun 2024 in cs.SD and eess.AS

Abstract: RemixIT and Remixed2Remixed are domain adaptation-based speech enhancement (DASE) methods that use a teacher model trained in full supervision to generate pseudo-paired data by remixing the outputs of the teacher model. The student model for enhancing real-world recorded signals is trained using the pseudo-paired data without ground truth. Since the noisy signals are recorded in natural environments, the dataset inevitably suffers data imbalance in some acoustic properties, leading to subpar performance for the underrepresented data. The signal-to-noise ratio (SNR), inherently balanced in supervised learning, is a prime example. In this paper, we provide empirical evidence that the SNR of pseudo data has a significant impact on model performance using the dataset of the CHiME-7 UDASE task, highlighting the importance of balanced SNR in DASE. Furthermore, we propose adopting curriculum learning to encompass a broad range of SNRs to boost performance for underrepresented data.

Summary

We haven't generated a summary for this paper yet.