Focus on the Sound around You: Monaural Target Speaker Extraction via Distance and Speaker Information (2306.16241v2)
Abstract: Previously, Target Speaker Extraction (TSE) has yielded outstanding performance in certain application scenarios for speech enhancement and source separation. However, obtaining auxiliary speaker-related information is still challenging in noisy environments with significant reverberation. Inspired by the recently proposed distance-based sound separation, we propose the near sound (NS) extractor, which leverages distance information for TSE to reliably extract speaker information without requiring previous speaker enrollment, an approach we call speaker embedding self-enrollment (SESE). Full- & sub-band modeling is introduced to enhance our NS-Extractor's adaptability towards environments with significant reverberation. Experimental results on several cross-datasets demonstrate the effectiveness of our improvements and the excellent performance of our proposed NS-Extractor in different application scenarios.
- Jiuxin Lin (5 papers)
- Peng Wang (832 papers)
- Heinrich Dinkel (29 papers)
- Jun Chen (374 papers)
- Zhiyong Wu (171 papers)
- Zhiyong Yan (16 papers)
- Yongqing Wang (29 papers)
- Junbo Zhang (84 papers)
- Yujun Wang (61 papers)