MC-SpEx: Towards Effective Speaker Extraction with Multi-Scale Interfusion and Conditional Speaker Modulation (2306.16250v1)

Published 28 Jun 2023 in cs.SD and eess.AS

Abstract: The previous SpEx+ has yielded outstanding performance in speaker extraction and attracted much attention. However, it still makes inadequate use of multi-scale information and the speaker embedding. To this end, this paper proposes a new effective speaker extraction system with multi-scale interfusion and conditional speaker modulation (ConSM), called MC-SpEx. First, we design weight-share multi-scale fusers (ScaleFusers) to efficiently leverage multi-scale information while ensuring consistency of the model's feature space. Then, to account for information at different scales while generating masks, we present the multi-scale interactive mask generator (ScaleInterMG). Moreover, we introduce the ConSM module to fully exploit the speaker embedding in the speech extractor. Experimental results on the Libri2Mix dataset demonstrate the effectiveness of our improvements and the state-of-the-art performance of the proposed MC-SpEx.

Authors (8)
  1. Jun Chen (374 papers)
  2. Wei Rao (33 papers)
  3. Zilin Wang (30 papers)
  4. Jiuxin Lin (5 papers)
  5. Yukai Ju (6 papers)
  6. Shulin He (16 papers)
  7. Yannan Wang (23 papers)
  8. Zhiyong Wu (171 papers)
Citations (8)
