ExARN: self-attending RNN for target speaker extraction (2212.01106v2)
Published 2 Dec 2022 in eess.AS and eess.SP
Abstract: Target speaker extraction aims to extract the target speaker, specified by an enrollment utterance, from an environment with other competing speakers. The task therefore needs to solve two problems, speaker identification and separation, at the same time. In this paper, we combine self-attention and Recurrent Neural Networks (RNNs). Further, we exploit various ways of combining different auxiliary information with the mixture representations. Experimental results show that our proposed model achieves excellent performance on the task of target speaker extraction.
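To make the described idea concrete, below is a minimal sketch, assuming PyTorch, of an RNN block followed by self-attention in which an enrollment speaker embedding is fused with the mixture representation by concatenation. This is an illustrative assumption, not the authors' ExARN implementation; all names, dimensions, and the fusion strategy are hypothetical.

```python
# Hypothetical sketch (not the authors' ExARN code): an RNN followed by
# self-attention, with the auxiliary speaker embedding fused into the
# mixture features via concatenation. Names and sizes are illustrative.
import torch
import torch.nn as nn

class SelfAttendingRNNBlock(nn.Module):
    def __init__(self, feat_dim=256, spk_dim=128, num_heads=4):
        super().__init__()
        # Fuse the enrollment speaker embedding with the mixture features
        # (concatenation is only one of several possible fusion strategies).
        self.fuse = nn.Linear(feat_dim + spk_dim, feat_dim)
        self.rnn = nn.LSTM(feat_dim, feat_dim, batch_first=True)
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(feat_dim)

    def forward(self, mix_feats, spk_emb):
        # mix_feats: (batch, time, feat_dim); spk_emb: (batch, spk_dim)
        spk = spk_emb.unsqueeze(1).expand(-1, mix_feats.size(1), -1)
        x = self.fuse(torch.cat([mix_feats, spk], dim=-1))
        x, _ = self.rnn(x)
        attn_out, _ = self.attn(x, x, x)   # self-attention over time frames
        return self.norm(x + attn_out)     # residual connection + layer norm

# Usage example with random tensors
block = SelfAttendingRNNBlock()
mix = torch.randn(2, 100, 256)   # mixture features
emb = torch.randn(2, 128)        # enrollment speaker embedding
out = block(mix, emb)
print(out.shape)                 # torch.Size([2, 100, 256])
```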