
L-SpEx: Localized Target Speaker Extraction (2202.09995v1)

Published 21 Feb 2022 in eess.AS and cs.SD

Abstract: Speaker extraction aims to extract the target speaker's voice from a multi-talker speech mixture given an auxiliary reference utterance. Recent studies show that speaker extraction benefits from the location or direction of the target speaker. However, these studies assume that the target speaker's location is known in advance or is detected from an extra visual cue, e.g., a face image or video. In this paper, we propose an end-to-end localized target speaker extraction method that relies purely on speech cues, called L-SpEx. Specifically, we design a speaker localizer driven by the target speaker's embedding to extract spatial features, including the direction-of-arrival (DOA) of the target speaker and the beamforming output. The spatial cues and the target speaker's embedding are then both used to form top-down auditory attention to the target speaker. Experiments on the multi-channel reverberant dataset MC-Libri2Mix show that our L-SpEx approach significantly outperforms the baseline system.
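
To make the link between a DOA estimate and a beamforming output concrete, here is a minimal sketch of classic delay-and-sum beamforming. It is not the learned, embedding-driven localizer the abstract describes. The function names, the linear array along the x-axis, the far-field plane-wave model, the integer-sample delays, and the 343 m/s speed of sound are all assumptions made for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def steering_delays(mic_positions_m, doa_deg, sample_rate_hz):
    """Relative arrival delays (in whole samples) for a far-field source.

    Assumes a linear array on the x-axis; doa_deg is measured from the
    array axis, so 0 degrees means the source lies toward +x.
    """
    cos_theta = math.cos(math.radians(doa_deg))
    # A microphone farther along +x is closer to a source at 0 degrees,
    # so the wavefront reaches it earlier: delay = -x * cos(theta) / c.
    raw = [-x * cos_theta / SPEED_OF_SOUND * sample_rate_hz
           for x in mic_positions_m]
    earliest = min(raw)
    # Shift so the earliest microphone has delay 0, then round to samples.
    return [round(d - earliest) for d in raw]


def delay_and_sum(channels, delays):
    """Advance each channel by its delay and average across microphones."""
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[d + t] for ch, d in zip(channels, delays)) / len(channels)
        for t in range(n)
    ]


# Two microphones spaced so that an endfire source is offset by 2 samples.
mics = [0.0, SPEED_OF_SOUND * 2 / 16000]
delays = steering_delays(mics, doa_deg=0.0, sample_rate_hz=16000)

source = [0, 0, 1, 2, 3, 0, 0, 0]
far_mic = [0, 0] + source[:-2]   # mic at x = 0 hears the source 2 samples late
near_mic = source                # mic at +x hears it first
aligned = delay_and_sum([far_mic, near_mic], delays)
```

Steering toward the correct DOA aligns the target's wavefront across channels, so the average reinforces the target, while sources from other directions stay misaligned and are attenuated.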

Authors (6)
  1. Meng Ge (29 papers)
  2. Chenglin Xu (14 papers)
  3. Longbiao Wang (46 papers)
  4. Eng Siong Chng (112 papers)
  5. Jianwu Dang (41 papers)
  6. Haizhou Li (286 papers)
Citations (18)
