
Speaker activity driven neural speech extraction (2101.05516v2)

Published 14 Jan 2021 in eess.AS and cs.SD

Abstract: Target speech extraction, which extracts the speech of a target speaker from a mixture given auxiliary speaker clues, has recently received increased interest. Various clues have been investigated, such as pre-recorded enrollment utterances, direction information, or video of the target speaker. In this paper, we explore the use of speaker activity information as an auxiliary clue for single-channel neural network-based speech extraction. We propose a speaker activity driven speech extraction neural network (ADEnet) and show that it can achieve performance levels competitive with enrollment-based approaches, without the need for pre-recordings. We further demonstrate the potential of the proposed approach for processing meeting-like recordings, where the speaker activity is obtained from a diarization system. We show that this simple yet practical approach can successfully extract speakers after diarization, which results in improved ASR performance, especially in highly overlapping conditions, with a relative word error rate reduction of up to 25%.
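To make the core idea concrete, the sketch below is a toy, non-neural illustration of how a per-frame target-speaker activity signal can drive extraction from a mixture. It is not the paper's ADEnet (which is a neural network trained to estimate the target signal); the function name `activity_gate` and the frame-level representation are assumptions made purely for illustration.

```python
# Toy sketch (assumption: NOT the paper's ADEnet). It shows the simplest
# possible use of a speaker-activity clue: gate mixture frames by the
# target speaker's frame-level activity labels, suppressing frames where
# the target is inactive. ADEnet instead feeds the activity clue to a
# neural network that estimates the target speech itself.

def activity_gate(mixture_frames, target_activity):
    """Keep mixture frames where the target speaker is active, zero the rest.

    mixture_frames: list of floats (one toy energy value per frame)
    target_activity: list of 0/1 activity flags, same length
    """
    if len(mixture_frames) != len(target_activity):
        raise ValueError("mixture and activity must have the same length")
    return [x if a else 0.0 for x, a in zip(mixture_frames, target_activity)]

# Example: the target speaks only in frames 1-3 (e.g. from a diarization
# system's output), so frames 0 and 4 are suppressed.
mix = [0.2, 0.9, 1.1, 0.8, 0.3]
act = [0, 1, 1, 1, 0]
print(activity_gate(mix, act))
```

Such naive gating cannot separate overlapped regions where both speakers are active at once; that is precisely where the paper's learned extraction network, conditioned on the activity clue, provides its gains.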

Authors (5)
  1. Marc Delcroix (94 papers)
  2. Tsubasa Ochiai (43 papers)
  3. Keisuke Kinoshita (44 papers)
  4. Tomohiro Nakatani (50 papers)
  5. Katerina Zmolikova (11 papers)
Citations (30)
