Speech Enhancement using Self-Adaptation and Multi-Head Self-Attention (2002.05873v1)

Published 14 Feb 2020 in eess.AS, cs.LG, cs.SD, and stat.ML

Abstract: This paper investigates a self-adaptation method for speech enhancement using auxiliary speaker-aware features; we extract a speaker representation used for adaptation directly from the test utterance. Conventional studies of deep neural network (DNN)-based speech enhancement mainly focus on building a speaker-independent model. Meanwhile, in speech applications including speech recognition and synthesis, it is known that model adaptation to the target speaker improves accuracy. Our research question is whether a DNN for speech enhancement can be adapted to unknown speakers without any auxiliary guidance signal in the test phase. To achieve this, we adopt multi-task learning of speech enhancement and speaker identification, and use the output of the final hidden layer of the speaker-identification branch as an auxiliary feature. In addition, we use multi-head self-attention for capturing long-term dependencies in the speech and noise. Experimental results on a public dataset show that our strategy achieves state-of-the-art performance and also outperforms conventional methods in terms of subjective quality.
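
The abstract outlines the key ingredients: a shared encoder, a speaker-identification branch whose final hidden layer provides an utterance-level adaptation feature, multi-head self-attention over the frames, and an enhancement branch conditioned on that speaker feature. The following is a minimal, illustrative PyTorch sketch of such a multi-task layout; the layer sizes, pooling choice, and module names (e.g. `SelfAdaptiveEnhancer`, `mask_net`) are assumptions for illustration, not the authors' exact architecture.

```python
import torch
import torch.nn as nn


class SelfAdaptiveEnhancer(nn.Module):
    """Illustrative sketch: speech enhancement conditioned on a speaker
    embedding taken from the hidden layer of a speaker-ID branch."""

    def __init__(self, n_freq=257, d_model=256, n_heads=4, n_speakers=100):
        super().__init__()
        # Shared encoder over magnitude-spectrogram frames
        self.encoder = nn.Linear(n_freq, d_model)
        # Speaker-identification branch (auxiliary task); its final hidden
        # layer supplies the adaptation feature at test time
        self.spk_hidden = nn.Linear(d_model, d_model)
        self.spk_out = nn.Linear(d_model, n_speakers)
        # Multi-head self-attention to capture long-term dependencies
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Enhancement branch: predicts a time-frequency mask,
        # conditioned on the speaker embedding
        self.mask_net = nn.Sequential(
            nn.Linear(2 * d_model, d_model), nn.ReLU(),
            nn.Linear(d_model, n_freq), nn.Sigmoid(),
        )

    def forward(self, noisy_mag):                          # (B, T, n_freq)
        h = torch.relu(self.encoder(noisy_mag))            # (B, T, d_model)
        # Speaker branch: utterance-level embedding via temporal mean pooling
        spk_emb = torch.tanh(self.spk_hidden(h.mean(dim=1)))  # (B, d_model)
        spk_logits = self.spk_out(spk_emb)                 # used only for the speaker-ID loss
        # Self-attention over all frames of the utterance
        a, _ = self.attn(h, h, h)                          # (B, T, d_model)
        # Condition every frame on the speaker embedding
        cond = torch.cat([a, spk_emb.unsqueeze(1).expand_as(a)], dim=-1)
        mask = self.mask_net(cond)                         # (B, T, n_freq) in [0, 1]
        return mask * noisy_mag, spk_logits
```

Under this sketch, training would minimize a weighted sum of an enhancement loss on the masked spectrogram and a cross-entropy loss on `spk_logits`; at test time only the enhanced output is needed, since the speaker embedding is computed directly from the test utterance with no external guidance signal.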

Authors (5)
  1. Yuma Koizumi (39 papers)
  2. Kohei Yatabe (39 papers)
  3. Marc Delcroix (94 papers)
  4. Yoshiki Masuyama (30 papers)
  5. Daiki Takeuchi (30 papers)
Citations (121)
