Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
110 tokens/sec
GPT-4o
56 tokens/sec
Gemini 2.5 Pro Pro
44 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Learning-based personal speech enhancement for teleconferencing by exploiting spatial-spectral features (2112.05686v3)

Published 10 Dec 2021 in eess.AS and cs.SD

Abstract: Teleconferencing is becoming essential during the COVID-19 pandemic. However, in real-world applications, speech quality can deteriorate due to, for example, background interference, noise, or reverberation. To solve this problem, target speech extraction from the mixture signals can be performed with the aid of the user's vocal features. Various features are accounted for in this study's proposed system, including speaker embeddings derived from user enroLLMent and a novel long-short-term spatial coherence feature pertaining to the target speaker activity. As a learning-based approach, a target speech sifting network was employed to extract the relevant features. The network trained with LSTSC in the proposed approach is robust to microphone array geometries and the number of microphones. Furthermore, the proposed enhancement system was compared with a baseline system with speaker embeddings and interchannel phase difference. The results demonstrated the superior performance of the proposed system over the baseline in enhancement performance and robustness.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (3)
  1. Yicheng Hsu (12 papers)
  2. Yonghan Lee (16 papers)
  3. Mingsian R. Bai (10 papers)
Citations (9)

Summary

We haven't generated a summary for this paper yet.