
Multi-Channel Speaker Verification for Single and Multi-talker Speech (2010.12692v2)

Published 23 Oct 2020 in eess.AS

Abstract: To improve speaker verification in real scenarios with interfering speakers, noise, and reverberation, we propose to bring together advancements made in multi-channel speech features. Specifically, we combine spectral, spatial, and directional features, which include inter-channel phase difference, multi-channel sinc convolutions, directional power ratio features, and angle features. To maximally leverage supervised learning, our framework is also equipped with multi-channel speech enhancement and voice activity detection. On simulated, replayed, and real recordings alike, we observe large and consistent improvements at various degradation levels. On real recordings of multi-talker speech, we achieve a 36% relative reduction in equal error rate (EER) with respect to the single-channel baseline. We find the improvements from speaker-dependent directional features to be more consistent in multi-talker conditions than in clean conditions. Lastly, we investigate whether the learned multi-channel speaker embedding space can be made more discriminative through contrastive loss-based fine-tuning. With a simple choice of triplet loss, we observe a further 8.3% relative reduction in EER.
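
For readers unfamiliar with the triplet loss mentioned in the abstract, the following is a minimal NumPy sketch of the standard formulation, which penalizes an anchor embedding for being closer to a different speaker's embedding than to one from the same speaker. The abstract does not state the distance metric, margin, or triplet sampling strategy, so the cosine distance, the `margin=0.2` default, and the function names here are illustrative assumptions rather than the authors' configuration.

```python
import numpy as np


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) along the last axis.

    Assumption: the abstract does not say which distance is used;
    cosine distance is chosen here only for illustration.
    """
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return 1.0 - np.sum(a * b, axis=-1)


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Mean hinge-style triplet loss over a batch of embeddings.

    anchor, positive: embeddings from the same speaker.
    negative: embeddings from a different speaker.
    margin: hypothetical value, not taken from the paper.
    """
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    # Loss is zero once the negative is farther than the positive by `margin`.
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))


if __name__ == "__main__":
    anchor = np.array([[1.0, 0.0]])
    same_speaker = np.array([[1.0, 0.0]])
    other_speaker = np.array([[0.0, 1.0]])
    print(triplet_loss(anchor, same_speaker, other_speaker))
    print(triplet_loss(anchor, other_speaker, same_speaker))
```

In the first call the triplet is already well separated, so the loss vanishes; in the second the roles are swapped and the loss is positive, which is the signal fine-tuning would use to pull same-speaker embeddings together.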

Authors (3)
  1. Saurabh Kataria (23 papers)
  2. Shi-Xiong Zhang (48 papers)
  3. Dong Yu (329 papers)
