Deep Speaker Embedding Learning with Multi-Level Pooling for Text-Independent Speaker Verification (1902.07821v1)

Published 21 Feb 2019 in cs.CL, cs.SD, and eess.AS

Abstract: This paper aims to improve the widely used deep speaker embedding x-vector model. We propose the following improvements: (1) a hybrid neural network structure using both time delay neural networks (TDNN) and long short-term memory neural networks (LSTM) to generate complementary speaker information at different levels; (2) a multi-level pooling strategy to collect speaker information from both TDNN and LSTM layers; (3) a regularization scheme on the speaker embedding extraction layer to make the extracted embeddings suitable for the following fusion step. The synergy of these improvements is shown on the NIST SRE 2016 eval test (a 19% EER reduction) and the SRE 2018 dev test (a 9% EER reduction), as well as a more than 10% reduction in DCF scores on both test sets over the x-vector baseline.
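The multi-level pooling strategy (improvement 2) can be sketched roughly as follows: apply statistics pooling (mean and standard deviation over time, as in the x-vector recipe) to the frame-level outputs of each branch separately, then concatenate the pooled vectors before the embedding extraction layer. This is an illustrative numpy sketch under assumed dimensions, not the authors' implementation; the branch outputs are stand-ins for real TDNN and LSTM activations.

```python
import numpy as np

def stats_pool(frames):
    """Statistics pooling: concatenate mean and std over the time axis.
    frames: (T, D) frame-level features -> (2*D,) utterance-level vector."""
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])

def multi_level_pool(tdnn_frames, lstm_frames):
    """Pool each branch separately, then concatenate the pooled vectors,
    so the embedding layer sees speaker information from both levels."""
    return np.concatenate([stats_pool(tdnn_frames), stats_pool(lstm_frames)])

# Toy example: 200 frames, with assumed 512-dim TDNN and 256-dim LSTM outputs
rng = np.random.default_rng(0)
tdnn_out = rng.standard_normal((200, 512))
lstm_out = rng.standard_normal((200, 256))
pooled = multi_level_pool(tdnn_out, lstm_out)
print(pooled.shape)  # (1536,) = 2*512 + 2*256
```

In the paper's setting, the pooled vector would then feed the regularized embedding extraction layer (improvement 3) rather than being used directly.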

Authors (5)
  1. Yun Tang (42 papers)
  2. Guohong Ding (3 papers)
  3. Jing Huang (141 papers)
  4. Xiaodong He (162 papers)
  5. Bowen Zhou (141 papers)
Citations (82)
