
A framework of text-dependent speaker verification for Chinese numerical string corpus (2405.07029v2)

Published 11 May 2024 in cs.SD and eess.AS

Abstract: The Chinese numerical string corpus serves as a valuable resource for speaker verification, particularly in financial transactions. Research indicates that in short speech scenarios, text-dependent speaker verification (TD-SV) consistently outperforms text-independent speaker verification (TI-SV). However, TD-SV may also involve validating the text content, which can be negatively affected by reading rhythm and pauses. To address this problem, we propose an end-to-end speaker verification system that enhances TD-SV by decoupling speaker and text information. Our system consists of a text embedding extractor, a speaker embedding extractor, and a fusion module. In the text embedding extractor, we employ an enhanced Transformer and introduce a triple loss comprising text classification loss, connectionist temporal classification (CTC) loss, and decoder loss. In the speaker embedding extractor, we create a multi-scale pooling method by combining sliding window attentive statistics pooling (SWASP) with attentive statistics pooling (ASP). To mitigate the scarcity of data, we have recorded a publicly available Chinese numerical corpus named SHALCAS22A (hereinafter called SHAL), which can be accessed on Open-SLR. Moreover, we employ data augmentation techniques using Tacotron2 and HiFi-GAN. Our method improves the equal error rate (EER) by 49.2% on Hi-Mia and 75.0% on SHAL.
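
For readers unfamiliar with the metric, the following is a minimal sketch of how an equal error rate can be estimated from verification scores, and of how a relative improvement between two EER values could be computed. It is not the authors' evaluation code. The function names `equal_error_rate` and `relative_reduction` and all scores below are hypothetical toy values, and reading the reported percentages as relative reductions is an assumption, since the abstract does not say whether they are relative or absolute.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score is >= the threshold.
    Returns (eer, threshold) at the point where the false acceptance
    rate and false rejection rate are closest to each other.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best = None
    for t in thresholds:
        # False acceptance: impostor trials scored at or above the threshold.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        # False rejection: genuine trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


def relative_reduction(baseline_eer, new_eer):
    """Relative EER reduction (assumed interpretation of 'improvement')."""
    return (baseline_eer - new_eer) / baseline_eer


# Toy scores, invented for illustration only (not from the paper).
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.1, 0.2, 0.3, 0.6]
eer, threshold = equal_error_rate(genuine, impostor)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

Under this reading, a drop from a hypothetical baseline EER of 4.0% to 1.0% would count as a 75% relative improvement, that is, `relative_reduction(4.0, 1.0)` returns `0.75`.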

Authors (4)
  1. Litong Zheng (2 papers)
  2. Feng Hong (18 papers)
  3. Weijie Xu (28 papers)
  4. Wan Zheng (1 paper)
