Centroid-based deep metric learning for speaker recognition (1902.02375v1)

Published 6 Feb 2019 in cs.LG, cs.SD, eess.AS, and stat.ML

Abstract: Speaker embedding models that utilize neural networks to map utterances to a space where distances reflect similarity between speakers have driven recent progress in the speaker recognition task. However, there is still a significant performance gap between recognizing speakers in the training set and unseen speakers. The latter case corresponds to the few-shot learning task, where a trained model is evaluated on unseen classes. Here, we optimize a speaker embedding model with prototypical network loss (PNL), a state-of-the-art approach for the few-shot image classification task. The resulting embedding model outperforms the state-of-the-art triplet loss based models in both speaker verification and identification tasks, for both seen and unseen speakers.
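As a rough illustration of the centroid-based idea named in the abstract, the sketch below computes a prototypical-network-style loss on precomputed embeddings. It is not the authors' implementation. It assumes the standard formulation in which each speaker's prototype is the mean of that speaker's support embeddings and query utterances are scored with a softmax over negative squared Euclidean distances. The function name `prototypical_loss` and its arguments are hypothetical, and the neural network that would produce the embeddings is left out.

```python
import numpy as np

def prototypical_loss(support, support_labels, query, query_labels):
    """Mean cross-entropy of query embeddings against per-speaker centroids.

    Hypothetical helper: squared Euclidean distance is an assumption here,
    not a detail stated in the abstract.
    """
    support_labels = np.asarray(support_labels)
    query_labels = np.asarray(query_labels)

    # np.unique returns the sorted speaker ids present in the support set.
    classes = np.unique(support_labels)

    # One prototype (centroid) per speaker: the mean of its support embeddings.
    prototypes = np.stack(
        [support[support_labels == c].mean(axis=0) for c in classes]
    )

    # Squared Euclidean distance from every query to every prototype,
    # giving an array of shape (n_query, n_classes).
    diff = query[:, None, :] - prototypes[None, :, :]
    dists = (diff ** 2).sum(axis=-1)

    # Log-softmax over negative distances, shifted for numerical stability.
    logits = -dists
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Column index of each query's true speaker (valid because classes is sorted
    # and every query label is assumed to appear in the support set).
    target_idx = np.searchsorted(classes, query_labels)
    loss = -log_probs[np.arange(len(query)), target_idx].mean()
    return loss, prototypes


# Toy episode: two speakers, two support embeddings each, one query each.
support = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]])
support_labels = np.array([0, 0, 1, 1])
query = np.array([[0.0, 1.0], [10.0, 11.0]])
query_labels = np.array([0, 1])

loss, prototypes = prototypical_loss(support, support_labels, query, query_labels)
```

In this toy episode each query sits exactly on its own speaker's centroid, so the loss is close to zero; swapping the query labels would make it large. In training, the embeddings would come from the model and the loss would be backpropagated through it, which plain NumPy does not do.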

Authors (5)
  1. Jixuan Wang (12 papers)
  2. Kuan-Chieh Wang (30 papers)
  3. Marc Law (3 papers)
  4. Frank Rudzicz (90 papers)
  5. Michael Brudno (8 papers)
Citations (100)
