Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition? (2204.12765v2)

Published 27 Apr 2022 in cs.CL, cs.SD, and eess.AS

Abstract: Recently, self-supervised learning (SSL) has demonstrated strong performance in speaker recognition, even though the pre-training objective is designed for speech recognition. In this paper, we study which factors lead to the success of self-supervised learning on speaker-related tasks, e.g. speaker verification (SV), through a series of carefully designed experiments. Our empirical results on the VoxCeleb1 dataset suggest that the benefit of SSL to the SV task comes from a combination of the masked speech prediction loss, data scale, and model size, while the SSL quantizer has only a minor impact. We further employ the integrated gradients attribution method and loss landscape visualization to understand the effectiveness of self-supervised learning for speaker recognition.
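
The abstract attributes much of the SV gain to the masked speech prediction objective used in pre-training. As a rough illustration only (not the authors' implementation), a HuBERT-style masked prediction loss computes cross-entropy against discrete pseudo-labels at the masked frames; the tensor names and shapes below (`logits`, `targets`, `mask`) are illustrative assumptions.

```python
# Minimal sketch of a HuBERT-style masked speech prediction loss:
# cross-entropy over discrete pseudo-labels, evaluated only at masked frames.
# Tensor names and shapes are illustrative assumptions, not the paper's code.
import torch
import torch.nn.functional as F

def masked_speech_prediction_loss(logits, targets, mask):
    """
    logits:  (batch, frames, codebook_size) float -- transformer predictions
    targets: (batch, frames) long -- discrete pseudo-labels, e.g. k-means ids
    mask:    (batch, frames) bool -- True where the input frame was masked
    """
    masked_logits = logits[mask]      # (num_masked, codebook_size)
    masked_targets = targets[mask]    # (num_masked,)
    return F.cross_entropy(masked_logits, masked_targets)

# Toy usage with random tensors, just to show the expected shapes.
if __name__ == "__main__":
    B, T, C = 2, 100, 504
    logits = torch.randn(B, T, C)
    targets = torch.randint(0, C, (B, T))
    mask = torch.rand(B, T) < 0.5
    print(masked_speech_prediction_loss(logits, targets, mask))
```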

Authors (11)
  1. Sanyuan Chen (28 papers)
  2. Yu Wu (196 papers)
  3. Chengyi Wang (32 papers)
  4. Shujie Liu (101 papers)
  5. Zhuo Chen (319 papers)
  6. Peidong Wang (33 papers)
  7. Gang Liu (177 papers)
  8. Jinyu Li (164 papers)
  9. Jian Wu (314 papers)
  10. Xiangzhan Yu (7 papers)
  11. Furu Wei (291 papers)
Citations (38)
