Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition? (2204.12765v2)
Abstract: Recently, self-supervised learning (SSL) has demonstrated strong performance in speaker recognition, even though the pre-training objective is designed for speech recognition. In this paper, we study which factors lead to the success of self-supervised learning on speaker-related tasks, e.g., speaker verification (SV), through a series of carefully designed experiments. Our empirical results on the VoxCeleb1 dataset suggest that the benefit of SSL to the SV task comes from a combination of the masked speech prediction loss, data scale, and model size, while the SSL quantizer has a minor impact. We further employ the integrated gradients attribution method and loss landscape visualization to understand the effectiveness of self-supervised learning for speaker recognition.
- Sanyuan Chen (28 papers)
- Yu Wu (196 papers)
- Chengyi Wang (32 papers)
- Shujie Liu (101 papers)
- Zhuo Chen (319 papers)
- Peidong Wang (33 papers)
- Gang Liu (177 papers)
- Jinyu Li (164 papers)
- Jian Wu (314 papers)
- Xiangzhan Yu (7 papers)
- Furu Wei (291 papers)
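
The masked speech prediction loss the abstract credits is the HuBERT-style objective: spans of input frames are masked, and the model is trained to predict discrete pseudo-labels from a quantizer only at the masked positions. Below is a minimal PyTorch sketch of that objective; the helper names (`span_mask`, `masked_prediction_loss`), tensor shapes, and hyperparameter values are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def span_mask(batch: int, frames: int, mask_prob: float = 0.08, span: int = 10) -> torch.Tensor:
    """Sample random mask spans over the frame axis (wav2vec 2.0 / HuBERT style).

    Returns a (batch, frames) bool tensor, True where a frame is masked.
    Hyperparameters here are illustrative, not the paper's settings.
    """
    mask = torch.zeros(batch, frames, dtype=torch.bool)
    num_starts = max(1, int(mask_prob * frames))
    for b in range(batch):
        starts = torch.randperm(frames - span)[:num_starts]
        for s in starts:
            mask[b, s : s + span] = True
    return mask

def masked_prediction_loss(logits: torch.Tensor, targets: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Cross-entropy computed only over the masked frames.

    logits:  (batch, frames, codebook_size) encoder predictions
    targets: (batch, frames) discrete pseudo-labels from the quantizer
    mask:    (batch, frames) bool mask from span_mask
    """
    return F.cross_entropy(logits[mask], targets[mask])

# Toy usage with random tensors standing in for a real encoder and quantizer.
B, T, C = 4, 200, 500  # batch, frames, illustrative codebook size
mask = span_mask(B, T)
logits = torch.randn(B, T, C)          # would come from the Transformer encoder
targets = torch.randint(0, C, (B, T))  # would come from the k-means quantizer
loss = masked_prediction_loss(logits, targets, mask)
```

Restricting the loss to masked positions is what forces the encoder to infer the missing content from surrounding context; the abstract's finding is that this objective, combined with data scale and model size, drives the gain on SV, while the choice of quantizer matters less.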