
Unsupervised Data Selection via Discrete Speech Representation for ASR (2204.01981v1)

Published 5 Apr 2022 in eess.AS

Abstract: Self-supervised learning of speech representations has achieved impressive results in improving automatic speech recognition (ASR). In this paper, we show that data selection is important for self-supervised learning. We propose a simple and effective unsupervised data selection method which selects acoustically similar speech to a target domain. It takes the discrete speech representation available in common self-supervised learning frameworks as input, and applies a contrastive data selection method on the discrete tokens. Through extensive empirical studies we show that our proposed method reduces the amount of required pre-training data and improves the downstream ASR performance. Pre-training on a selected subset of 6% of the general data pool results in 11.8% relative improvements in LibriSpeech test-other compared to pre-training on the full set. On Multilingual LibriSpeech French, German, and Spanish test sets, selecting 6% data for pre-training reduces word error rate by more than 15% relatively compared to the full set, and achieves competitive results compared to current state-of-the-art performances.
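The abstract describes scoring utterances by a contrastive criterion over the discrete tokens produced by a self-supervised model, then keeping the top-scoring fraction (6% in the paper's experiments) for pre-training. The paper's exact scoring models are not given here, so the sketch below is a minimal, hypothetical illustration: it uses smoothed unigram language models over token IDs and ranks candidates by the log-likelihood difference between a target-domain model and a general-pool model (a Moore-Lewis-style contrast). Function names, the unigram choice, and the smoothing are assumptions, not the authors' implementation.

```python
from collections import Counter
import math

def train_unigram_lm(sequences, vocab_size, smoothing=1.0):
    # Add-k smoothed unigram LM over discrete token IDs (assumed stand-in
    # for whatever sequence model the contrastive selection actually uses).
    counts = Counter(tok for seq in sequences for tok in seq)
    total = sum(counts.values())
    denom = total + smoothing * vocab_size
    return {t: (counts.get(t, 0) + smoothing) / denom
            for t in range(vocab_size)}

def avg_log_prob(seq, lm):
    # Length-normalized log-likelihood of one token sequence.
    return sum(math.log(lm[t]) for t in seq) / max(len(seq), 1)

def contrastive_select(pool, target_seqs, general_seqs, vocab_size, frac=0.06):
    # Rank each candidate by log P_target(x) - log P_general(x); utterances
    # that look acoustically like the target domain score high. Keep the
    # top `frac` of the pool (the paper selects 6% of the general data).
    lm_target = train_unigram_lm(target_seqs, vocab_size)
    lm_general = train_unigram_lm(general_seqs, vocab_size)
    ranked = sorted(
        pool,
        key=lambda s: avg_log_prob(s, lm_target) - avg_log_prob(s, lm_general),
        reverse=True,
    )
    k = max(1, int(len(pool) * frac))
    return ranked[:k]
```

A candidate whose token distribution matches the target domain gets a positive contrast score and is selected ahead of general-domain or ambiguous utterances.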

Authors (6)
  1. Zhiyun Lu
  2. Yongqiang Wang
  3. Yu Zhang
  4. Wei Han
  5. Zhehuai Chen
  6. Parisa Haghani
Citations (11)
