UniSpeech at scale: An Empirical Study of Pre-training Method on Large-Scale Speech Recognition Dataset (2107.05233v1)

Published 12 Jul 2021 in eess.AS

Abstract: Recently, there has been vast interest in self-supervised learning (SSL), where a model is pre-trained on large-scale unlabeled data and then fine-tuned on a small labeled dataset. The common wisdom is that SSL helps resource-limited tasks in which only a limited amount of labeled data is available, and that the benefit of SSL diminishes as the amount of labeled training data increases. To the best of our knowledge, at most a few thousand hours of labeled data have been used in studies of SSL. In contrast, industry typically uses tens of thousands of hours of labeled data to build high-accuracy automatic speech recognition (ASR) systems for resource-rich languages. In this study, we take on the challenge of investigating whether and how SSL can improve the ASR accuracy of a state-of-the-art production-scale Transformer-Transducer model, which was built with 65 thousand hours of anonymized labeled EN-US data.

Authors (7)
  1. Chengyi Wang (32 papers)
  2. Yu Wu (196 papers)
  3. Shujie Liu (101 papers)
  4. Jinyu Li (164 papers)
  5. Yao Qian (37 papers)
  6. Kenichi Kumatani (15 papers)
  7. Furu Wei (291 papers)
Citations (12)