
Language-Universal Phonetic Representation in Multilingual Speech Pretraining for Low-Resource Speech Recognition (2305.11569v1)

Published 19 May 2023 in eess.AS, cs.CL, and cs.SD

Abstract: We improve low-resource ASR by integrating the ideas of multilingual training and self-supervised learning. Concretely, we leverage an International Phonetic Alphabet (IPA) multilingual model to create frame-level pseudo labels for unlabeled speech, and use these pseudo labels to guide hidden-unit BERT (HuBERT) based speech pretraining in a phonetically informed manner. Experiments on the Multilingual Speech (MLS) Corpus show that the proposed approach consistently outperforms the standard HuBERT on all the target languages. Moreover, on 3 of the 4 languages, the approach outperforms the standard HuBERT while saving up to 1.5k hours (75%) of supervised training data. Our approach outperforms most state-of-the-art methods while using much less pretraining data, both in hours and in language diversity. Compared to XLSR-53 and a retraining-based multilingual method, our approach performs better in both full and limited fine-tuning data scenarios.
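The core idea described in the abstract, replacing HuBERT's usual clustering-derived targets with frame-level IPA pseudo labels produced by a multilingual phone recognizer, can be illustrated with a short sketch. The snippet below is a minimal, hypothetical PyTorch illustration, not the authors' implementation: the encoder is a stand-in for the HuBERT front-end and transformer, the phone inventory size, hidden size, masking scheme, and all names (`MaskedPhonePredictor`, `pretraining_step`, etc.) are assumptions made for the example.

```python
# Minimal sketch (not the authors' code) of phonetically informed HuBERT-style
# pretraining: frame-level IPA pseudo labels, assumed to come from a separate
# multilingual IPA recognizer, replace the usual k-means cluster targets.
import torch
import torch.nn as nn

NUM_IPA_UNITS = 100   # assumed size of the IPA pseudo-label inventory
FEAT_DIM = 768        # assumed transformer hidden size

class MaskedPhonePredictor(nn.Module):
    """Predicts frame-level IPA pseudo labels from partially masked speech features."""
    def __init__(self, feat_dim=FEAT_DIM, num_units=NUM_IPA_UNITS):
        super().__init__()
        # Stand-in for the HuBERT convolutional front-end + transformer encoder.
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        self.mask_embedding = nn.Parameter(torch.randn(feat_dim))
        self.classifier = nn.Linear(feat_dim, num_units)

    def forward(self, frames, mask):
        # Replace masked frames with a learned mask embedding (HuBERT-style masking).
        x = torch.where(mask.unsqueeze(-1), self.mask_embedding.expand_as(frames), frames)
        h = self.encoder(x)
        return self.classifier(h)

def pretraining_step(model, frames, pseudo_labels, mask):
    """One step: cross-entropy on masked frames against IPA pseudo labels."""
    logits = model(frames, mask)                        # (B, T, num_units)
    return nn.functional.cross_entropy(
        logits[mask], pseudo_labels[mask]               # loss on masked frames only
    )

# Toy usage with random tensors standing in for real features and pseudo labels.
B, T = 2, 50
model = MaskedPhonePredictor()
frames = torch.randn(B, T, FEAT_DIM)                    # frame-level speech features
pseudo_labels = torch.randint(0, NUM_IPA_UNITS, (B, T)) # frame-level IPA pseudo labels
mask = torch.rand(B, T) < 0.3                           # random frame mask, for illustration
loss = pretraining_step(model, frames, pseudo_labels, mask)
loss.backward()
```

The key design point conveyed by the abstract is only the choice of target: the masked-prediction objective stays HuBERT-like, but the targets are phonetically meaningful IPA units rather than acoustically derived clusters, which is what lets the pretraining transfer to low-resource languages.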

Authors (5)
  1. Siyuan Feng (55 papers)
  2. Ming Tu (20 papers)
  3. Rui Xia (53 papers)
  4. Chuanzeng Huang (10 papers)
  5. Yuxuan Wang (239 papers)
Citations (3)