
Improved Self-Supervised Multilingual Speech Representation Learning Combined with Auxiliary Language Information (2212.03476v1)

Published 7 Dec 2022 in eess.AS, cs.CL, and cs.SD

Abstract: Multilingual end-to-end models have shown great improvement over monolingual systems. With the development of pre-training methods on speech, self-supervised multilingual speech representation learning such as XLSR has shown success in improving the performance of multilingual automatic speech recognition (ASR). However, as in supervised learning, multilingual pre-training may also suffer from language interference, which further affects the application of multilingual systems. In this paper, we introduce several techniques for improving self-supervised multilingual pre-training by leveraging auxiliary language information, including language adversarial training, language embedding, and language adaptive training during the pre-training stage. We conduct experiments on a multilingual ASR task consisting of 16 languages. Our experimental results demonstrate a 14.3% relative gain over the standard XLSR model and a 19.8% relative gain over the multilingual model without pre-training.
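The abstract names the techniques but does not detail their architectures. As a rough illustration only, language adversarial training is commonly implemented with a gradient reversal layer that pushes the encoder toward language-invariant features, while a language embedding injects explicit language identity into the encoder input. The PyTorch sketch below shows these two ideas under those common conventions; the class names (LanguageAdversarialHead, LanguageEmbedding), shapes, and hyperparameters are hypothetical and are not taken from the paper.

```python
import torch
import torch.nn as nn


class GradientReversal(torch.autograd.Function):
    """Identity on the forward pass; negates (and scales) gradients on backward."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse the gradient flowing into the encoder; no grad w.r.t. lambd.
        return -ctx.lambd * grad_output, None


class LanguageAdversarialHead(nn.Module):
    """Hypothetical adversarial head: predicts the language ID from encoder
    features through gradient reversal, discouraging language-specific features."""

    def __init__(self, feat_dim: int, num_languages: int, lambd: float = 1.0):
        super().__init__()
        self.lambd = lambd
        self.classifier = nn.Linear(feat_dim, num_languages)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, time, feat_dim); mean-pool over time before classifying.
        pooled = features.mean(dim=1)
        reversed_feats = GradientReversal.apply(pooled, self.lambd)
        return self.classifier(reversed_feats)  # logits: (batch, num_languages)


class LanguageEmbedding(nn.Module):
    """Hypothetical language embedding: adds a learned per-language vector to
    the encoder input, giving the model explicit language information."""

    def __init__(self, feat_dim: int, num_languages: int):
        super().__init__()
        self.embed = nn.Embedding(num_languages, feat_dim)

    def forward(self, features: torch.Tensor, lang_ids: torch.Tensor) -> torch.Tensor:
        # lang_ids: (batch,); broadcast the embedding across the time axis.
        return features + self.embed(lang_ids).unsqueeze(1)
```

In a setup like this, the cross-entropy loss from the adversarial head would typically be added to the self-supervised pre-training objective with a small weight, so that the encoder trades off contrastive accuracy against language discriminability; the exact weighting and placement in the paper's pipeline are not specified in the abstract.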

Authors (5)
  1. Fenglin Ding
  2. Genshun Wan
  3. Pengcheng Li
  4. Jia Pan
  5. Cong Liu
Citations (1)
