Data Augmentation for End-to-end Code-switching Speech Recognition (2011.02160v2)

Published 4 Nov 2020 in cs.CL and eess.AS

Abstract: Training a code-switching end-to-end automatic speech recognition (ASR) model normally requires a large amount of data, while code-switching data is often limited. In this paper, three novel approaches are proposed for code-switching data augmentation: audio splicing with the existing code-switching data, and text-to-speech (TTS) synthesis of new code-switching texts generated by either word translation or word insertion. Our experiments on a 200-hour Mandarin-English code-switching dataset show that each of the three proposed approaches individually yields significant improvements in code-switching ASR. Moreover, all the proposed approaches can be combined with the recently popular SpecAugment to obtain an additional gain. The word error rate (WER) is reduced by 24.0% relative to the system without any data augmentation, and by 13.0% relative to the system with only SpecAugment.
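The abstract does not specify how the word-translation texts are generated. The following is a minimal sketch, assuming a tokenized Mandarin sentence and a hypothetical toy bilingual lexicon (`TOY_LEXICON`): a random fraction of translatable Mandarin words is replaced with English equivalents to produce a code-switching sentence. The names `translate_words` and `ratio` are illustrative only and are not the authors' method; in the described pipeline, such generated text would then be synthesized with TTS.

```python
import random
from typing import Dict, List

# Hypothetical toy Mandarin-to-English lexicon (an assumption for illustration);
# a real system would need a much larger bilingual dictionary.
TOY_LEXICON: Dict[str, str] = {
    "会议": "meeting",
    "项目": "project",
    "报告": "report",
}


def translate_words(tokens: List[str], lexicon: Dict[str, str],
                    ratio: float, rng: random.Random) -> List[str]:
    """Replace a fraction of translatable Mandarin tokens with English words.

    `ratio` (hypothetical knob) is the fraction of lexicon-covered tokens to
    switch; tokens not in the lexicon are always kept unchanged.
    """
    candidates = [i for i, tok in enumerate(tokens) if tok in lexicon]
    # Number of tokens to switch, clamped to the available candidates.
    k = min(len(candidates), round(ratio * len(candidates)))
    chosen = set(rng.sample(candidates, k))
    return [lexicon[tok] if i in chosen else tok for i, tok in enumerate(tokens)]


if __name__ == "__main__":
    sentence = ["我们", "明天", "开", "会议", "讨论", "项目", "报告"]
    print(" ".join(translate_words(sentence, TOY_LEXICON, ratio=0.5,
                                   rng=random.Random(0))))
```

Seeding the generator makes the sampled switch points reproducible, which is convenient when generating a fixed augmentation corpus.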

Authors (5)

Chenpeng Du
Hao Li
Yizhou Lu
Lan Wang
Yanmin Qian

Citations (25)
