Optimizing Bilingual Neural Transducer with Synthetic Code-switching Text Generation (2210.12214v1)
Abstract: Code-switching is the practice of using more than one language within the same sentence. In this study, we investigate how to optimize a neural-transducer-based bilingual automatic speech recognition (ASR) model for code-switching speech. Focusing on the scenario where the ASR model is trained without supervised code-switching data, we find that semi-supervised training and synthetic code-switched data can improve the bilingual ASR system on code-switching speech. We analyze how each of the neural transducer's encoders contributes to code-switching performance by measuring encoder-specific recall values, and we evaluate our English/Mandarin system on the ASCEND data set. Our final system achieves 25% mixed error rate (MER) on the ASCEND English/Mandarin code-switching test set -- a 2.1% absolute MER reduction compared to the previous literature -- while maintaining good accuracy on the monolingual test sets.
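The mixed error rate (MER) reported above is the standard metric for Mandarin/English code-switching ASR: the hypothesis is scored at the character level for Mandarin and the word level for English, with MER computed as edit distance divided by reference length. The sketch below is a minimal illustration of that idea, not the paper's scoring script; the tokenization regex and function names are assumptions.

```python
import re

def tokenize_mixed(text):
    # Assumption: treat each CJK character as one token and each
    # run of Latin letters (with apostrophes) as one English word.
    return re.findall(r"[\u4e00-\u9fff]|[A-Za-z']+", text)

def edit_distance(ref, hyp):
    # Standard Levenshtein distance via a single-row DP table.
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,          # deletion
                        dp[j - 1] + 1,      # insertion
                        prev + (r != h))    # substitution (free if match)
            prev = cur
    return dp[-1]

def mixed_error_rate(ref_text, hyp_text):
    # MER = edit distance over mixed tokens / number of reference tokens.
    ref = tokenize_mixed(ref_text)
    hyp = tokenize_mixed(hyp_text)
    return edit_distance(ref, hyp) / len(ref)
```

For example, a hypothesis that substitutes one Mandarin character in a six-token mixed reference scores an MER of 1/6.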
- Thien Nguyen
- Nathalie Tran
- Liuhui Deng
- Thiago Fraga da Silva
- Matthew Radzihovsky
- Roger Hsiao
- Henry Mason
- Stefan Braun
- Erik McDermott
- Dogan Can
- Pawel Swietojanski
- Lyan Verwimp
- Sibel Oyman
- Tresi Arvizo
- Honza Silovsky
- Arnab Ghoshal
- Mathieu Martel
- Bharat Ram Ambati
- Mohamed Ali