Multilingual self-supervised speech representations improve the speech recognition of low-resource African languages with codeswitching (2311.15077v1)
Abstract: Many speakers of low-resource languages regularly code-switch between their languages and other regional languages or English, yet datasets of code-switched speech are too small to train bespoke acoustic models from scratch or to perform language model rescoring. Here we propose finetuning self-supervised speech representations such as wav2vec 2.0 XLSR to recognize code-switched data. We find that finetuning self-supervised multilingual representations and augmenting them with n-gram language models trained on transcripts reduces absolute word error rates by up to 20% compared to baselines of hybrid models trained from scratch on code-switched data. Our findings suggest that, when training data is limited, finetuning self-supervised representations is a viable and better-performing solution.
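The abstract reports its gain as an absolute reduction in word error rate (WER), which is easy to confuse with a relative reduction. The following is a minimal sketch of how WER and both kinds of reduction are commonly computed, assuming whitespace tokenization and the standard word-level edit distance. The function names and the example WER values are hypothetical illustrations, not the paper's evaluation code or results.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only one previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def absolute_reduction(baseline_wer: float, system_wer: float) -> float:
    """Difference in WER, expressed in the same units as the inputs."""
    return baseline_wer - system_wer


def relative_reduction(baseline_wer: float, system_wer: float) -> float:
    """Reduction as a fraction of the baseline WER."""
    return (baseline_wer - system_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical WER values chosen only to contrast the two measures.
    baseline, finetuned = 0.55, 0.35
    print(f"absolute reduction: {absolute_reduction(baseline, finetuned):.2f}")
    print(f"relative reduction: {relative_reduction(baseline, finetuned):.2f}")
```

With these illustrative numbers, a 20-point absolute drop corresponds to a larger relative drop, which is why the two should be stated explicitly when comparing systems.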