Zero Resource Code-switched Speech Benchmark Using Speech Utterance Pairs For Multiple Spoken Languages (2310.03018v3)
Abstract: We introduce a new zero resource code-switched speech benchmark designed to directly assess the code-switching capabilities of self-supervised speech encoders. We showcase a baseline system of language modeling on discrete units to demonstrate how the code-switching abilities of speech encoders can be assessed in a zero-resource manner. Our experiments encompass a variety of well-known speech encoders, including Wav2vec 2.0, HuBERT, and XLSR. We examine the impact of pre-training languages and model size on benchmark performance. Notably, although our results demonstrate that speech encoders with multilingual pre-training, exemplified by XLSR, outperform monolingual variants (Wav2vec 2.0, HuBERT) in code-switching scenarios, there is still substantial room for improvement in their code-switching linguistic abilities.
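The zero-resource evaluation described above can be sketched as a pipeline: encoder features are quantized to discrete units (HuBERT-style nearest-centroid assignment against a k-means codebook), each utterance in a pair is scored by a unit language model, and the encoder "passes" the pair when the plausible utterance receives the higher score. The sketch below is a toy illustration, not the paper's implementation: random vectors stand in for real encoder features and codebook centroids, and a bigram table stands in for the trained unit LM.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(features, centroids):
    # Map each frame vector to the index of its nearest centroid,
    # yielding a sequence of discrete units (HuBERT-style k-means labels).
    d = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

def bigram_logprob(units, logp):
    # Score a unit sequence under a toy bigram unit language model.
    return sum(logp[a, b] for a, b in zip(units[:-1], units[1:]))

# Stand-ins: random "encoder features" and centroids replace a real
# self-supervised encoder (e.g. HuBERT/XLSR) and its k-means codebook.
n_units, dim, n_frames = 8, 4, 20
centroids = rng.normal(size=(n_units, dim))
feats_plausible = rng.normal(size=(n_frames, dim))
feats_implausible = rng.normal(size=(n_frames, dim))

# Toy bigram unit LM: row-normalized transition log-probabilities.
counts = rng.random((n_units, n_units)) + 1.0
logp = np.log(counts / counts.sum(axis=1, keepdims=True))

units_plausible = quantize(feats_plausible, centroids)
units_implausible = quantize(feats_implausible, centroids)

# Zero-resource decision for one utterance pair: compare unit-LM scores.
score_plausible = bigram_logprob(units_plausible, logp)
score_implausible = bigram_logprob(units_implausible, logp)
print("pair passed:", score_plausible > score_implausible)
```

In the actual benchmark, accuracy over many such utterance pairs is the reported metric; with real encoder features and a trained unit LM, the plausible (correctly code-switched) utterance should be preferred more often than chance.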
- Hexin Liu et al., “Reducing language confusion for code-switching speech recognition with token-level language diarization,” in ICASSP 2023.
- “Benchmarking evaluation metrics for code-switching automatic speech recognition,” in SLT 2022, pp. 999–1005.
- “End-to-end speech translation for code switched speech,” in Findings of ACL, Dublin, Ireland, May 2022, pp. 1435–1448, Association for Computational Linguistics.
- “Code-switching without switching: Language agnostic end-to-end speech translation,” arXiv:2210.01512, 2022.
- “Towards Natural Bilingual and Code-Switched Speech Synthesis Based on Mix of Monolingual Recordings and Cross-Lingual Voice Conversion,” in INTERSPEECH 2020.
- “wav2vec 2.0: a framework for self-supervised learning of speech representations,” in Proceedings of the 34th International Conference on Neural Information Processing Systems, 2020, pp. 12449–12460.
- Wei-Ning Hsu et al., “HuBERT: Self-supervised speech representation learning by masked prediction of hidden units,” TASLP, vol. 29, pp. 3451–3460, 2021.
- “Textless speech-to-speech translation on real data,” in NAACL, 2022, pp. 860–872.
- “Improving Distortion Robustness of Self-supervised Speech Processing Tasks with Domain Adaptation,” in INTERSPEECH 2022.
- “Ensemble knowledge distillation of self-supervised speech models,” in ICASSP 2023.
- “Improving generalizability of distilled self-supervised speech processing models under distorted settings,” in SLT 2022.
- Alexis Conneau et al., “Unsupervised Cross-Lingual Representation Learning for Speech Recognition,” in INTERSPEECH 2021.
- Arun Babu et al., “XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale,” in INTERSPEECH 2022, pp. 2278–2282.
- Zheng-Xin Yong et al., “Prompting large language models to generate code-mixed texts: The case of south east asian languages,” arXiv:2303.13592, 2023.
- Ruochen Zhang et al., “Multilingual large language models are not (yet) code-switchers,” arXiv:2305.14235, 2023.
- Genta Indra Winata et al., “Are multilingual models effective in code-switching?,” in Proceedings of the Fifth Workshop on Computational Approaches to Linguistic Code-Switching, 2021, pp. 142–153.
- “BLiMP: The benchmark of linguistic minimal pairs for English,” TACL, vol. 8, pp. 377–392, 2020.
- Tu Anh Nguyen et al., “The zero resource speech benchmark 2021: Metrics and baselines for unsupervised spoken language modeling,” in NeurIPS Workshop on Self-Supervised Learning for Speech and Audio Processing, 2020.
- “A formal grammar for code-switching,” Research on Language & Social Interaction, vol. 14, no. 1, pp. 3–45, 1981.
- Carol Myers-Scotton, “Code-switching,” The handbook of sociolinguistics, pp. 217–237, 2017.
- Rosana Ardila et al., “Common voice: A massively-multilingual speech corpus,” in Proceedings of the Twelfth Language Resources and Evaluation Conference, Marseille, France, May 2020, pp. 4218–4222, European Language Resources Association.
- “Amazon Polly,” https://aws.amazon.com/polly/.
- “Direct speech-to-speech translation with discrete units,” in ACL, Dublin, Ireland, May 2022, pp. 3327–3339.
- “fairseq: A fast, extensible toolkit for sequence modeling,” in NAACL (Demonstrations), Minneapolis, Minnesota, June 2019, pp. 48–53, Association for Computational Linguistics.
- Alexis Conneau et al., “Unsupervised cross-lingual representation learning at scale,” in ACL 2020.
- Xi Victoria Lin et al., “Few-shot learning with multilingual generative language models,” in EMNLP, Abu Dhabi, United Arab Emirates, Dec. 2022, pp. 9019–9052, Association for Computational Linguistics.
- “Librispeech: an asr corpus based on public domain audio books,” in ICASSP. IEEE, 2015, pp. 5206–5210.
- Vineel Pratap et al., “MLS: A Large-Scale Multilingual Dataset for Speech Research,” in INTERSPEECH 2020.
- “MAGICDATA,” https://www.openslr.org/68.
- “S3PRL,” https://github.com/s3prl/s3prl.
- “vq-wav2vec: Self-supervised learning of discrete speech representations,” in ICLR 2020.
- Jiatong Shi et al., “ML-SUPERB: Multilingual Speech Universal PERformance Benchmark,” in INTERSPEECH 2023.
- “The Zero Resource Speech Challenge 2021: Spoken Language Modelling,” in INTERSPEECH 2021.