Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond (2310.05513v1)
Abstract: The 2023 Multilingual Speech Universal Performance Benchmark (ML-SUPERB) Challenge expands upon the acclaimed SUPERB framework, emphasizing self-supervised models in multilingual speech recognition and language identification. The challenge comprises a Research Track focused on applying ML-SUPERB to specific multilingual subjects, a Challenge Track for model submissions, and a New Language Track where language resource researchers can contribute and evaluate their low-resource language data in the context of the latest progress in multilingual speech recognition. The challenge garnered 12 model submissions and 54 language corpora, resulting in a comprehensive benchmark encompassing 154 languages. The findings indicate that merely scaling models is not the definitive solution for multilingual speech tasks, and that a variety of speech and voice types still present significant challenges in multilingual speech processing.
Authors:
- Jiatong Shi
- William Chen
- Dan Berrebbi
- Hsiu-Hsuan Wang
- Wei-Ping Huang
- En-Pei Hu
- Ho-Lam Chuang
- Xuankai Chang
- Yuxun Tang
- Shang-Wen Li
- Abdelrahman Mohamed
- Hung-yi Lee
- Shinji Watanabe