
Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond (2310.05513v1)

Published 9 Oct 2023 in cs.SD, cs.CL, and eess.AS

Abstract: The 2023 Multilingual Speech Universal Performance Benchmark (ML-SUPERB) Challenge expands upon the acclaimed SUPERB framework, emphasizing self-supervised models in multilingual speech recognition and language identification. The challenge comprises a Research Track focused on applying ML-SUPERB to specific multilingual subjects, a Challenge Track for model submissions, and a New Language Track where language resource researchers can contribute and evaluate their low-resource language data in the context of the latest progress in multilingual speech recognition. The challenge garnered 12 model submissions and 54 language corpora, resulting in a comprehensive benchmark encompassing 154 languages. The findings indicate that merely scaling models is not the definitive solution for multilingual speech tasks, and that a variety of speech/voice types present significant challenges in multilingual speech processing.
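As context for the benchmark numbers: ML-SUPERB-style multilingual ASR results are reported as character error rate (CER), i.e. the Levenshtein edit distance between the reference and hypothesis character sequences, normalized by the reference length. A minimal stdlib-only sketch of that metric (an illustrative implementation, not the challenge's official scoring code):

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance over reference length."""
    ref, hyp = list(reference), list(hypothesis)
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,         # deletion
                            curr[j - 1] + 1,     # insertion
                            prev[j - 1] + cost)) # substitution
        prev = curr
    return prev[-1] / max(len(ref), 1)

# One deleted character against an 11-character reference: CER = 1/11
print(f"{cer('hello world', 'helo world'):.3f}")
```

Because CER operates on characters rather than words, it transfers across languages without requiring a consistent word-segmentation scheme, which is one reason it is the standard metric for massively multilingual ASR benchmarks.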

Authors (13)
  1. Jiatong Shi
  2. William Chen
  3. Dan Berrebbi
  4. Hsiu-Hsuan Wang
  5. Wei-Ping Huang
  6. En-Pei Hu
  7. Ho-Lam Chuang
  8. Xuankai Chang
  9. Yuxun Tang
  10. Shang-Wen Li
  11. Abdelrahman Mohamed
  12. Hung-yi Lee
  13. Shinji Watanabe
Citations (13)
