
ML-SUPERB: Multilingual Speech Universal PERformance Benchmark (2305.10615v2)

Published 18 May 2023 in cs.SD, cs.CL, and eess.AS

Abstract: Speech processing Universal PERformance Benchmark (SUPERB) is a leaderboard to benchmark the performance of Self-Supervised Learning (SSL) models on various speech processing tasks. However, SUPERB largely considers English speech in its evaluation. This paper presents multilingual SUPERB (ML-SUPERB), covering 143 languages (ranging from high-resource to endangered), and considering both automatic speech recognition and language identification. Following the concept of SUPERB, ML-SUPERB utilizes frozen SSL features and employs a simple framework for multilingual tasks by learning a shallow downstream model. Similar to the SUPERB benchmark, we find speech SSL models can significantly improve performance compared to FBANK features. Furthermore, we find that multilingual models do not always perform better than their monolingual counterparts. We will release ML-SUPERB as a challenge with organized datasets and reproducible training scripts for future multilingual representation research.
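
The evaluation recipe the abstract describes (frozen upstream SSL features, with only a shallow downstream model learned on top) can be sketched in miniature. The snippet below simulates frozen multi-layer features with random arrays and trains just a learnable weighted sum over layers plus a linear language-ID classifier by gradient descent; all dimensions, variable names, and the training loop are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated frozen SSL features: 200 utterances, 3 upstream layers, 16 dims.
# In an ML-SUPERB-style setup the upstream model stays frozen; only the
# shallow head below (layer weights + linear classifier) is trained.
n, n_layers, d, n_langs = 200, 3, 16, 4
feats = rng.normal(size=(n, n_layers, d))
labels = rng.integers(0, n_langs, size=n)

layer_w = np.zeros(n_layers)   # learnable logits for the layer-wise mixture
W = np.zeros((d, n_langs))     # linear language-ID classifier

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

losses = []
lr = 0.5
for _ in range(300):
    a = softmax(layer_w)                    # mixture weights over layers
    x = np.einsum("l,nld->nd", a, feats)    # weighted-sum features, (n, d)
    p = softmax(x @ W)                      # class probabilities, (n, n_langs)
    losses.append(-np.log(p[np.arange(n), labels] + 1e-12).mean())

    onehot = np.eye(n_langs)[labels]
    g = (p - onehot) / n                    # cross-entropy gradient w.r.t. logits
    W -= lr * (x.T @ g)

    # Gradient w.r.t. the layer logits, through the softmax mixture.
    gx = g @ W.T                            # gradient w.r.t. mixed features
    ga = np.einsum("nd,nld->l", gx, feats)  # gradient w.r.t. mixture weights
    layer_w -= lr * a * (ga - (ga * a).sum())

acc = (p.argmax(axis=1) == labels).mean()
```

In the real benchmark, `feats` would come from a frozen pretrained model (e.g. layer outputs of wav2vec 2.0 or HuBERT) rather than random arrays, and the shallow head would be the only trained component, which is what keeps the evaluation cheap and comparable across upstream models.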

Authors (11)
  1. Jiatong Shi (82 papers)
  2. Dan Berrebbi (10 papers)
  3. William Chen (49 papers)
  4. Ho-Lam Chung (13 papers)
  5. En-Pei Hu (5 papers)
  6. Wei Ping Huang (1 paper)
  7. Xuankai Chang (61 papers)
  8. Shang-Wen Li (55 papers)
  9. Abdelrahman Mohamed (59 papers)
  10. Hung-yi Lee (327 papers)
  11. Shinji Watanabe (416 papers)
Citations (50)
