
Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning (2309.15317v2)

Published 26 Sep 2023 in cs.CL, cs.AI, cs.SD, and eess.AS

Abstract: Multilingual self-supervised learning (SSL) has often lagged behind state-of-the-art (SOTA) methods due to the expense and complexity of handling many languages. This further harms the reproducibility of SSL, which is already limited to a few research groups due to its resource usage. We show that more powerful techniques can actually lead to more efficient pre-training, opening SSL to more research groups. We propose WavLabLM, which extends WavLM's joint prediction and denoising to 40k hours of data across 136 languages. To build WavLabLM, we devise a novel multi-stage pre-training method, designed to address the language imbalance of multilingual data. WavLabLM achieves comparable performance to XLS-R on ML-SUPERB with less than 10% of the training data, making SSL realizable with academic compute. We show that further efficiency can be achieved with a vanilla HuBERT Base model, which can maintain 94% of XLS-R's performance with only 3% of the data, 4 GPUs, and limited trials. We open-source all code and models in ESPnet.
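The core technique the abstract references is WavLM-style joint prediction and denoising: the encoder receives an utterance that may be overlapped with noise or a distractor utterance, and is trained to predict discrete pseudo-labels derived from the clean signal at masked positions. The sketch below is a minimal illustration of that idea, not the released WavLabLM implementation; the SNR, masking rate, toy encoder, and random stand-in features/labels are all assumptions made for brevity.

```python
# Minimal sketch of joint prediction and denoising (illustrative assumptions:
# mixing SNR, 30% masking rate, a toy linear encoder, and random placeholder
# features/pseudo-labels instead of real CNN features and k-means clusters).
import torch
import torch.nn.functional as F

def mix_utterances(clean: torch.Tensor, distractor: torch.Tensor, snr_db: float = 5.0) -> torch.Tensor:
    """Overlay a distractor waveform on the clean waveform at a target SNR."""
    clean_power = clean.pow(2).mean()
    distractor_power = distractor.pow(2).mean().clamp(min=1e-8)
    scale = torch.sqrt(clean_power / (distractor_power * 10 ** (snr_db / 10)))
    return clean + scale * distractor

def masked_denoising_loss(encoder, noisy_feats, clean_labels, mask):
    """Predict clean-derived pseudo-labels at the masked frames of the noisy input."""
    logits = encoder(noisy_feats)                     # (batch, time, num_clusters)
    return F.cross_entropy(logits[mask], clean_labels[mask])

# Waveform-level mixing: the model sees the noisy mixture ...
clean_wav = torch.randn(16000)                        # 1 s at 16 kHz (placeholder)
noisy_wav = mix_utterances(clean_wav, torch.randn(16000), snr_db=5.0)

# ... but the targets come from the clean signal. Random tensors stand in for
# frame-level features of the mixture and k-means pseudo-labels of clean audio.
batch, time, feat_dim, num_clusters = 2, 100, 80, 500
encoder = torch.nn.Sequential(
    torch.nn.Linear(feat_dim, 256), torch.nn.ReLU(), torch.nn.Linear(256, num_clusters)
)
noisy_feats = torch.randn(batch, time, feat_dim)
clean_labels = torch.randint(0, num_clusters, (batch, time))
mask = torch.rand(batch, time) < 0.3                  # mask ~30% of frames

loss = masked_denoising_loss(encoder, noisy_feats, clean_labels, mask)
loss.backward()
```

Because the targets are computed from the clean audio while the input is corrupted, the objective couples masked prediction with implicit denoising, which is the property WavLabLM scales to 136 languages.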

Authors (9)
  1. William Chen (49 papers)
  2. Jiatong Shi (82 papers)
  3. Brian Yan (40 papers)
  4. Dan Berrebbi (10 papers)
  5. Wangyou Zhang (35 papers)
  6. Yifan Peng (147 papers)
  7. Xuankai Chang (61 papers)
  8. Soumi Maiti (26 papers)
  9. Shinji Watanabe (416 papers)
Citations (8)
