Is 2B parameters the effective scaling limit for self-supervised speech encoders?
Determine whether roughly 2 billion parameters constitutes an effective upper limit for scaling self-supervised speech representation models such as wav2vec 2.0. Specifically, establish whether capacity beyond 2B yields diminishing returns on downstream speech tasks, or whether 2B parameters already suffice for most applications.
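One way to make "diminishing returns" operational is to fit a saturating power law to performance versus parameter count and inspect the predicted marginal gain past 2B. The sketch below is a minimal illustration under that assumption; the functional form L(N) = a * N^(-alpha) + c follows common scaling-law practice, and every data point is a hypothetical placeholder, not a result from the paper.

```python
# Sketch: testing for diminishing returns when scaling SSL speech encoders.
# All numbers are hypothetical placeholders; replace with real measurements.
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical (parameter count in billions, downstream dev WER) pairs for a
# wav2vec 2.0-style encoder family.
params_b = np.array([0.3, 0.95, 2.0, 4.0, 7.0])
wer = np.array([14.2, 11.8, 10.5, 10.1, 9.9])

def saturating_power_law(n, a, alpha, c):
    """L(N) = a * N^(-alpha) + c: error decays as a power law toward a floor c."""
    return a * n ** (-alpha) + c

(a, alpha, c), _ = curve_fit(
    saturating_power_law, params_b, wer, p0=(5.0, 0.5, 9.0), maxfev=10000
)

# Predicted improvement from doubling capacity at the 2B point.
gain_2b_to_4b = (saturating_power_law(2.0, a, alpha, c)
                 - saturating_power_law(4.0, a, alpha, c))
print(f"fitted floor (irreducible WER): {c:.2f}")
print(f"predicted WER gain from 2B -> 4B: {gain_2b_to_4b:.2f}")
# A gain near zero relative to the remaining distance to the floor suggests
# diminishing returns beyond 2B; a large gain suggests scaling is not exhausted.
```

Note that this framing only distinguishes the two hypotheses in the question if the fitted floor c is estimated reliably, which in practice requires models well beyond 2B parameters.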
References
Yet it remains an open question whether 2B parameters marks the effective limit of scaling, either because additional capacity yields diminishing returns, or because 2B parameters are already sufficient for solving most speech tasks.
— Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
(arXiv:2511.09690, Omnilingual ASR team, 12 Nov 2025), Subsubsection "Scaling Speech SSL Beyond 2B", Section 5.1 (Massively Cross-Lingual Self-Supervised Representations)