CLSRIL-23: Cross Lingual Speech Representations for Indic Languages (2107.07402v2)

Published 15 Jul 2021 in cs.CL, cs.LG, cs.SD, and eess.AS

Abstract: We present CLSRIL-23, a self-supervised audio pre-trained model that learns cross-lingual speech representations from raw audio across 23 Indic languages. It is built on top of wav2vec 2.0, which is trained by solving a contrastive task over masked latent speech representations while jointly learning a quantization of the latents shared across all languages. We compare the language-wise loss during pretraining to study the effects of monolingual versus multilingual pretraining. Performance on downstream fine-tuning tasks for speech recognition is also compared, and our experiments show that multilingual pretraining outperforms monolingual pretraining, both in learning speech representations that encode the phonetic similarity of languages and in performance on downstream tasks. A decrease of 5% in WER and 9.5% in CER is observed when a multilingual pretrained model is used for fine-tuning in Hindi. All the code and models are also open sourced. CLSRIL-23 is trained on 23 languages and almost 10,000 hours of audio data to facilitate research in speech recognition for Indic languages. We hope that new state-of-the-art systems will be created using the self-supervised approach, especially for low-resource Indic languages.
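The pretraining objective follows wav2vec 2.0: a context network must identify the true quantized latent for each masked time step among a set of distractors sampled from other masked positions. Below is a minimal PyTorch sketch of that contrastive term, with illustrative names and shapes; it is not the authors' fairseq implementation, and the masking logic, quantizer, and diversity loss are omitted.

```python
import torch
import torch.nn.functional as F

def wav2vec2_contrastive_loss(context, quantized, distractor_ids, temperature=0.1):
    """Sketch of the wav2vec 2.0-style contrastive objective (illustrative only).

    context:        (T, D) contextualized outputs c_t at the masked time steps
    quantized:      (T, D) quantized latent targets q_t for the same steps
    distractor_ids: (T, K) indices of K distractor steps sampled uniformly
                    from other masked positions in the same utterance
    """
    T, D = context.shape
    # Candidate set per step: the true target plus K distractors.
    distractors = quantized[distractor_ids]                               # (T, K, D)
    candidates = torch.cat([quantized.unsqueeze(1), distractors], dim=1)  # (T, K+1, D)

    # Cosine similarity between c_t and each candidate, scaled by a temperature.
    logits = F.cosine_similarity(context.unsqueeze(1), candidates, dim=-1)
    logits = logits / temperature                                         # (T, K+1)

    # The true target sits at index 0, so the contrastive task reduces
    # to cross-entropy against the zero class.
    targets = torch.zeros(T, dtype=torch.long)
    return F.cross_entropy(logits, targets)
```

Because the quantizer codebook is shared across all 23 languages, phonetically similar sounds in different languages can map to the same discrete latents, which is the mechanism the multilingual pretraining exploits.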

Authors (7)
  1. Anirudh Gupta (9 papers)
  2. Harveen Singh Chadha (10 papers)
  3. Priyanshi Shah (10 papers)
  4. Neeraj Chhimwal (8 papers)
  5. Ankur Dhuriya (8 papers)
  6. Rishabh Gaur (7 papers)
  7. Vivek Raghavan (14 papers)
Citations (33)