
SPRING-INX: A Multilingual Indian Language Speech Corpus by SPRING Lab, IIT Madras (2310.14654v2)

Published 23 Oct 2023 in cs.CL and eess.AS

Abstract: India is home to a multitude of languages, of which 22 are recognised by the Indian Constitution as official. Building speech-based applications for the Indian population is a difficult problem owing to limited data and the number of languages and accents to accommodate. To encourage the language technology community to build speech-based applications in Indian languages, we are open-sourcing the SPRING-INX data, which has about 2000 hours of legally sourced and manually transcribed speech data for ASR system building in Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Odia, Punjabi and Tamil. This endeavor is by SPRING Lab, Indian Institute of Technology Madras, and is part of the National Language Translation Mission (NLTM), funded by the Indian Ministry of Electronics and Information Technology (MeitY), Government of India. We describe the data collection and data cleaning process along with the data statistics in this paper.

Authors (11)
  1. Nithya R (1 paper)
  2. Malavika S (1 paper)
  3. Jordan F (1 paper)
  4. Arjun Gangwar (2 papers)
  5. Metilda N J (1 paper)
  6. Rithik Sarab (1 paper)
  7. Akhilesh Kumar Dubey (1 paper)
  8. Govind Divakaran (1 paper)
  9. Samudra Vijaya K (1 paper)
  10. Suryakanth V Gangashetty (5 papers)
  11. S Umesh (6 papers)
Citations (3)
