Snow Mountain: Dataset of Audio Recordings of The Bible in Low Resource Languages (2206.01205v2)
Published 1 Jun 2022 in eess.AS, cs.LG, and cs.SD
Abstract: Automatic Speech Recognition (ASR) has increasing utility in the modern world. There are many ASR models available for languages with large amounts of training data, like English. However, low-resource languages are poorly represented. In response, we create and release an open-licensed and formatted dataset of audio recordings of the Bible in low-resource northern Indian languages. We set up multiple experimental splits and train and analyze two competitive ASR models to serve as baselines for future research using this data.
- Kavitha Raju (2 papers)
- Anjaly V (1 paper)
- Ryan Lish (1 paper)
- Joel Mathew (7 papers)