Joint Modeling of Accents and Acoustics for Multi-Accent Speech Recognition (1802.02656v1)
Abstract: The performance of automatic speech recognition systems degrades with increasing mismatch between the training and testing scenarios. Differences in speaker accents are a significant source of such mismatch. The traditional approach to dealing with multiple accents pools data from several accents during training and builds a single model in multi-task fashion, where each task corresponds to an individual accent. In this paper, we explore an alternative model in which we jointly learn an accent classifier and a multi-task acoustic model. Experiments on the American English Wall Street Journal and British English Cambridge corpora demonstrate that our joint model outperforms the strong multi-task acoustic model baseline, with relative word error rate improvements of 5.94% on British English and 9.47% on American English. These results indicate that jointly modeling accent information improves acoustic model performance.
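The abstract describes the architecture only at a high level, so the following is a minimal PyTorch sketch of the general idea, not the paper's actual implementation: a shared recurrent encoder feeds both per-accent acoustic (senone) heads and an utterance-level accent classifier, and the two losses are combined with a weight. All layer choices, dimensions, and the loss weight `0.1` are illustrative assumptions.

```python
import torch
import torch.nn as nn

class JointAccentAcousticModel(nn.Module):
    """Sketch of a joint model: shared encoder, one acoustic head per
    accent (the multi-task part), plus an accent classifier head.
    Architecture details here are assumptions, not the paper's setup."""

    def __init__(self, feat_dim=40, hidden_dim=256, num_senones=2000, num_accents=2):
        super().__init__()
        # Shared bidirectional LSTM encoder over acoustic feature frames
        self.encoder = nn.LSTM(feat_dim, hidden_dim, num_layers=2,
                               bidirectional=True, batch_first=True)
        # One frame-level acoustic output layer per accent ("task")
        self.acoustic_heads = nn.ModuleList(
            nn.Linear(2 * hidden_dim, num_senones) for _ in range(num_accents))
        # Utterance-level accent classifier on mean-pooled encoder states
        self.accent_head = nn.Linear(2 * hidden_dim, num_accents)

    def forward(self, feats):
        enc, _ = self.encoder(feats)                       # (B, T, 2H)
        accent_logits = self.accent_head(enc.mean(dim=1))  # (B, num_accents)
        senone_logits = [head(enc) for head in self.acoustic_heads]
        return senone_logits, accent_logits

# Toy joint training step with hypothetical shapes and targets
model = JointAccentAcousticModel()
feats = torch.randn(4, 100, 40)                    # 4 utterances, 100 frames
senone_targets = torch.randint(0, 2000, (4, 100))  # frame-level senone labels
accent_targets = torch.tensor([0, 1, 0, 1])        # utterance-level accents

senone_logits, accent_logits = model(feats)
ce = nn.CrossEntropyLoss()
# Acoustic loss: score each utterance with the head matching its accent
acoustic_loss = torch.stack([
    ce(senone_logits[a][i], senone_targets[i])
    for i, a in enumerate(accent_targets.tolist())]).mean()
accent_loss = ce(accent_logits, accent_targets)
loss = acoustic_loss + 0.1 * accent_loss  # assumed interpolation weight
loss.backward()
```

One plausible benefit of this design, consistent with the abstract's claim, is that the accent classification loss pushes the shared encoder to represent accent information explicitly, which the per-accent acoustic heads can then exploit.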
- Xuesong Yang
- Kartik Audhkhasi
- Andrew Rosenberg
- Samuel Thomas
- Bhuvana Ramabhadran
- Mark Hasegawa-Johnson