Dyn-ASR: Compact, Multilingual Speech Recognition via Spoken Language and Accent Identification

Published 4 Aug 2021 in cs.CL, cs.SD, and eess.AS | (2108.02034v1)

Abstract: Running automatic speech recognition (ASR) on edge devices is non-trivial due to resource constraints, especially in scenarios that require supporting multiple languages. We propose a new approach to enable multilingual speech recognition on edge devices. This approach uses both language identification and accent identification to select one of multiple monolingual ASR models on the fly, each fine-tuned for a particular accent. Initial results for both recognition performance and resource usage are promising, with our approach using less than 1/12th of the memory consumed by other solutions.
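
The selection step described in the abstract can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the class `DynamicASR`, its fields, the `(language, accent)` key scheme, and the helper `make_stub_loader` are all hypothetical names, and the identifiers and ASR models are stand-in callables rather than real networks. Keeping only one monolingual model resident at a time is likewise an assumption made here to show how on-the-fly selection could keep memory use low.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Audio = bytes                      # placeholder type for an utterance
Key = Tuple[str, str]              # (language, accent), a hypothetical key scheme
Model = Callable[[Audio], str]     # a loaded ASR model maps audio to text


@dataclass
class DynamicASR:
    """Hypothetical dispatcher: identify language and accent, then run one model."""

    identify_language: Callable[[Audio], str]
    identify_accent: Callable[[Audio, str], str]
    loaders: Dict[Key, Callable[[], Model]]   # one loader per accent-specific model
    _active_key: Optional[Key] = field(default=None, init=False)
    _active_model: Optional[Model] = field(default=None, init=False)

    def transcribe(self, audio: Audio) -> str:
        language = self.identify_language(audio)
        accent = self.identify_accent(audio, language)
        key = (language, accent)
        if key not in self.loaders:
            raise KeyError(f"no ASR model registered for {key}")
        if key != self._active_key:
            # Assumption: drop the previous model before loading the next one,
            # so at most one monolingual model is held in memory.
            self._active_model = None
            self._active_model = self.loaders[key]()
            self._active_key = key
        assert self._active_model is not None
        return self._active_model(audio)


def make_stub_loader(tag: str, load_log: List[str]) -> Callable[[], Model]:
    """Build a fake loader that records each load and returns a fake model."""

    def load() -> Model:
        load_log.append(tag)
        return lambda audio: f"[{tag}] {len(audio)} bytes"

    return load
```

In this sketch, consecutive utterances with the same identified language and accent reuse the already loaded model, and a model is swapped only when the identified pair changes.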