Learning ASR pathways: A sparse multilingual ASR model (2209.05735v4)
Abstract: Neural network pruning compresses automatic speech recognition (ASR) models effectively. However, in multilingual ASR, language-agnostic pruning may lead to severe performance drops on some languages, because a language-agnostic pruning mask may not fit all languages and may discard important language-specific parameters. In this work, we present ASR pathways, a sparse multilingual ASR model that activates language-specific sub-networks ("pathways"), so that the parameters for each language are learned explicitly. Because the sub-networks overlap, the shared parameters can also enable knowledge transfer to lower-resource languages via joint multilingual training. We propose a novel algorithm to learn ASR pathways and evaluate the proposed method on 4 languages with a streaming RNN-T model. Our proposed ASR pathways outperform both dense models and a language-agnostically pruned model, and provide better performance on low-resource languages compared to monolingual sparse models.
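The core idea of the abstract can be illustrated with a minimal sketch: one set of shared dense weights gated by per-language binary pruning masks, so that each language activates only its own sub-network while overlapping mask entries remain shared across languages. This is not the authors' implementation; the names (`PathwayLinear`, `sparsity`, `lang`) are illustrative assumptions, and the masks are random here, whereas in the paper they are learned during training.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PathwayLinear(nn.Module):
    """Linear layer whose shared weights are gated by per-language binary masks."""

    def __init__(self, in_dim: int, out_dim: int, languages, sparsity: float = 0.7):
        super().__init__()
        # Shared dense parameters, common to all languages.
        self.weight = nn.Parameter(torch.randn(out_dim, in_dim) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_dim))
        # One binary mask ("pathway") per language. Random for illustration;
        # in the paper these masks are learned jointly with the weights.
        self.masks = {
            lang: (torch.rand(out_dim, in_dim) > sparsity).float()
            for lang in languages
        }

    def forward(self, x: torch.Tensor, lang: str) -> torch.Tensor:
        # Activate only the sub-network selected by this language's mask.
        masked_weight = self.weight * self.masks[lang]
        return F.linear(x, masked_weight, self.bias)


if __name__ == "__main__":
    layer = PathwayLinear(16, 8, languages=["en", "fr", "it", "de"])
    x = torch.randn(2, 16)
    y_en = layer(x, "en")  # English pathway
    y_fr = layer(x, "fr")  # French pathway; overlapping weights are shared with English
    print(y_en.shape, y_fr.shape)
```

Weights selected by several languages' masks are updated by all of those languages during joint training, which is the mechanism the abstract credits for knowledge transfer to lower-resource languages.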