Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model (2309.13018v2)
Abstract: Neural network pruning offers an effective method for compressing a multilingual automatic speech recognition (ASR) model with minimal performance loss. However, it entails several rounds of pruning and re-training that must be run for each language. In this work, we propose using an adaptive masking approach in two scenarios for pruning a multilingual ASR model efficiently, resulting in either sparse monolingual models or a sparse multilingual model (named Dynamic ASR Pathways). Our approach dynamically adapts the sub-network, avoiding premature decisions about a fixed sub-network structure. We show that our approach outperforms existing pruning methods when targeting sparse monolingual models. Further, we illustrate that Dynamic ASR Pathways jointly discovers and trains better sub-networks (pathways) of a single multilingual model by adapting from different sub-network initializations, thereby reducing the need for language-specific pruning.
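To make the adaptive-masking idea concrete, here is a minimal PyTorch-style sketch (illustrative, not the authors' implementation): a magnitude-based binary mask is applied during fine-tuning but periodically re-derived from the updated weights, so masked weights still receive dense gradients and may regrow, and the active sub-network is not fixed prematurely. The generic `model`, `loader`, and `loss_fn`, and the knobs `sparsity` and `remask_every`, are assumptions introduced for illustration.

```python
# Minimal sketch of adaptive masking (illustrative, not the paper's code).
from itertools import cycle

import torch


def magnitude_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Binary mask zeroing the `sparsity` fraction of smallest-magnitude weights."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return torch.ones_like(weight)
    threshold = weight.abs().flatten().kthvalue(k).values
    return (weight.abs() > threshold).float()


def train_with_adaptive_masking(model, loader, loss_fn, sparsity=0.7,
                                remask_every=1000, steps=10_000, lr=1e-4):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    # Initial sub-network: magnitude-prune every weight matrix (biases stay dense).
    masks = {n: magnitude_mask(p, sparsity)
             for n, p in model.named_parameters() if p.dim() > 1}
    data = cycle(loader)
    for step in range(steps):
        x, y = next(data)
        # Apply current masks so the forward pass runs on the sparse sub-network.
        with torch.no_grad():
            for n, p in model.named_parameters():
                if n in masks:
                    p.mul_(masks[n])
        loss = loss_fn(model(x), y)
        opt.zero_grad()
        loss.backward()  # gradients stay dense, so masked weights can regrow
        opt.step()
        # Periodically re-derive masks from the updated weights: the sub-network
        # adapts over training instead of being fixed after one pruning pass.
        if (step + 1) % remask_every == 0:
            masks = {n: magnitude_mask(p, sparsity)
                     for n, p in model.named_parameters() if p.dim() > 1}
    return model, masks
```

Re-deriving the mask every `remask_every` steps, rather than once up front, is what distinguishes this adaptive scheme from conventional iterative pruning, where the sub-network structure is frozen after each pruning round.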