
Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model (2309.13018v2)

Published 22 Sep 2023 in eess.AS, cs.CL, and cs.SD

Abstract: Neural network pruning offers an effective method for compressing a multilingual automatic speech recognition (ASR) model with minimal performance loss. However, it requires several rounds of pruning and re-training to be run for each language. In this work, we propose using an adaptive masking approach in two scenarios for pruning a multilingual ASR model efficiently, yielding either sparse monolingual models or a sparse multilingual model (named Dynamic ASR Pathways). Our approach dynamically adapts the sub-network, avoiding premature decisions about a fixed sub-network structure. We show that our approach outperforms existing pruning methods when targeting sparse monolingual models. Further, we illustrate that Dynamic ASR Pathways jointly discovers and trains better sub-networks (pathways) of a single multilingual model by adapting from different sub-network initializations, thereby reducing the need for language-specific pruning.
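
The abstract only sketches the core idea. As a rough illustration of what "adaptive masking" can mean in practice, the snippet below periodically re-estimates a magnitude-based pruning mask during fine-tuning rather than freezing the sub-network after a single pruning pass. This is a minimal sketch based solely on the abstract, not the authors' implementation; the function and hyperparameter names (magnitude_mask, SPARSITY, UPDATE_EVERY) and the toy linear layer and data are illustrative assumptions.

```python
import torch


def magnitude_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Keep only the largest-magnitude (1 - sparsity) fraction of weights."""
    k = int(sparsity * weight.numel())
    if k == 0:
        return torch.ones_like(weight)
    threshold = torch.kthvalue(weight.abs().flatten(), k).values
    return (weight.abs() > threshold).float()


SPARSITY = 0.7        # fraction of weights pruned at each re-masking step (illustrative)
UPDATE_EVERY = 1000   # training steps between mask re-estimations (illustrative)

# Toy stand-ins for a real ASR model and data loader.
layer = torch.nn.Linear(256, 256)
optimizer = torch.optim.Adam(layer.parameters(), lr=1e-4)
batches = [(torch.randn(8, 256), torch.randn(8, 256)) for _ in range(5000)]

for step, (x, y) in enumerate(batches):
    if step % UPDATE_EVERY == 0:
        # Re-estimate the mask from the current weights and zero out the
        # pruned connections. Between re-estimations all weights train
        # densely, so previously pruned weights can regrow and re-enter
        # the sub-network, i.e. the mask adapts instead of staying fixed.
        mask = magnitude_mask(layer.weight.data, SPARSITY)
        with torch.no_grad():
            layer.weight.mul_(mask)

    loss = torch.nn.functional.mse_loss(layer(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In this reading, the "adaptive" part is simply that the sparsity pattern is revisited during training; the paper's contribution concerns how such adaptation is applied per-language versus jointly across languages to obtain sparse monolingual models or a single sparse multilingual model.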
