
Learning ASR pathways: A sparse multilingual ASR model (2209.05735v4)

Published 13 Sep 2022 in eess.AS and cs.CL

Abstract: Neural network pruning compresses automatic speech recognition (ASR) models effectively. However, in multilingual ASR, language-agnostic pruning may lead to severe performance drops on some languages, because a language-agnostic pruning mask may not fit all languages and may discard important language-specific parameters. In this work, we present ASR pathways, a sparse multilingual ASR model that activates language-specific sub-networks ("pathways"), such that the parameters for each language are learned explicitly. With the overlapping sub-networks, the shared parameters can also enable knowledge transfer for lower-resource languages via joint multilingual training. We propose a novel algorithm to learn ASR pathways, and evaluate the proposed method on 4 languages with a streaming RNN-T model. Our proposed ASR pathways outperform both dense models and a language-agnostically pruned model, and provide better performance on low-resource languages than monolingual sparse models.
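The core idea in the abstract, shared weights with per-language binary pruning masks that select overlapping sub-networks, can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes PyTorch, and the class name MaskedLinear, the language codes, and the magnitude-pruning helper are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskedLinear(nn.Module):
    """Linear layer with weights shared across languages; each language
    activates only its own sub-network ("pathway") via a binary mask.
    Hypothetical sketch, not the paper's code."""

    def __init__(self, in_features: int, out_features: int, languages):
        super().__init__()
        self.weight = nn.Parameter(0.01 * torch.randn(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))
        # One fixed binary mask per language; entries where several masks
        # are 1 are shared parameters that support cross-lingual transfer.
        self.masks = nn.ParameterDict({
            lang: nn.Parameter(torch.ones(out_features, in_features),
                               requires_grad=False)
            for lang in languages
        })

    def prune_for_language(self, lang: str, sparsity: float) -> None:
        """Magnitude pruning: keep the (1 - sparsity) fraction of weights
        with the largest absolute value for this language."""
        n = self.weight.numel()
        keep = max(1, int(round(n * (1.0 - sparsity))))
        # kthvalue(n - keep + 1) of |w| is the smallest magnitude we keep.
        threshold = self.weight.abs().flatten().kthvalue(n - keep + 1).values
        self.masks[lang].data = (self.weight.abs() >= threshold).float()

    def forward(self, x: torch.Tensor, lang: str) -> torch.Tensor:
        # Only the active language's pathway contributes; masked-out weights
        # receive zero gradient, so each pathway is learned explicitly.
        return F.linear(x, self.weight * self.masks[lang], self.bias)


# Usage sketch: prune a pathway per language (language codes are illustrative),
# then route each batch through its language's mask during joint training.
layer = MaskedLinear(256, 256, languages=["en", "fr", "it", "es"])
for lang in ["en", "fr", "it", "es"]:
    layer.prune_for_language(lang, sparsity=0.7)
hidden = layer(torch.randn(8, 256), lang="fr")
```

In joint multilingual training under this sketch, each batch is routed through its language's mask, so weights covered by several masks receive gradients from all of those languages; this overlap is the mechanism the abstract credits for knowledge transfer to lower-resource languages.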
