
Hierarchical Recurrent Adapters for Efficient Multi-Task Adaptation of Large Speech Models (2403.19709v1)

Published 25 Mar 2024 in eess.AS, cs.AI, cs.CL, cs.LG, and cs.NE

Abstract: Parameter-efficient adaptation methods have become a key mechanism for training large pre-trained models on downstream tasks. However, their per-task parameter overhead is still considered high when the number of downstream tasks is large. We introduce an adapter module that offers better efficiency in large-scale multi-task adaptation scenarios. Our adapter is hierarchical in how its parameters are allocated: it consists of a single shared controller network and multiple task-level adapter heads, which reduces the per-task parameter overhead without performance regression on downstream tasks. The adapter is also recurrent, so the entire set of adapter parameters is reused across different layers of the pre-trained model. Our Hierarchical Recurrent Adapter (HRA) outperforms previous adapter-based approaches as well as a full model fine-tuning baseline in both single- and multi-task adaptation settings when evaluated on automatic speech recognition tasks.

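The abstract describes a two-level structure: a single recurrent controller whose parameters are shared and reused across every layer of the frozen pre-trained model, plus lightweight per-task adapter heads that carry the only task-specific parameters. The following is a minimal sketch of that idea in PyTorch; the GRUCell controller, the bottleneck head design, and all dimensions are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of a Hierarchical Recurrent Adapter (HRA), assuming:
#  - a shared recurrent "controller" (here an nn.GRUCell) whose parameters and
#    hidden state are reused across all layers of a frozen pre-trained encoder, and
#  - small per-task "adapter heads" (here bottleneck projections) that produce the
#    residual added to each layer's activations.
# These choices are assumptions for illustration, not the paper's exact design.
import torch
import torch.nn as nn


class HierarchicalRecurrentAdapter(nn.Module):
    def __init__(self, d_model: int, d_ctrl: int, d_bottleneck: int, task_names):
        super().__init__()
        # Shared controller: one parameter set reused at every encoder layer.
        self.controller = nn.GRUCell(input_size=d_model, hidden_size=d_ctrl)
        # Per-task heads: the only parameters that grow with the number of tasks.
        self.heads = nn.ModuleDict({
            name: nn.Sequential(
                nn.Linear(d_ctrl, d_bottleneck),
                nn.ReLU(),
                nn.Linear(d_bottleneck, d_model),
            )
            for name in task_names
        })

    def init_state(self, batch_shape, device):
        # One controller state per (batch, time) position, carried across layers.
        return torch.zeros(*batch_shape, self.controller.hidden_size, device=device)

    def forward(self, layer_out, state, task: str):
        # layer_out: (batch, time, d_model) activations from one frozen encoder layer.
        b, t, d = layer_out.shape
        flat = layer_out.reshape(b * t, d)
        state = self.controller(flat, state.reshape(b * t, -1))
        residual = self.heads[task](state).reshape(b, t, d)
        return layer_out + residual, state.reshape(b, t, -1)


if __name__ == "__main__":
    adapter = HierarchicalRecurrentAdapter(d_model=512, d_ctrl=64, d_bottleneck=32,
                                           task_names=["asr_en", "asr_accented"])
    x = torch.randn(2, 10, 512)            # dummy encoder activations
    h = adapter.init_state((2, 10), x.device)
    for _ in range(4):                     # the same adapter is reused at 4 layers
        x, h = adapter(x, h, task="asr_en")
    print(x.shape)                         # torch.Size([2, 10, 512])
```

In this sketch the per-task cost is only the head (two small linear layers), while the controller is amortized across all tasks and all layers, which is the parameter-sharing pattern the abstract attributes to HRA.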