Hierarchical Recurrent Adapters for Efficient Multi-Task Adaptation of Large Speech Models (2403.19709v1)
Abstract: Parameter-efficient adaptation methods have become a key mechanism for training large pre-trained models on downstream tasks. However, their per-task parameter overhead remains high when the number of downstream tasks to adapt for is large. We introduce an adapter module with better efficiency in large-scale multi-task adaptation scenarios. Our adapter is hierarchical in how its parameters are allocated: it consists of a single shared controller network and multiple task-level adapter heads, which reduces the per-task parameter overhead without performance regression on downstream tasks. The adapter is also recurrent, so all of its parameters are reused across the layers of the pre-trained model. Our Hierarchical Recurrent Adapter (HRA) outperforms previous adapter-based approaches as well as a full model fine-tuning baseline in both single- and multi-task adaptation settings when evaluated on automatic speech recognition tasks.
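The abstract fixes the two structural ideas (a controller shared across tasks and layers, plus tiny per-task heads) but not the concrete architecture, so the following is only a minimal sketch of that idea under stated assumptions. All names here (`HRAdapterSketch`, `controller_dim`, the tanh recurrence, the residual head) are illustrative choices, not the authors' implementation.

```python
# Minimal sketch (assumption-level, not the authors' implementation) of a
# hierarchical recurrent adapter: one controller network shared by every task
# and every layer, plus a small per-task head that emits a residual update.
import numpy as np

class HRAdapterSketch:
    def __init__(self, d_model, controller_dim, num_tasks, seed=0):
        rng = np.random.default_rng(seed)
        # Shared controller: maps backbone activations (plus its own
        # recurrent state carried across layers) to a compact hidden state.
        self.w_in = rng.normal(0, 0.02, (d_model, controller_dim))
        self.w_rec = rng.normal(0, 0.02, (controller_dim, controller_dim))
        # Per-task heads: the only parameters that grow with the task count.
        self.heads = [rng.normal(0, 0.02, (controller_dim, d_model))
                      for _ in range(num_tasks)]

    def init_state(self, batch, controller_dim):
        return np.zeros((batch, controller_dim))

    def __call__(self, h, state, task_id):
        # h: (batch, d_model) activation from one frozen backbone layer.
        state = np.tanh(h @ self.w_in + state @ self.w_rec)
        residual = state @ self.heads[task_id]   # task-specific projection
        return h + residual, state               # residual-adapted activation


# Usage: the same adapter parameters are reapplied after every backbone layer.
d_model, controller_dim, num_layers = 512, 64, 4
adapter = HRAdapterSketch(d_model, controller_dim, num_tasks=3)
h = np.zeros((2, d_model))                        # stand-in for layer outputs
state = adapter.init_state(batch=2, controller_dim=controller_dim)
for _ in range(num_layers):                       # frozen backbone layers omitted
    h, state = adapter(h, state, task_id=1)
```

In this reading, the per-task cost is only the small head matrix, while the controller and its recurrent reuse across layers are amortized over all tasks, which is what the abstract claims drives the reduced per-task overhead.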