Audio-AdapterFusion: A Task-ID-free Approach for Efficient and Non-Destructive Multi-task Speech Recognition (2310.13015v1)
Abstract: Adapters are an efficient, composable alternative to full fine-tuning of pre-trained models and help scale the deployment of large ASR models to many tasks. In practice, a task ID is commonly prepended to the input during inference to route to single-task adapters for the specified task. However, one major limitation of this approach is that the task ID may not be known during inference, rendering it unsuitable for most multi-task settings. To address this, we propose three novel task-ID-free methods to combine single-task adapters in multi-task ASR and investigate two learning algorithms for training. We evaluate our methods on 10 test sets from 4 diverse ASR tasks and show that our methods are non-destructive and parameter-efficient. While updating only 17% of the model parameters, our methods achieve an 8% mean relative WER improvement over full fine-tuning and are on par with task-ID adapter routing.
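To make the idea of task-ID-free adapter combination concrete, the sketch below shows one plausible realization in PyTorch: a standard bottleneck residual adapter per task, plus an AdapterFusion-style attention that weights the frozen single-task adapter outputs using only the hidden state as the query, so no task ID is required at inference. Module names, dimensions, and the specific fusion form are illustrative assumptions, not the paper's exact three methods.

```python
# Minimal sketch (assumptions noted above): bottleneck residual adapters and an
# attention-based fusion over their outputs that needs no task ID at inference.
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Residual adapter: down-project, non-linearity, up-project."""

    def __init__(self, d_model: int = 512, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))


class TaskIDFreeFusion(nn.Module):
    """Attend over the outputs of frozen single-task adapters.

    The query is the hidden state itself, so the mixture weights are computed
    from the input alone; only the fusion parameters would be trained.
    """

    def __init__(self, adapters: list[BottleneckAdapter], d_model: int = 512):
        super().__init__()
        self.adapters = nn.ModuleList(adapters)
        self.query = nn.Linear(d_model, d_model)
        self.key = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model); stack the N adapter outputs on a new axis.
        outs = torch.stack([a(x) for a in self.adapters], dim=2)  # (B, T, N, D)
        q = self.query(x).unsqueeze(2)                            # (B, T, 1, D)
        k = self.key(outs)                                        # (B, T, N, D)
        scores = (q * k).sum(-1) / (x.size(-1) ** 0.5)            # (B, T, N)
        weights = scores.softmax(dim=-1).unsqueeze(-1)            # (B, T, N, 1)
        # Mix the adapters' residual contributions with the learned weights.
        return x + (weights * (outs - x.unsqueeze(2))).sum(dim=2)


# Usage: fuse three single-task adapters without knowing the task at test time.
adapters = [BottleneckAdapter() for _ in range(3)]
fusion = TaskIDFreeFusion(adapters)
hidden = torch.randn(2, 50, 512)
print(fusion(hidden).shape)  # torch.Size([2, 50, 512])
```

In such a design, only the fusion (query/key) parameters are updated while the pre-trained backbone and single-task adapters stay frozen, which is consistent with the abstract's claim of non-destructive, parameter-efficient multi-task training.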