
Audio-AdapterFusion: A Task-ID-free Approach for Efficient and Non-Destructive Multi-task Speech Recognition (2310.13015v1)

Published 17 Oct 2023 in cs.CL, cs.AI, and eess.AS

Abstract: Adapters are an efficient, composable alternative to full fine-tuning of pre-trained models and help scale the deployment of large ASR models to many tasks. In practice, a task ID is commonly prepended to the input during inference to route to single-task adapters for the specified task. However, one major limitation of this approach is that the task ID may not be known during inference, rendering it unsuitable for most multi-task settings. To address this, we propose three novel task-ID-free methods to combine single-task adapters in multi-task ASR and investigate two learning algorithms for training. We evaluate our methods on 10 test sets from 4 diverse ASR tasks and show that our methods are non-destructive and parameter-efficient. While updating only 17% of the model parameters, our methods can achieve an 8% mean WER improvement relative to full fine-tuning and are on par with task-ID adapter routing.
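The abstract's core mechanism, combining single-task adapters without a task ID, follows the AdapterFusion idea of attending over the adapters' outputs so that routing is computed from the input itself. Below is a minimal PyTorch sketch of such a combiner, assuming standard bottleneck adapters; all class, parameter, and shape choices are illustrative assumptions, not the paper's implementation.

import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    # Standard bottleneck adapter: down-project, nonlinearity, up-project, residual.
    def __init__(self, d_model, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, h):
        return h + self.up(torch.relu(self.down(h)))

class AdapterFusion(nn.Module):
    # Attend over the outputs of frozen single-task adapters; the fusion
    # weights are computed from the hidden state, so no task ID is required
    # at inference time.
    def __init__(self, d_model):
        super().__init__()
        self.query = nn.Linear(d_model, d_model)
        self.key = nn.Linear(d_model, d_model)
        self.value = nn.Linear(d_model, d_model)

    def forward(self, h, adapter_outs):
        # h: (batch, time, d_model); adapter_outs: (batch, time, n_adapters, d_model)
        q = self.query(h).unsqueeze(2)                  # (B, T, 1, D)
        k = self.key(adapter_outs)                      # (B, T, N, D)
        v = self.value(adapter_outs)                    # (B, T, N, D)
        scores = (q * k).sum(-1) / k.shape[-1] ** 0.5   # (B, T, N)
        weights = scores.softmax(dim=-1).unsqueeze(-1)  # (B, T, N, 1)
        return h + (weights * v).sum(dim=2)             # (B, T, D)

# Usage (shapes only; d_model and the adapter count are placeholders):
# h = torch.randn(2, 50, 256)
# adapters = nn.ModuleList(BottleneckAdapter(256) for _ in range(4))
# outs = torch.stack([a(h) for a in adapters], dim=2)   # (2, 50, 4, 256)
# fused = AdapterFusion(256)(h, outs)                   # (2, 50, 256)

In this sketch the per-task adapters stay frozen (non-destructive to their single-task behavior), and only the small fusion layers would be trained, which is consistent with the parameter-efficiency claim in the abstract.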

