Investigating Training Strategies and Model Robustness of Low-Rank Adaptation for Language Modeling in Speech Recognition (2401.10447v1)

Published 19 Jan 2024 in cs.CL, cs.AI, cs.LG, cs.NE, cs.SD, and eess.AS

Abstract: The use of low-rank adaptation (LoRA) with frozen pretrained language models (PLMs) has become increasingly popular as a mainstream, resource-efficient modeling approach for memory-constrained hardware. In this study, we first explore how to enhance model performance by introducing various LoRA training strategies, achieving relative word error rate reductions of 3.50% on the public Librispeech dataset and 3.67% on an internal dataset in the messaging domain. To further characterize the stability of LoRA-based second-pass speech recognition models, we examine robustness against input perturbations. The perturbations are based on homophone replacements, and we introduce a novel metric, N-best Perturbation-based Rescoring Robustness (NPRR), to measure the relative degradation in the performance of rescoring models. Our experimental results indicate that while advanced variants of LoRA, such as dynamic rank-allocated LoRA, lead to performance degradation under 1-best perturbation, they alleviate the degradation under N-best perturbation. Compared with fully fine-tuned models and vanilla LoRA tuning baselines, this finding suggests that a careful selection of adaptation strategy is needed when using LoRA for compute-cost savings and robust language modeling.
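For readers unfamiliar with the mechanics, the snippet below is a minimal PyTorch sketch of the LoRA idea the abstract relies on: the pretrained weight matrix stays frozen, and only a low-rank residual update is trained. This is an illustrative sketch, not the authors' implementation; the rank r, scaling alpha, and initialization are assumed defaults.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen nn.Linear with a trainable low-rank update: W x + (alpha/r) * B (A x)."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # keep the pretrained weights frozen

        # Low-rank factors: A is small-random, B starts at zero so training begins
        # from the unmodified pretrained model.
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen projection plus the trainable low-rank residual.
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)
```

Applying such a wrapper to selected projection layers of a rescoring model and training only the A/B factors is, roughly, the resource-efficient setup the abstract refers to.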

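The robustness analysis hinges on measuring how much a rescoring model degrades when its N-best input hypotheses are perturbed (e.g., by homophone replacement). The helper below sketches that notion of relative degradation; the function name and exact normalization are assumptions for illustration, not the paper's NPRR definition.

```python
def relative_degradation(wer_clean: float, wer_perturbed: float) -> float:
    """Relative WER degradation of a rescoring model under input perturbation.

    NOTE: this simple ratio is an illustrative assumption; the paper's NPRR
    metric is defined over perturbed N-best lists and may differ in detail.
    """
    return (wer_perturbed - wer_clean) / wer_clean

# Example: WER rising from 5.00 to 5.40 under perturbation is an 8% relative degradation.
print(relative_degradation(5.00, 5.40))  # 0.08
```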