Low-rank Adaptation of Large Language Model Rescoring for Parameter-Efficient Speech Recognition (2309.15223v2)

Published 26 Sep 2023 in cs.CL, cs.AI, cs.LG, cs.NE, cs.SD, and eess.AS

Abstract: We propose a neural language modeling system based on low-rank adaptation (LoRA) for speech recognition output rescoring. Although pretrained language models (LMs) like BERT have shown superior performance in second-pass rescoring, the high computational cost of scaling up the pretraining stage and adapting the pretrained models to specific domains limit their practical use in rescoring. Here we present a method based on low-rank decomposition to train a rescoring BERT model and adapt it to new domains using only a fraction (0.08%) of the pretrained parameters. These inserted matrices are optimized through a discriminative training objective along with a correlation-based regularization loss. The proposed low-rank adaptation Rescore-BERT (LoRB) architecture is evaluated on LibriSpeech and internal datasets with decreased training times by factors between 5.4 and 3.6.

Citations (5)

Summary

  • The paper presents a parameter-efficient rescoring approach (LoRB) that reduces trainable parameters to about 0.08% of the full model while achieving comparable ASR results.
  • It employs low-rank matrix decomposition with discriminative training and correlation-based regularization to mitigate overfitting during domain adaptation.
  • Experimental results show that LoRB robustly generalizes across domains, offering a practical solution for memory and latency-constrained speech recognition systems.

Low-rank Adaptation of Large Language Model Rescoring for Parameter-Efficient Speech Recognition

The paper proposes an approach to neural language modeling for automatic speech recognition (ASR), focusing on second-pass rescoring and employing Low-Rank Adaptation (LoRA). The work seeks to mitigate the significant computational cost of conventional fine-tuning of pretrained language models like BERT, especially for domain adaptation. By leveraging low-rank matrix decomposition, the paper demonstrates a reduction in trainable parameters to a mere fraction (approximately 0.08%) of the full model while achieving comparable performance.

Overview and Methodology

In ASR systems, second-pass rescoring is a critical enhancement technique that improves recognition accuracy by correcting errors made during first-pass decoding. It re-evaluates the N-best hypotheses produced by the first-pass decoder using a stronger language model. Pretrained language models such as BERT perform well in this role because they encode extensive linguistic knowledge, but their high computational footprint poses practical deployment challenges, particularly in memory- and latency-sensitive environments.
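To make the rescoring step concrete, the sketch below combines each hypothesis's first-pass score with a second-pass language-model score and returns the best candidate. This is a minimal illustration, not the paper's implementation: the Hypothesis container, the lm_score callback, and the interpolation weight are assumptions introduced for this example.

```python
# Minimal sketch of second-pass N-best rescoring (illustrative, not the paper's code).
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Hypothesis:
    text: str
    first_pass_score: float  # e.g., combined acoustic + first-pass LM log-score

def rescore_nbest(
    nbest: List[Hypothesis],
    lm_score: Callable[[str], float],   # stand-in for a BERT-style rescoring model
    lm_weight: float = 0.5,
) -> Hypothesis:
    """Return the hypothesis with the best interpolated score."""
    def combined(h: Hypothesis) -> float:
        return h.first_pass_score + lm_weight * lm_score(h.text)
    return max(nbest, key=combined)

# Toy usage with a dummy LM that prefers shorter hypotheses.
if __name__ == "__main__":
    nbest = [
        Hypothesis("i scream for ice cream", -12.3),
        Hypothesis("ice cream for ice cream", -11.9),
    ]
    best = rescore_nbest(nbest, lm_score=lambda s: -len(s.split()), lm_weight=1.0)
    print(best.text)
```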

The authors present a parameter-efficient rescoring solution, termed Low-rank Rescoring BERT (LoRB), which avoids the exhaustive parameter updates of full fine-tuning by inserting small, trainable low-rank matrices at selected network layers. These inserted matrices are the only parameters updated, and they are optimized with a discriminative training objective aimed at minimizing word error rate (WER), the standard evaluation metric for ASR systems.
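The mechanism can be illustrated as a frozen pretrained linear layer augmented with a trainable low-rank update, as in the generic PyTorch sketch below. This is a minimal sketch of LoRA itself, not the paper's implementation; the rank, scaling factor, and the choice of which layers to wrap are assumptions made for the example.

```python
# Minimal LoRA sketch in PyTorch; rank and scaling are illustrative assumptions.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen pretrained linear layer plus a trainable low-rank update."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # the pretrained weights stay frozen
            p.requires_grad = False
        # Low-rank factors: only these receive gradients.
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = W x + scaling * B (A x); B is zero-initialized, so training starts
        # from the pretrained model's behavior.
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scaling
```

Wrapping, for example, the attention projection layers of a BERT rescorer in such modules leaves the pretrained weights untouched; only the low-rank factors, a small fraction of the total parameter count, are updated during adaptation.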

The optimization process in LoRB combines this discriminative training objective with a correlation-based regularization loss. The regularization counters the overfitting that can arise during low-rank adaptation by preserving the expressive capacity of the model's hidden representations. This safeguard becomes especially important when adapting to data-scarce domains, where overfitting often leads to degraded performance.
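To make the shape of such an objective concrete, the sketch below pairs an expected-word-error (MWER-style) term over the N-best list with a simple decorrelation penalty on hidden activations as a stand-in for the correlation-based regularizer. Both terms, the weighting lam, and the tensor shapes are assumptions chosen to match the description above, not the paper's exact formulation.

```python
# Hedged sketch of a discriminative rescoring loss plus a correlation-based
# regularizer; the exact terms used in the paper may differ.
import torch

def expected_wer_loss(scores: torch.Tensor, word_errors: torch.Tensor) -> torch.Tensor:
    """MWER-style loss.
    scores: (batch, n_best) second-pass scores; word_errors: (batch, n_best) edit counts."""
    posteriors = torch.softmax(scores, dim=-1)            # hypothesis-level posteriors
    return (posteriors * word_errors).sum(dim=-1).mean()  # expected word errors per utterance

def decorrelation_penalty(hidden: torch.Tensor) -> torch.Tensor:
    """Penalize off-diagonal correlation between hidden dimensions so the adapted
    representations stay expressive. hidden: (num_tokens, hidden_dim)."""
    h = hidden - hidden.mean(dim=0, keepdim=True)
    cov = (h.T @ h) / (h.shape[0] - 1)
    std = cov.diagonal().clamp_min(1e-8).sqrt()
    corr = cov / (std[:, None] * std[None, :])
    off_diag = corr - torch.diag(corr.diagonal())
    return (off_diag ** 2).sum() / (corr.shape[0] * (corr.shape[0] - 1))

def rescoring_loss(scores, word_errors, hidden, lam: float = 0.01) -> torch.Tensor:
    # Discriminative term plus regularizer, weighted by an assumed coefficient lam.
    return expected_wer_loss(scores, word_errors) + lam * decorrelation_penalty(hidden)
```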

Experimental Results

The LoRB framework was benchmarked against full fine-tuning and several other parameter-efficient techniques, including Adapters, BitFit, and Prefix-tuning, on the public LibriSpeech corpus and on internal datasets spanning different domains. The results indicate that LoRB matches or surpasses full fine-tuning while substantially reducing computational cost, with training times decreased by factors of 3.6 to 5.4, owing to the much smaller number of trainable parameters.

In domain-adaptation scenarios with smaller datasets, LoRB exhibited robust generalization, maintaining performance on both target and non-target domains, whereas full fine-tuning suffered degradation on non-target domains. The correlation-based regularization yielded further improvements, mitigating this degradation and enhancing generalization.

A notable finding from the scaling experiments is that larger models and larger training sets further amplify the benefits of low-rank adaptation, supporting LoRB's applicability to industrial-scale ASR systems where efficient use of computational resources is paramount.

Implications and Future Directions

The implications of this research are substantial for both the practice and the theory of natural language processing and ASR. Practically, LoRB enables more efficient use of large pretrained language models in compute-constrained settings such as edge devices or cloud services with strict latency requirements. Theoretically, the work invites further exploration of the robustness and adaptability of low-rank adaptation, especially in dynamically evolving domains where data distributions shift over time.

Looking forward, further developments might include real-time adaptation to changing acoustic environments or user preferences, possibly with auxiliary tasks that enhance linguistic and contextual understanding. A better understanding of the trade-off between parameter efficiency and model expressiveness may yield even more effective adaptation strategies for large language models beyond speech recognition.

In summary, this paper presents a compelling case for low-rank adaptation as a way to make language-model rescoring for ASR more flexible and efficient, paving the way for wide-ranging applications in speech recognition and beyond.
