
4-bit Quantization of LSTM-based Speech Recognition Models (2108.12074v1)

Published 27 Aug 2021 in cs.CL, cs.LG, cs.SD, and eess.AS

Abstract: We investigate the impact of aggressive low-precision representations of weights and activations in two families of large LSTM-based architectures for Automatic Speech Recognition (ASR): hybrid Deep Bidirectional LSTM - Hidden Markov Models (DBLSTM-HMMs) and Recurrent Neural Network - Transducers (RNN-Ts). Using a 4-bit integer representation, a naïve quantization approach applied to the LSTM portion of these models results in significant Word Error Rate (WER) degradation. On the other hand, we show that minimal accuracy loss is achievable with an appropriate choice of quantizers and initializations. In particular, we customize quantization schemes depending on the local properties of the network, improving recognition performance while limiting computational time. We demonstrate our solution on the Switchboard (SWB) and CallHome (CH) test sets of the NIST Hub5-2000 evaluation. DBLSTM-HMMs trained with 300 or 2000 hours of SWB data achieve <0.5% and <1% average WER degradation, respectively. On the more challenging RNN-T models, our quantization strategy limits degradation in 4-bit inference to 1.3%.
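The abstract contrasts a naïve quantizer with schemes tailored to local network properties. As background, the sketch below shows the basic symmetric uniform 4-bit quantize-dequantize ("fake quantization") operation such schemes build on; the function name and per-tensor scaling are illustrative assumptions, not the paper's exact per-layer quantizers.

```python
import numpy as np

def fake_quantize_int4(x):
    """Simulate symmetric 4-bit quantization of a tensor (quantize-dequantize).

    Illustrative sketch only: the paper customizes quantizers per layer
    (e.g. different handling of weights vs. activations); this shows the
    uniform symmetric quantizer that naive 4-bit inference starts from.
    """
    qmax = 7  # symmetric signed int4 uses integer codes in [-7, 7]
    scale = np.max(np.abs(x)) / qmax
    if scale == 0:
        return x.copy()
    q = np.clip(np.round(x / scale), -qmax, qmax)  # integer codes
    return q * scale  # dequantized values used in simulated inference

# Example: quantize a small weight matrix and bound the rounding error
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
w_q = fake_quantize_int4(w)
max_err = np.max(np.abs(w - w_q))  # at most half a quantization step
```

With only 15 distinct representable values per tensor, outliers inflate the scale and coarsen resolution for everything else, which is one reason a single global quantizer degrades WER so sharply on LSTMs.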

Authors (12)
  1. Andrea Fasoli
  2. Chia-Yu Chen
  3. Mauricio Serrano
  4. Xiao Sun
  5. Naigang Wang
  6. Swagath Venkataramani
  7. George Saon
  8. Xiaodong Cui
  9. Brian Kingsbury
  10. Wei Zhang
  11. Kailash Gopalakrishnan
  12. Zoltán Tüske
Citations (21)
