
Low-bit Quantization of Recurrent Neural Network Language Models Using Alternating Direction Methods of Multipliers (2111.14836v1)

Published 29 Nov 2021 in cs.LG

Abstract: The high memory consumption and computational cost of recurrent neural network language models (RNNLMs) limit their wider application on resource-constrained devices. In recent years, neural network quantization techniques capable of extremely low-bit compression, for example binarized RNNLMs, have attracted increasing research interest. Direct training of quantized neural networks is difficult. By formulating quantized RNNLM training as an optimization problem, this paper presents a novel method to train quantized RNNLMs from scratch using the alternating direction method of multipliers (ADMM). The method can also flexibly adjust the trade-off between compression rate and model performance using tied low-bit quantization tables. Experiments on two tasks, Penn Treebank (PTB) and Switchboard (SWBD), show that the proposed ADMM quantization achieves a model size compression factor of up to 31 times over the full-precision baseline RNNLMs, together with 5 times faster convergence in model training than the baseline binarized RNNLM quantization. Index Terms: language models, recurrent neural networks, quantization, alternating direction method of multipliers.
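The abstract describes training quantized RNNLMs by alternating between full-precision weight updates and projection onto a low-bit quantization table. A minimal NumPy sketch of this general ADMM quantization scheme is below; the function names, the toy quadratic loss, and the hyperparameters are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def project_to_table(w, table):
    """Map each weight to its nearest value in the low-bit quantization table."""
    idx = np.argmin(np.abs(w[..., None] - table), axis=-1)
    return table[idx]

def admm_quantize(w, loss_grad, table, rho=1e-3, lr=0.1, steps=100):
    """Hypothetical sketch of ADMM quantized training: alternate between
    (1) a gradient step on the full-precision weights with an
    augmented-Lagrangian penalty pulling them toward their quantized
    counterparts, (2) projection onto the quantization table, and
    (3) a dual-variable update enforcing the w == q constraint."""
    q = project_to_table(w, table)
    u = np.zeros_like(w)  # scaled dual variable
    for _ in range(steps):
        # (1) full-precision update: task gradient + ADMM penalty gradient
        g = loss_grad(w) + rho * (w - q + u)
        w = w - lr * g
        # (2) projection step: quantize w + u onto the table
        q = project_to_table(w + u, table)
        # (3) dual ascent on the residual of the constraint w == q
        u = u + w - q
    return w, q

# Toy usage: binarized table {-1, +1} and a quadratic surrogate loss
w0 = np.array([0.3, -0.7, 1.2])
target = np.array([0.9, -0.9, 0.8])
w, q = admm_quantize(w0, lambda w: 2.0 * (w - target),
                     table=np.array([-1.0, 1.0]))
```

The quantized weights `q` always lie in the table, so the compressed model only needs the table values plus per-weight indices, which is where the compression factor comes from.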

Authors (6)
  1. Junhao Xu (19 papers)
  2. Xie Chen (166 papers)
  3. Shoukang Hu (38 papers)
  4. Jianwei Yu (64 papers)
  5. Xunying Liu (92 papers)
  6. Helen Meng (204 papers)
Citations (8)
