Do Emergent Abilities Exist in Quantized Large Language Models: An Empirical Study (2307.08072v2)

Published 16 Jul 2023 in cs.CL and cs.AI

Abstract: Despite their superior performance, large language models (LLMs) require significant computational resources for deployment and use. To overcome this issue, quantization methods have been widely applied to reduce the memory footprint of LLMs and to increase the inference rate. However, a major challenge is that low-bit quantization methods often lead to performance degradation, so it is important to understand how quantization impacts the capacity of LLMs. Unlike previous studies focused on overall performance, this work investigates the impact of quantization on emergent abilities, which are important characteristics that distinguish LLMs from small language models. Specifically, we examine the abilities of in-context learning, chain-of-thought reasoning, and instruction following in quantized LLMs. Our empirical experiments show that these emergent abilities still exist in 4-bit quantized models, while 2-bit models encounter severe performance degradation on tests of these abilities. To improve the performance of low-bit models, we conduct two further experiments: (1) a fine-grained impact analysis that studies which components (or substructures) are more sensitive to quantization, and (2) performance compensation through model fine-tuning. Our work derives a series of important findings for understanding the impact of quantization on emergent abilities, and sheds light on the possibilities of extremely low-bit quantization for LLMs.
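
For intuition on why dropping from 4-bit to 2-bit quantization causes such a sharp degradation, the sketch below applies simple symmetric round-to-nearest weight quantization to a random tensor and reports the reconstruction error at each bit-width. This is an illustrative assumption, not the authors' code or the specific quantization method evaluated in the paper.

```python
# Minimal sketch: symmetric round-to-nearest quantization of a weight tensor,
# comparing reconstruction error at 4 bits vs. 2 bits. Illustrative only.
import numpy as np

def quantize_dequantize(weights: np.ndarray, bits: int) -> np.ndarray:
    """Quantize to `bits` bits with a per-tensor scale, then dequantize."""
    qmax = 2 ** (bits - 1) - 1                # e.g. 7 for 4-bit, 1 for 2-bit
    scale = np.abs(weights).max() / qmax      # per-tensor symmetric scale
    q = np.clip(np.round(weights / scale), -qmax - 1, qmax)
    return q * scale

rng = np.random.default_rng(0)
w = rng.normal(size=4096).astype(np.float32)  # stand-in for one weight tensor

for bits in (4, 2):
    err = np.abs(w - quantize_dequantize(w, bits)).mean()
    print(f"{bits}-bit mean absolute reconstruction error: {err:.4f}")
```

With only four quantization levels at 2 bits, the rounding error per weight grows substantially, which is consistent with the paper's observation that emergent abilities survive 4-bit quantization but largely break down at 2 bits.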

Authors (8)
  1. Peiyu Liu (27 papers)
  2. Zikang Liu (11 papers)
  3. Ze-Feng Gao (24 papers)
  4. Dawei Gao (27 papers)
  5. Wayne Xin Zhao (196 papers)
  6. Yaliang Li (117 papers)
  7. Bolin Ding (112 papers)
  8. Ji-Rong Wen (299 papers)
Citations (27)