
Benchmarking the Reliability of Post-training Quantization: a Particular Focus on Worst-case Performance (2303.13003v1)

Published 23 Mar 2023 in cs.LG and cs.CV

Abstract: Post-training quantization (PTQ) is a popular method for compressing deep neural networks (DNNs) without modifying their original architecture or training procedures. Despite its effectiveness and convenience, the reliability of PTQ methods in the presence of extreme cases such as distribution shift and data noise remains largely unexplored. This paper first investigates this problem on various commonly used PTQ methods. We aim to answer several research questions related to the influence of calibration set distribution variations, calibration paradigm selection, and data augmentation or sampling strategies on PTQ reliability. A systematic evaluation process is conducted across a wide range of tasks and commonly used PTQ paradigms. The results show that most existing PTQ methods are not reliable enough in terms of worst-case group performance, highlighting the need for more robust methods. Our findings provide insights for developing PTQ methods that can effectively handle distribution shift scenarios and enable the deployment of quantized DNNs in real-world applications.
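The worst-case group performance metric referenced in the abstract can be illustrated with a short sketch. This is not the paper's evaluation code; it is a minimal, hypothetical implementation assuming predictions, labels, and group identifiers (e.g., subpopulations under distribution shift) are given as plain lists:

```python
def worst_case_group_accuracy(preds, labels, groups):
    """Compute per-group accuracy and return the minimum across groups.

    A model can have high average accuracy yet perform poorly on a
    specific group -- the worst-case group accuracy exposes this gap.
    """
    per_group = {}
    for g in set(groups):
        idx = [i for i, grp in enumerate(groups) if grp == g]
        correct = sum(1 for i in idx if preds[i] == labels[i])
        per_group[g] = correct / len(idx)
    return min(per_group.values()), per_group


# Hypothetical example: group "b" is the worst-performing subpopulation.
worst, accs = worst_case_group_accuracy(
    preds=[1, 0, 1, 1],
    labels=[1, 0, 0, 1],
    groups=["a", "a", "b", "b"],
)
print(worst)  # 0.5, even though average accuracy is 0.75
```

Benchmarking PTQ reliability then amounts to comparing this worst-case number across calibration set choices and quantization paradigms, rather than the average accuracy alone.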

Authors (9)
  1. Zhihang Yuan (45 papers)
  2. Jiawei Liu (156 papers)
  3. Jiaxiang Wu (27 papers)
  4. Dawei Yang (61 papers)
  5. Qiang Wu (154 papers)
  6. Guangyu Sun (47 papers)
  7. Wenyu Liu (146 papers)
  8. Xinggang Wang (163 papers)
  9. Bingzhe Wu (58 papers)
Citations (5)