FLIQS: One-Shot Mixed-Precision Floating-Point and Integer Quantization Search (2308.03290v2)

Published 7 Aug 2023 in cs.CV and cs.LG

Abstract: Quantization has become a mainstream compression technique for reducing model size, computational requirements, and energy consumption for modern deep neural networks (DNNs). With improved numerical support in recent hardware, including multiple variants of integer and floating point, mixed-precision quantization has become necessary to achieve high-quality results with low model cost. Prior mixed-precision methods have performed either a post-training quantization search, which compromises on accuracy, or a differentiable quantization search, which leads to high memory usage from branching. Therefore, we propose the first one-shot mixed-precision quantization search that eliminates the need for retraining in both integer and low-precision floating point models. We evaluate our search (FLIQS) on multiple convolutional and vision transformer networks to discover Pareto-optimal models. Our approach improves upon uniform precision, manual mixed-precision, and recent integer quantization search methods. With integer models, we increase the accuracy of ResNet-18 on ImageNet by 1.31% and ResNet-50 by 0.90% with equivalent model cost over previous methods. Additionally, for the first time, we explore a novel mixed-precision floating-point search and improve MobileNetV2 by up to 0.98% compared to prior state-of-the-art FP8 models. Finally, we extend FLIQS to simultaneously search a joint quantization and neural architecture space and improve the ImageNet accuracy by 2.69% with similar model cost on a MobileNetV2 search space.
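
To make the mixed-precision idea in the abstract concrete, below is a minimal sketch of per-layer symmetric integer quantization and the accuracy/cost trade-off a bit-width assignment creates. This is an illustration only, not the FLIQS one-shot search; the function and variable names (quantize_tensor, model_cost_bits, layer_bits) are assumptions introduced here, not from the paper.

```python
# Minimal sketch: per-layer symmetric integer quantization with a
# mixed-precision bit-width assignment. Illustrative only; NOT the FLIQS
# search algorithm from the paper.
import numpy as np

def quantize_tensor(x: np.ndarray, bits: int) -> np.ndarray:
    """Uniform symmetric quantization of x to a signed integer grid of `bits` bits."""
    qmax = 2 ** (bits - 1) - 1            # e.g. 127 for INT8, 7 for INT4
    scale = np.max(np.abs(x)) / qmax
    if scale == 0.0:                      # guard against all-zero tensors
        scale = 1.0
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale                      # dequantized ("fake-quantized") values

def model_cost_bits(layer_shapes, layer_bits):
    """Total weight storage in bits under a per-layer bit-width assignment."""
    return sum(int(np.prod(shape)) * b for shape, b in zip(layer_shapes, layer_bits))

# Toy two-layer "model": one layer kept at 8 bits, one pushed down to 4 bits.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(64, 64)), rng.normal(size=(64, 10))]
layer_bits = [8, 4]                       # the per-layer assignment a search would choose

quantized = [quantize_tensor(w, b) for w, b in zip(weights, layer_bits)]
errors = [float(np.mean((w - q) ** 2)) for w, q in zip(weights, quantized)]
cost = model_cost_bits([w.shape for w in weights], layer_bits)
print("per-layer quantization MSE:", errors, "| total weight bits:", cost)
```

A mixed-precision search such as FLIQS can be thought of as choosing the layer_bits assignment (over integer and low-precision floating-point formats) that sits on the Pareto frontier of quantization error versus model cost; this sketch only shows the quantities being traded off.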

Authors (12)
  1. Jordan Dotzel (13 papers)
  2. Gang Wu (143 papers)
  3. Andrew Li (21 papers)
  4. Muhammad Umar (7 papers)
  5. Yun Ni (5 papers)
  6. Mohamed S. Abdelfattah (37 papers)
  7. Zhiru Zhang (51 papers)
  8. Liqun Cheng (3 papers)
  9. Martin G. Dixon (1 paper)
  10. Norman P. Jouppi (6 papers)
  11. Quoc V. Le (128 papers)
  12. Sheng Li (217 papers)
Citations (2)