RepQ-ViT: Scale Reparameterization for Post-Training Quantization of Vision Transformers (2212.08254v2)

Published 16 Dec 2022 in cs.CV and cs.LG

Abstract: Post-training quantization (PTQ), which only requires a tiny dataset for calibration without end-to-end retraining, is a light and practical model compression technique. Recently, several PTQ schemes for vision transformers (ViTs) have been presented; unfortunately, they typically suffer from non-trivial accuracy degradation, especially in low-bit cases. In this paper, we propose RepQ-ViT, a novel PTQ framework for ViTs based on quantization scale reparameterization, to address the above issues. RepQ-ViT decouples the quantization and inference processes, where the former employs complex quantizers and the latter employs scale-reparameterized simplified quantizers. This ensures both accurate quantization and efficient inference, which distinguishes it from existing approaches that sacrifice quantization performance to meet the target hardware. More specifically, we focus on two components with extreme distributions: post-LayerNorm activations with severe inter-channel variation and post-Softmax activations with power-law features, and initially apply channel-wise quantization and log$\sqrt{2}$ quantization, respectively. Then, we reparameterize the scales to hardware-friendly layer-wise quantization and log2 quantization for inference, with only slight accuracy or computational costs. Extensive experiments are conducted on multiple vision tasks with different model variants, proving that RepQ-ViT, without hyperparameters and expensive reconstruction procedures, can outperform existing strong baselines and encouragingly improve the accuracy of 4-bit PTQ of ViTs to a usable level. Code is available at https://github.com/zkkli/RepQ-ViT.

Overview of RepQ-ViT: Scale Reparameterization for Post-Training Quantization of Vision Transformers

This paper introduces RepQ-ViT, an innovative framework designed to enhance the post-training quantization (PTQ) of Vision Transformers (ViTs). As PTQ methods are increasingly sought after for deploying heavy models on resource-constrained devices, RepQ-ViT addresses key challenges in quantizing ViTs without retraining, leveraging a novel strategy of quantization scale reparameterization.

Key Contributions

  1. Quantization-Inference Decoupling Paradigm: The core of RepQ-ViT lies in its ability to disentangle the quantization process from the inference process. Complex quantizers are utilized during quantization to meticulously capture data distributions, while simpler, hardware-efficient quantizers are employed during inference, effectively bridging the two processes through scale reparameterization.
  2. Targeting Extreme Distributions: The framework specifically targets two components problematic for PTQ in ViTs—post-LayerNorm activations characterized by severe inter-channel variations, and post-Softmax activations with power-law distributions:
    • Post-LayerNorm Activations: Quantization is first performed channel-wise to preserve the severe inter-channel variation, and the scales are then reparameterized into a hardware-friendly layer-wise quantizer with negligible accuracy loss (see the first sketch after this list).
    • Post-Softmax Activations: These are initially quantized with log$\sqrt{2}$ quantization, which matches their power-law distribution. A base transformation then converts this to hardware-friendly log2 quantization for inference (see the second sketch after this list).
  3. Performance Validation: RepQ-ViT exhibits superior performance across several vision tasks and model variants. In experiments on ImageNet (image classification) and COCO (object detection and instance segmentation), RepQ-ViT surpasses existing PTQ methods, maintaining high accuracy in low-bit settings such as 4-bit and 6-bit and substantially reducing the accuracy drop typically associated with quantization.
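
To make the post-LayerNorm reparameterization concrete, below is a minimal NumPy sketch. The function name and the choice of the channel means as the layer-wise scale and zero-point are illustrative assumptions rather than the authors' exact implementation (the linked repository is authoritative). The idea is that switching the activation quantizer from channel-wise parameters (s, z) to layer-wise parameters (s̃, z̃) can be exactly compensated by rescaling the LayerNorm affine parameters and folding the residual per-channel factors into the next linear layer's weight and bias.

```python
import numpy as np

def reparameterize_ln_scales(s, z, gamma, beta, W, b):
    """Fold channel-wise activation quantization params (s, z) into a single
    layer-wise pair (s_tilde, z_tilde) by adjusting the LayerNorm affine
    parameters (gamma, beta) and the next linear layer (W, b).

    Shapes: s, z, gamma, beta -> (C,); W -> (out, C); b -> (out,).
    """
    s_tilde = s.mean()            # assumed choice: mean of channel scales
    z_tilde = np.round(z.mean())  # assumed choice: mean of channel zero-points
    r1 = s / s_tilde              # per-channel variation factors
    r2 = z - z_tilde

    # LayerNorm now emits X_tilde = X / r1 + s_tilde * r2, whose layer-wise
    # quantization codes match the channel-wise codes of the original X.
    gamma_new = gamma / r1
    beta_new = beta / r1 + s_tilde * r2
    # The next linear layer absorbs r1 and r2, so the full-precision function
    # is unchanged: W_new @ X_tilde + b_new == W @ X + b.
    W_new = W * r1
    b_new = b - W @ (s * r2)
    return s_tilde, z_tilde, gamma_new, beta_new, W_new, b_new

# Numerical check of the equivalence on random data.
rng = np.random.default_rng(0)
C, out, N = 8, 16, 4
x_hat = rng.normal(size=(N, C))                       # normalized LayerNorm input
gamma, beta = rng.normal(size=C), rng.normal(size=C)
W, b = rng.normal(size=(out, C)), rng.normal(size=out)
s = np.abs(rng.normal(size=C)) + 0.1                  # channel-wise scales
z = rng.integers(-8, 8, size=C).astype(float)         # channel-wise zero-points

X = x_hat * gamma + beta
s_t, z_t, g2, b2, W2, bb2 = reparameterize_ln_scales(s, z, gamma, beta, W, b)
X_t = x_hat * g2 + b2
print(np.allclose(X @ W.T + b, X_t @ W2.T + bb2))     # True: network output unchanged
print(np.allclose(X / s + z, X_t / s_t + z_t))        # True: same quantization grid
```

The second sketch covers the post-Softmax case. It relies only on the identity 2^(-q/2) = 2^(-ceil(q/2)) * sqrt(2)^(q mod 2), which lets a log$\sqrt{2}$ code be dequantized as a pure power-of-two term times a fixed correction for odd codes; approximating sqrt(2) by 1.5 (a shift and an add) is one hardware-friendly option, though the exact trick used on target hardware may differ. The function names are again illustrative.

```python
import numpy as np

def log_sqrt2_quantize(a, n_bits=4):
    """Log-sqrt(2) quantization of post-Softmax values a in (0, 1]."""
    return np.clip(np.round(-2.0 * np.log2(a)), 0, 2 ** n_bits - 1)

def dequantize_via_log2(q):
    """Dequantize a log-sqrt(2) code with a power-of-two term plus a
    sqrt(2) correction applied only to odd codes."""
    return 2.0 ** (-np.ceil(q / 2.0)) * np.sqrt(2.0) ** (q % 2)

a = np.array([0.9, 0.5, 0.12, 0.03])
q = log_sqrt2_quantize(a)
print(np.allclose(dequantize_via_log2(q), 2.0 ** (-q / 2.0)))  # True
```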

Results and Implications

The implementation of RepQ-ViT has shown a substantial improvement in quantization performance, particularly in enabling 4-bit quantization to achieve usable performance levels, previously a challenging target. This demonstrates RepQ-ViT’s potential for practical deployment in edge devices where computational and memory resources are limited.

Furthermore, the framework distinguishes itself by being hyperparameter-free and not relying on expensive reconstruction processes, making it a versatile and broadly applicable solution for various transformer-based models beyond ViTs. This scalability and flexibility indicate potential future applications in broader AI contexts where model efficiency is crucial.

Future Directions

RepQ-ViT opens several avenues for future research, particularly in extending the decoupled quantization-inference paradigm to other neural network architectures. Moreover, integrating RepQ-ViT with other model compression techniques—such as pruning or knowledge distillation—could provide additional performance gains. Investigating these combinations can help achieve even more efficient models that maintain or exceed the predictive performance of current deep learning solutions.

Overall, RepQ-ViT represents a significant advancement in the efficient deployment of Vision Transformers, balancing the demanding computational requirements with practical utility in edge computing scenarios. This research sets the stage for further exploration into scalable and efficient model compression strategies, maintaining the momentum towards practical and efficient AI systems.

Authors (4)
  1. Zhikai Li (24 papers)
  2. Junrui Xiao (9 papers)
  3. Lianwei Yang (6 papers)
  4. Qingyi Gu (25 papers)
Citations (51)