Up or Down? Adaptive Rounding for Post-Training Quantization (2004.10568v2)

Published 22 Apr 2020 in cs.LG, cs.CV, and stat.ML

Abstract: When quantizing neural networks, assigning each floating-point weight to its nearest fixed-point value is the predominant approach. We find that, perhaps surprisingly, this is not the best we can do. In this paper, we propose AdaRound, a better weight-rounding mechanism for post-training quantization that adapts to the data and the task loss. AdaRound is fast, does not require fine-tuning of the network, and only uses a small amount of unlabelled data. We start by theoretically analyzing the rounding problem for a pre-trained neural network. By approximating the task loss with a Taylor series expansion, the rounding task is posed as a quadratic unconstrained binary optimization problem. We simplify this to a layer-wise local loss and propose to optimize this loss with a soft relaxation. AdaRound not only outperforms rounding-to-nearest by a significant margin but also establishes a new state-of-the-art for post-training quantization on several networks and tasks. Without fine-tuning, we can quantize the weights of Resnet18 and Resnet50 to 4 bits while staying within an accuracy loss of 1%.

Authors (5)
  1. Markus Nagel (33 papers)
  2. Rana Ali Amjad (19 papers)
  3. Mart van Baalen (18 papers)
  4. Christos Louizos (30 papers)
  5. Tijmen Blankevoort (37 papers)
Citations (473)

Summary

Adaptive Rounding for Post-Training Quantization

This paper presents AdaRound, a novel method for optimizing the rounding step in post-training quantization of neural networks. Quantization is essential for deploying deep learning models on resource-constrained devices, as it reduces model size and inference time by converting network weights from floating-point to lower-bit fixed-point representations. The conventional approach of rounding each weight to its nearest quantized value ignores how the rounding decisions affect the task loss and can therefore be suboptimal. AdaRound addresses this limitation by adapting the rounding decisions to the data and the task loss.
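For context, the rounding-to-nearest baseline maps each weight onto a uniform integer grid. In standard uniform-quantization notation (step size s, integer grid limits n and p; this restatement is not quoted verbatim from the paper):

```latex
% Rounding-to-nearest uniform quantization of a single weight w:
\[
  \widehat{w} \;=\; s \cdot \operatorname{clip}\!\Bigl( \Bigl\lfloor \tfrac{w}{s} \Bigr\rceil,\; n,\; p \Bigr)
\]
% where \lfloor\cdot\rceil denotes rounding to the nearest integer. AdaRound
% replaces this fixed choice with a learned decision to round up or down.
```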

Theoretical Framework

The authors formulate the rounding problem as a Quadratic Unconstrained Binary Optimization (QUBO) problem by taking a second-order Taylor expansion of the task loss around the pre-trained weights. Approximating the change in task loss as a function of the quantization perturbations yields an expression that, for a converged network, is dominated by the Hessian term and explicitly captures interactions between the perturbations of different weights.
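In standard notation (a hedged restatement of the paper's argument), the expected loss degradation caused by a quantization perturbation of the flattened weights w is approximated as:

```latex
% Second-order Taylor approximation of the loss change caused by a
% weight perturbation \Delta w:
\[
  \mathbb{E}\bigl[\mathcal{L}(\mathbf{x},\mathbf{y},\mathbf{w}+\Delta\mathbf{w})
                 - \mathcal{L}(\mathbf{x},\mathbf{y},\mathbf{w})\bigr]
  \;\approx\;
  \Delta\mathbf{w}^{\top}\mathbf{g}^{(\mathbf{w})}
  \;+\; \tfrac{1}{2}\,\Delta\mathbf{w}^{\top}\mathbf{H}^{(\mathbf{w})}\,\Delta\mathbf{w}.
\]
% For a converged network the gradient term is negligible, so the rounding
% choice reduces to the binary quadratic problem
\[
  \arg\min_{\Delta\mathbf{w}} \;\Delta\mathbf{w}^{\top}\mathbf{H}^{(\mathbf{w})}\,\Delta\mathbf{w},
  \qquad \Delta w_i \in \{\delta_i^{\downarrow},\, \delta_i^{\uparrow}\},
\]
% where each weight may only be rounded down or up, which makes this a QUBO
% over the per-weight rounding decisions.
```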

Methodology

Because solving the QUBO jointly over all weights is intractable for networks of practical size, AdaRound introduces a per-layer approximation that reduces the computational complexity significantly: the Hessian-weighted objective reduces to a local minimization of the Mean Squared Error (MSE) between the layer's original and quantized outputs. A continuous relaxation then turns the discrete up-or-down rounding choice into a differentiable problem. This allows AdaRound to optimize the rounding efficiently without fine-tuning the network, using only a small set of unlabeled data, which makes it practical for real-world deployment.
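A sketch of the resulting per-layer objective (notation follows the paper's uniform-quantization setup; stating the reconstruction on the layer's linear output with a scalar weighting lambda is a simplification):

```latex
% Layer-wise AdaRound objective over the continuous rounding variables V:
\[
  \arg\min_{\mathbf{V}} \;
  \bigl\lVert \mathbf{W}\mathbf{x} - \widetilde{\mathbf{W}}\mathbf{x} \bigr\rVert_F^2
  \;+\; \lambda\, f_{\mathrm{reg}}(\mathbf{V}),
  \qquad
  \widetilde{\mathbf{W}} \;=\; s \cdot \operatorname{clip}\!\Bigl(
    \Bigl\lfloor \tfrac{\mathbf{W}}{s} \Bigr\rfloor + h(\mathbf{V}),\; n,\; p \Bigr),
\]
% where h(V) in [0, 1] is the soft rounding offset and f_reg drives it
% toward a hard 0/1 decision.
```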

The continuous relaxation uses a rectified sigmoid to parameterize the rounding variables, combined with a regularization term that encourages them to converge to binary values. This relaxation mirrors the general family of Hopfield methods commonly used in large-scale combinatorial optimization.
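A minimal PyTorch-style sketch of this relaxation, assuming the commonly used stretch parameters zeta = 1.1 and gamma = -0.1 and an annealed exponent beta; names and signatures are illustrative, not the authors' reference implementation:

```python
import torch

# Stretch parameters so the rectified sigmoid can reach exactly 0 and 1
# (values assumed from the common AdaRound setup).
ZETA, GAMMA = 1.1, -0.1

def rectified_sigmoid(V: torch.Tensor) -> torch.Tensor:
    """Soft rounding variable h(V), clipped to [0, 1]."""
    return torch.clamp(torch.sigmoid(V) * (ZETA - GAMMA) + GAMMA, 0.0, 1.0)

def rounding_regularizer(V: torch.Tensor, beta: float) -> torch.Tensor:
    """Penalty that pushes h(V) toward {0, 1}; beta is annealed during optimization."""
    h = rectified_sigmoid(V)
    return (1.0 - (2.0 * h - 1.0).abs().pow(beta)).sum()

def soft_quantized_weights(W: torch.Tensor, V: torch.Tensor, s: float,
                           qmin: int, qmax: int) -> torch.Tensor:
    """Soft-quantized weights: floor(W/s) plus the learned rounding offset, rescaled."""
    w_int = torch.clamp(torch.floor(W / s) + rectified_sigmoid(V), qmin, qmax)
    return s * w_int
```

In practice, V would be optimized with a gradient-based optimizer to minimize the layer-wise reconstruction error plus lambda times this regularizer over a small number of unlabeled batches, after which h(V) is binarized to fix each weight's rounding direction.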

Empirical Results

Extensive experiments demonstrate that AdaRound substantially outperforms rounding-to-nearest, establishing state-of-the-art post-training quantization results across a range of architectures including ResNet18, ResNet50, MobileNetV2, InceptionV3, and DeepLabV3+. Notably, AdaRound quantizes the weights of ResNet18 and ResNet50 to 4 bits while keeping the accuracy drop within 1%, a significant improvement over existing post-training methods.

Implications and Future Directions

The findings suggest that adaptive techniques, which consider task-specific characteristics, can bridge the performance gap encountered with simplistic quantization schemes. The implications of this work extend beyond neural networks to any parametric system requiring efficient quantization. Furthermore, the framework encourages further research into adaptive quantization techniques, potentially integrating these principles with dynamic bit-width adjustments across diverse architectures and tasks.

Future directions could explore synergistic strategies combining AdaRound with hardware-oriented optimization methods, thereby enhancing the efficiency and deployment feasibility of quantized models on specialized hardware platforms. Additionally, further exploration into data-free approaches could extend AdaRound's applicability to scenarios with limited data availability.

In conclusion, AdaRound represents a significant advancement in post-training quantization, offering a versatile and efficient mechanism adaptable across various neural network models, thereby supporting the broader adoption of deep learning approaches on low-power devices.