Adaptive Rounding for Post-Training Quantization
This paper presents AdaRound, a method for optimizing the weight-rounding step of post-training quantization in neural networks. Quantization is essential for deploying deep learning models on resource-constrained devices, as it reduces model size and inference time by converting network weights from floating-point to lower-bit fixed-point representations. The conventional approach of rounding each weight to its nearest quantization grid point ignores interactions between weight perturbations and can noticeably degrade task performance. AdaRound addresses this limitation by deciding, per weight, whether to round up or down so as to minimize the impact on the task loss.
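For context, here is a minimal sketch of the rounding-to-nearest baseline that AdaRound improves upon; the symmetric scheme, scale choice, and bit-width below are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def quantize_nearest(w, scale, n_bits=4):
    """Uniform, symmetric rounding-to-nearest quantization of a weight array."""
    q_max = 2 ** (n_bits - 1) - 1
    q = np.clip(np.round(w / scale), -q_max - 1, q_max)  # nearest integer grid point
    return q * scale                                      # de-quantized weights

w = np.array([0.42, -0.13, 0.07, -0.91])
scale = np.abs(w).max() / (2 ** 3 - 1)  # simple max-based scale for 4-bit weights
print(quantize_nearest(w, scale))
```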
Theoretical Framework
The authors formulate the rounding problem as a Quadratic Unconstrained Binary Optimization (QUBO) problem by leveraging a second-order Taylor expansion of the task loss. The task loss is approximated as a quadratic function of the quantization perturbations; since each weight can only be rounded up or down, the perturbations become binary choices, and the quadratic (Hessian) term captures the interactions between them.
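To make the derivation concrete, a sketch of the standard second-order argument is given below; the notation is illustrative, with g^(w) and H^(w) denoting the gradient and Hessian of the task loss with respect to the flattened weights and s the quantization scale.

```latex
% Expected loss change caused by a quantization perturbation \Delta w
\mathbb{E}\!\left[\mathcal{L}(x, y, w + \Delta w) - \mathcal{L}(x, y, w)\right]
  \approx \Delta w^{\top} g^{(w)} + \tfrac{1}{2}\, \Delta w^{\top} H^{(w)} \Delta w .

% For a converged model the gradient term is approximately zero, leaving a
% quadratic objective in which each perturbation can take only two values
% (round the weight down or up to the grid with scale s):
\arg\min_{\Delta w}\; \Delta w^{\top} H^{(w)} \Delta w ,
\qquad
\Delta w_i \in \left\{\, s\left\lfloor \tfrac{w_i}{s} \right\rfloor - w_i ,\;
                        s\left\lceil  \tfrac{w_i}{s} \right\rceil - w_i \,\right\} .
```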
Methodology
AdaRound introduces a per-layer optimization strategy based on approximations that significantly reduce computational complexity: the task-loss Hessian is treated layer by layer, under which minimizing the quadratic objective reduces to a local minimization of the Mean Squared Error (MSE) between the original and quantized layer outputs. A continuous relaxation transforms the discrete rounding decisions into a differentiable problem. As a result, AdaRound optimizes the rounding without end-to-end fine-tuning and requires only a small set of unlabeled data, making it practical for real-world deployment.
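A minimal PyTorch-style sketch of this per-layer objective, assuming a linear layer with weight W, calibration activations X, and soft rounding variables h in [0, 1]; the function names, shapes, and symmetric clipping range are illustrative assumptions rather than the paper's exact implementation.

```python
import torch

def soft_quantize(W, h, scale, n_bits=4):
    """W_tilde = s * clip(floor(W / s) + h, n, p), with soft rounding h in [0, 1]."""
    q_max = 2 ** (n_bits - 1) - 1
    return scale * torch.clamp(torch.floor(W / scale) + h, -q_max - 1, q_max)

def layer_reconstruction_loss(W, X, h, scale):
    """Local MSE between the full-precision and soft-quantized layer outputs."""
    W_tilde = soft_quantize(W, h, scale)
    return torch.mean((X @ W.t() - X @ W_tilde.t()) ** 2)
```

Minimizing this reconstruction error over the rounding variables, together with the regularizer described next, is the core of the per-layer procedure.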
The continuous relaxation uses a rectified sigmoid to parameterize the rounding variables, reinforced by a regularization term that pushes them toward binary values, so that the optimization converges smoothly to a hard rounding decision. This approach mirrors the general family of Hopfield methods used in large-scale combinatorial optimization.
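A sketch of the relaxation and regularizer follows, using the stretched-sigmoid parameters reported in the paper (treated here as assumptions); the annealing schedule for beta is omitted.

```python
import torch

ZETA, GAMMA = 1.1, -0.1  # stretch parameters for the rectified sigmoid (assumed from the paper)

def rectified_sigmoid(V):
    """h(V): a sigmoid stretched to (GAMMA, ZETA) and clipped back to [0, 1]."""
    return torch.clamp(torch.sigmoid(V) * (ZETA - GAMMA) + GAMMA, 0.0, 1.0)

def rounding_regularizer(V, beta=2.0):
    """f_reg(V) = sum_i (1 - |2 h(V_i) - 1|^beta); near zero only when h(V_i) is near 0 or 1."""
    h = rectified_sigmoid(V)
    return torch.sum(1.0 - torch.abs(2.0 * h - 1.0) ** beta)

# Total per-layer loss: the reconstruction MSE from the previous sketch, evaluated at
# h = rectified_sigmoid(V), plus lam * rounding_regularizer(V). Annealing beta during
# optimization gradually forces each h(V_i) to 0 or 1, i.e. a hard decision to round
# the corresponding weight down or up.
```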
Empirical Results
Extensive experiments demonstrate that AdaRound substantially outperforms traditional rounding-to-nearest, achieving state-of-the-art post-training quantization results across a range of architectures, including ResNet18, ResNet50, MobileNetV2, InceptionV3, and DeepLabV3+. Notably, AdaRound can quantize the weights of ResNet18 and ResNet50 to 4 bits while keeping the accuracy loss within 1%, a significant improvement over existing methods.
Implications and Future Directions
The findings suggest that adaptive techniques that account for task-specific loss characteristics can close much of the performance gap left by naive rounding schemes. The implications extend beyond neural networks to any parametric system that must be quantized efficiently. The framework also encourages further research into adaptive quantization, for example by integrating these principles with dynamic bit-width adjustments across diverse architectures and tasks.
Future directions could explore synergistic strategies combining AdaRound with hardware-oriented optimization methods, thereby enhancing the efficiency and deployment feasibility of quantized models on specialized hardware platforms. Additionally, further exploration into data-free approaches could extend AdaRound's applicability to scenarios with limited data availability.
In conclusion, AdaRound represents a significant advancement in post-training quantization, offering a versatile and efficient mechanism adaptable across various neural network models, thereby supporting the broader adoption of deep learning approaches on low-power devices.