
Overcoming Oscillations in Quantization-Aware Training (2203.11086v2)

Published 21 Mar 2022 in cs.LG

Abstract: When training neural networks with simulated quantization, we observe that quantized weights can, rather unexpectedly, oscillate between two grid-points. The importance of this effect and its impact on quantization-aware training (QAT) are not well-understood or investigated in literature. In this paper, we delve deeper into the phenomenon of weight oscillations and show that it can lead to a significant accuracy degradation due to wrongly estimated batch-normalization statistics during inference and increased noise during training. These effects are particularly pronounced in low-bit ($\leq$ 4-bits) quantization of efficient networks with depth-wise separable layers, such as MobileNets and EfficientNets. In our analysis we investigate several previously proposed QAT algorithms and show that most of these are unable to overcome oscillations. Finally, we propose two novel QAT algorithms to overcome oscillations during training: oscillation dampening and iterative weight freezing. We demonstrate that our algorithms achieve state-of-the-art accuracy for low-bit (3 & 4 bits) weight and activation quantization of efficient architectures, such as MobileNetV2, MobileNetV3, and EfficentNet-lite on ImageNet. Our source code is available at {https://github.com/qualcomm-ai-research/oscillations-qat}.

Citations (83)

Summary

  • The paper demonstrates that oscillations in quantized weights cause significant accuracy degradation, especially in low-bit MobileNet and EfficientNet models.
  • It proposes an oscillation dampening method that adds a regularization term to align weights with quantization bin centers.
  • The study introduces iterative weight freezing to stabilize training, achieving state-of-the-art low-bit accuracy on ImageNet.

Overview of "Overcoming Oscillations in Quantization-Aware Training"

Quantization-aware training (QAT) is a critical technique for optimizing neural networks to run efficiently on edge devices by employing low-bit representations for weights and activations. Despite its benefits, this paper identifies a considerable challenge: quantized weights can oscillate during training, leading to performance degradation. The paper not only highlights this issue but also proposes novel methods to mitigate it.

Key Contributions

The primary contribution of the paper is a systematic study of oscillations in quantized weights, an under-explored phenomenon in the literature. The authors find that these oscillations can significantly degrade network accuracy, especially at low bit-widths (≤ 4 bits) and in networks built on depth-wise separable layers, such as MobileNets and EfficientNets. The adverse effects stem primarily from incorrectly estimated batch-normalization statistics at inference time and increased gradient noise during training.
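
To make the mechanism concrete, the sketch below shows how a latent weight trained with simulated (fake) quantization and a straight-through estimator (STE) can flip between two adjacent grid points on successive updates. This is a toy illustration rather than the authors' code; the symmetric uniform quantizer, the scale of 0.1, and the oversized learning rate are assumptions chosen only to make the flip visible.

```python
# Toy illustration (not the authors' code): simulated quantization with an STE,
# showing a latent weight oscillating between two adjacent grid points.
import torch

def fake_quantize(w: torch.Tensor, scale: float, num_bits: int = 4) -> torch.Tensor:
    """Quantize-dequantize w on a symmetric uniform grid; gradients pass straight through (STE)."""
    qmax = 2 ** (num_bits - 1) - 1
    w_q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    return w + (w_q - w).detach()   # forward value is w_q, backward acts as identity w.r.t. w

# A latent weight hovering near the rounding boundary between the grid points
# 0.0 and 0.1 flips between them on successive updates.
scale = 0.1
w = torch.tensor([0.049], requires_grad=True)
for step in range(4):
    loss = (fake_quantize(w, scale) - 0.05) ** 2   # toy target between the two grid points
    loss.sum().backward()
    with torch.no_grad():
        w -= 0.5 * w.grad          # deliberately large step so the flip is visible
        w.grad.zero_()
    print(step, round(float(w), 3), float(fake_quantize(w, scale)))
```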

Proposed Solutions

To address the oscillation issue, the authors propose two innovative algorithms:

  1. Oscillation Dampening: This method adds a regularization term to the loss function that encourages latent weights to align with the centers of their quantization bins, thereby reducing oscillations (see the first sketch after this list).
  2. Iterative Weight Freezing: Weights identified as oscillating above a certain frequency are frozen at their most frequent quantization level for the remainder of training, effectively eliminating the oscillations (see the second sketch after this list).
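
A minimal sketch of an oscillation-dampening regularizer in the spirit of item 1 follows, assuming a symmetric uniform quantizer. It is not the authors' implementation: the penalty simply pulls each latent weight toward its nearest bin center (treated as a constant), and the weighting factor `lambda_d` and its schedule are hypothetical placeholders.

```python
# Minimal sketch (assumptions noted, not the authors' implementation):
# a dampening penalty that pulls latent weights toward their bin centers.
import torch

def dampening_loss(latent_w: torch.Tensor, scale: float, num_bits: int = 4) -> torch.Tensor:
    """Squared distance between each latent weight and the center of its current quantization bin."""
    qmax = 2 ** (num_bits - 1) - 1
    q = torch.clamp(torch.round(latent_w / scale), -qmax - 1, qmax)
    bin_centers = (q * scale).detach()        # nearest grid points, treated as constants
    return ((latent_w - bin_centers) ** 2).sum()

# Hypothetical usage: lambda_d weights the penalty against the task loss and would
# typically be increased over training; the name and schedule are placeholders.
# total_loss = task_loss + lambda_d * dampening_loss(module.weight, scale)
```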

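A sketch of iterative weight freezing in the spirit of item 2 follows. Per-weight flip frequency is tracked with an exponential moving average (EMA), and weights whose frequency crosses a threshold are pinned to a grid level for the rest of training. The EMA formulation, the default threshold and momentum, and the class and method names are illustrative assumptions rather than the paper's exact procedure.

```python
# Minimal sketch (assumptions labelled, not the authors' implementation):
# track per-weight oscillation frequency and freeze frequently flipping weights.
import torch

class OscillationFreezer:
    """Track per-weight oscillations and pin frequently flipping weights to the grid."""

    def __init__(self, shape, threshold: float = 0.02, momentum: float = 0.01):
        self.prev_int = None                           # integer weights at the previous step
        self.freq = torch.zeros(shape)                 # EMA of flip events per weight
        self.int_ema = torch.zeros(shape)              # EMA of integer values (picks the freeze level)
        self.frozen = torch.zeros(shape, dtype=torch.bool)
        self.frozen_val = torch.zeros(shape)           # quantized value a frozen weight keeps
        self.threshold = threshold
        self.momentum = momentum

    @torch.no_grad()
    def step(self, latent_w: torch.Tensor, scale: float, num_bits: int = 4):
        qmax = 2 ** (num_bits - 1) - 1
        w_int = torch.clamp(torch.round(latent_w / scale), -qmax - 1, qmax)
        if self.prev_int is None:
            self.int_ema = w_int.clone()
        else:
            flipped = (w_int != self.prev_int).float()
            self.freq = (1 - self.momentum) * self.freq + self.momentum * flipped
            self.int_ema = (1 - self.momentum) * self.int_ema + self.momentum * w_int
            newly_frozen = (self.freq > self.threshold) & ~self.frozen
            # Freeze at the level the weight has spent most time on, approximated
            # here by rounding the EMA of its integer value.
            self.frozen_val[newly_frozen] = torch.round(self.int_ema[newly_frozen]) * scale
            self.frozen |= newly_frozen
        self.prev_int = w_int
        # Pin frozen latent weights to their chosen grid point before the next forward pass.
        latent_w[self.frozen] = self.frozen_val[self.frozen]
```

In a hypothetical training loop, `freezer.step(module.weight, scale)` would be called after each optimizer update, so additional weights get frozen progressively as their oscillation frequency crosses the threshold.
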
Analyses and Results

The paper offers a detailed analysis demonstrating that most existing QAT algorithms fail to address oscillations effectively. By implementing the two proposed methods, the authors achieve state-of-the-art accuracy for low-bit (3 and 4 bits) weight and activation quantization in networks like MobileNetV2, MobileNetV3, and EfficientNet-lite on the ImageNet dataset.

The improvement is not just theoretical but is backed by strong empirical results. The authors show that by preventing oscillations they achieve classification accuracies significantly higher than previous methods, with gains of over 1% in certain network configurations. The results reinforce that accurate low-bit quantization must explicitly handle the oscillations that arise during training, an effect previously ignored.

Implications and Future Directions

The methods described carry significant implications for deploying neural networks in real-world applications, especially in resource-constrained environments where computational efficiency and battery life are critical. By enhancing QAT's ability to maintain accuracy at lower bit-widths, the proposed strategies could lead to more widespread and effective deployment of neural networks on edge devices.

Theoretically, the findings also open avenues for more detailed exploration of weight dynamics in quantized training regimes. Since the paper identifies oscillations as a significant consequence of using the straight-through estimator (STE) and similar gradient estimators during QAT, future research can examine how oscillations interact with other architectural components and further refine these methods.

Conclusion

In conclusion, this paper makes a significant contribution by identifying and addressing oscillations in QAT, an effect largely overlooked in current literature. The proposed solutions not only advance the field of quantization but also enhance the practical deployment of resource-efficient neural network models. The methods and insights presented provide a foundation for further research that can refine low-bit quantization techniques for even wider adoption across diverse AI applications.
