Apprentice: Using Knowledge Distillation Techniques To Improve Low-Precision Network Accuracy (1711.05852v1)

Published 15 Nov 2017 in cs.LG, cs.CV, and cs.NE

Abstract: Deep learning networks have achieved state-of-the-art accuracies on computer vision workloads like image classification and object detection. The performant systems, however, typically involve big models with numerous parameters. Once trained, a challenging aspect for such top performing models is deployment on resource constrained inference systems - the models (often deep networks or wide networks or both) are compute and memory intensive. Low-precision numerics and model compression using knowledge distillation are popular techniques to lower both the compute requirements and memory footprint of these deployed models. In this paper, we study the combination of these two techniques and show that the performance of low-precision networks can be significantly improved by using knowledge distillation techniques. Our approach, Apprentice, achieves state-of-the-art accuracies using ternary precision and 4-bit precision for variants of ResNet architecture on ImageNet dataset. We present three schemes using which one can apply knowledge distillation techniques to various stages of the train-and-deploy pipeline.

Improving Low-Precision Network Accuracy Through Knowledge Distillation: A Deep Dive into Apprentice

The paper "Apprentice: Using Knowledge Distillation Techniques To Improve Low-Precision Network Accuracy” by Asit Mishra and Debbie Marr presents a comprehensive paper on enhancing low-precision deep neural networks (DNNs) through an innovative integration of knowledge distillation techniques. With growing demands for deploying complex models on resource-constrained systems, the authors tackle the critical challenge of maintaining high accuracy with reduced computation and memory requirements.

Background and Motivation

High-performing DNNs for tasks such as image classification typically require extensive compute and storage, which is often incompatible with real-time deployment on edge devices. Traditional solutions, such as quantization and model compression, have generally been applied independently with varying degrees of success. Quantization reduces the numerical precision at which the model operates, while model compression trains a simpler network to mimic the behavior of a more complex one. However, both approaches typically suffer accuracy degradation relative to full-precision models.
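To make "reducing the model's operational precision" concrete, here is a minimal sketch of symmetric uniform k-bit weight quantization in PyTorch. The function name, the per-tensor scale, and plain round-to-nearest are illustrative assumptions for this summary, not the specific quantizer used in the paper.

```python
import torch

def quantize_weights(w: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Map weights onto 2**bits - 1 symmetric levels, then back to float."""
    levels = 2 ** (bits - 1) - 1                   # e.g. 7 positive levels for 4-bit
    scale = w.abs().max().clamp(min=1e-8) / levels  # per-tensor scale (a simplification)
    q = torch.clamp(torch.round(w / scale), -levels, levels)
    return q * scale                                # de-quantized values used in the forward pass

w = torch.randn(64, 64)
print((w - quantize_weights(w, bits=4)).abs().mean())  # mean quantization error
```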

Methodology: The Apprentice Approach

The Apprentice approach proposed in the paper leverages the synergy between quantization and knowledge distillation to improve the accuracy of low-precision models. Knowledge distillation trains a smaller student model under the supervision of a larger teacher model. The novelty here is that the student shares a similar topology with the full-precision teacher but operates at lower precision.
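To make the distillation mechanics concrete, the following is a minimal sketch of a standard knowledge-distillation loss in PyTorch. The temperature T, the weight alpha, and the function name are generic KD conventions assumed here; the paper's exact loss formulation may differ.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=1.0, alpha=0.5):
    """Weighted sum of hard-label cross-entropy and soft-target KL divergence."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)                                 # rescale to keep gradient magnitude comparable
    return alpha * hard + (1.0 - alpha) * soft

# Example usage with random tensors standing in for real model outputs.
s = torch.randn(8, 1000)                        # low-precision student logits
t = torch.randn(8, 1000)                        # full-precision teacher logits
y = torch.randint(0, 1000, (8,))
print(distillation_loss(s, t, y).item())
```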

The authors propose three distinct schemes:

  1. Joint Training (Scheme-A): The full-precision teacher network and the low-precision student network are trained simultaneously, so the teacher guides the student throughout the learning process (see the training-step sketch after this list).
  2. Continuous Distillation (Scheme-B): An already-trained full-precision teacher supervises a low-precision student trained from scratch. Because the teacher is fixed, its outputs can be precomputed, which removes repeated teacher forward passes and speeds up student training.
  3. Fine-Tuning (Scheme-C): The student network, initially trained with full precision, is fine-tuned at a lower precision, refining the network without starting from scratch.
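As referenced in the Scheme-A item above, here is a hedged sketch of what one joint-training step might look like: teacher and student are updated together on the same batch, and the student additionally matches the teacher's outputs. The loss weights (alpha, beta, gamma), the detach on the teacher logits in the distillation term, and the model/optimizer names are illustrative assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def joint_training_step(teacher, student, optimizer, images, labels,
                        alpha=1.0, beta=1.0, gamma=1.0):
    """One step over a combined teacher + student + distillation loss.

    `optimizer` is assumed to cover the parameters of both networks.
    """
    teacher_logits = teacher(images)                      # full-precision forward pass
    student_logits = student(images)                      # low-precision forward pass
    loss_teacher = F.cross_entropy(teacher_logits, labels)
    loss_student = F.cross_entropy(student_logits, labels)
    # Distillation term: the student mimics the teacher's (detached) soft targets.
    loss_distill = F.kl_div(
        F.log_softmax(student_logits, dim=1),
        F.softmax(teacher_logits.detach(), dim=1),
        reduction="batchmean",
    )
    loss = alpha * loss_teacher + beta * loss_student + gamma * loss_distill
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```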

Results and Analysis

The research demonstrates that each scheme raises the accuracy of low-precision models well beyond established baselines. For instance, ResNet models (e.g., ResNet-18, ResNet-34, and ResNet-50) show significant accuracy improvements with Apprentice techniques on the ImageNet dataset. Notably, the ternary-precision and 4-bit-precision models outperform prior art by margins of up to 2%-3%.

Moreover, joint training (Scheme-A) shows that lower-precision models can approach, and in some cases surpass, the accuracy of full-precision counterparts. The distinct advantage of Scheme-B is a reduction in training time due to precomputed logits, while Scheme-C further refines model accuracy, albeit at a slower pace.
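The training-time savings attributed to precomputed logits can be pictured with the short sketch below: the trained teacher runs once over the dataset and its logits are cached for reuse as soft targets at every student epoch. The cache layout and the assumption of deterministic (non-augmented) inputs are simplifications for illustration.

```python
import torch

@torch.no_grad()
def precompute_teacher_logits(teacher, dataloader, device="cpu"):
    """Run the trained teacher once over the dataset and cache its logits."""
    teacher.eval()
    cached = []
    for images, labels in dataloader:
        logits = teacher(images.to(device))
        cached.append((logits.cpu(), labels))
    return cached  # reuse as soft targets each epoch; no repeated teacher forward passes
```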

Implications and Future Work

The implications of this research are significant for deploying AI models in environments where computational resources are limited, such as mobile devices or IoT networks. By narrowing the performance gap between low-precision and full-precision models, the Apprentice approach paves the way for more efficient and effective AI deployment across a wide range of applications.

Future work may explore optimizing hyperparameters, experimenting with other architectures, or integrating these techniques with new distillation strategies to push the boundaries of DNN performance under resource constraints. Additionally, understanding the interplay of different model architectures and distillation temperatures could yield even further improvements.

In conclusion, the Apprentice framework represents a significant advancement in the ongoing enhancement of low-precision network accuracies, providing a robust toolkit for developers and researchers focused on efficient AI model deployment.
