- The paper introduces Correlation Congruence Knowledge Distillation (CCKD), a method that transfers correlations between instances alongside instance-level knowledge to enhance student network performance.
- It employs a generalized kernel method with Taylor series expansion and strategic mini-batch samplers to capture intra- and inter-class correlations.
- Experiments demonstrate improved top-1 accuracy on CIFAR-100 and ImageNet-1K, along with superior metrics in person re-identification and face recognition tasks.
Correlation Congruence for Knowledge Distillation
The paper "Correlation Congruence for Knowledge Distillation" proposes an advancement in the field of knowledge distillation by introducing a refined framework named Correlation Congruence Knowledge Distillation (CCKD). The work effectively shifts some focus from the prevalent instance-level knowledge transfer to also incorporate the correlation between multiple instances, aiming to enhance the performance of student networks.
Key Contributions
This research identifies limitations inherent in traditional knowledge distillation methods, which enforce strong congruence only at the instance level and thereby overlook valuable correlation knowledge between instances that could be leveraged to improve performance. The authors introduce CCKD to integrate both instance congruence and correlation congruence, enabling a more comprehensive transfer of knowledge from teacher to student networks.
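To make the combined objective concrete, here is a minimal PyTorch-style sketch of a CCKD-like loss: cross-entropy on the labels, instance-level KD via softened KL divergence, and a correlation-congruence term that matches the teacher's and student's pairwise correlation matrices over the mini-batch. The weights `alpha` and `beta`, the temperature, and the helper `correlation_matrix` (sketched after the contributions list below) are illustrative assumptions, not the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def cckd_loss(student_logits, teacher_logits, student_feat, teacher_feat,
              labels, alpha=0.5, beta=0.01, temperature=4.0):
    """Sketch of a CCKD-style objective; hyperparameters are placeholders."""
    # Standard cross-entropy on ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)

    # Instance-level congruence: KL divergence between softened distributions.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2

    # Correlation congruence: match the pairwise correlation matrices of the
    # teacher and student embeddings computed over the same mini-batch.
    c_t = correlation_matrix(teacher_feat)
    c_s = correlation_matrix(student_feat)
    cc = (c_t - c_s).pow(2).sum() / student_feat.size(0) ** 2

    return alpha * ce + (1 - alpha) * kd + beta * cc
```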
Significant contributions of this work include:
- Introduction of Correlation Congruence: Rather than matching the teacher and student instance by instance only, the authors enforce congruence on the correlations between instances, presenting this relational knowledge as an important factor for effective transfer.
- Generalized Kernel Method: A kernel-based formulation, approximated via a Taylor series expansion, efficiently captures instance correlations, providing enhanced flexibility and improving distillation performance (see the kernel sketch after this list).
- Mini-batch Sampler Strategies: The research explores class-uniform and superclass-uniform random samplers to balance intra-class and inter-class correlations within each batch, which is critical for measuring correlation congruence reliably (a sampler sketch also follows below).
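The correlation matrix itself can be built with a Gaussian RBF kernel approximated by a truncated Taylor series, as described above. Below is a minimal sketch assuming L2-normalized embeddings, so that exp(-gamma * ||x_i - x_j||^2) factors into exp(-2*gamma) * exp(2*gamma * <x_i, x_j>) and the second exponential is expanded to a finite order; the `gamma` and `order` defaults are placeholders rather than the paper's tuned hyperparameters.

```python
import math
import torch
import torch.nn.functional as F

def correlation_matrix(features, gamma=0.4, order=2):
    """Approximate Gaussian-RBF correlations between mini-batch embeddings."""
    f = F.normalize(features, dim=1)   # (n, d), unit-norm rows
    sim = f @ f.t()                    # pairwise dot products <x_i, x_j>
    corr = torch.zeros_like(sim)
    # Truncated Taylor expansion of exp(2 * gamma * sim).
    for p in range(order + 1):
        corr = corr + (2.0 * gamma) ** p / math.factorial(p) * sim.pow(p)
    return math.exp(-2.0 * gamma) * corr
```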
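A class-uniform random sampler can be sketched as the hypothetical helper below (not the authors' code): each batch draws a fixed number of classes and a fixed number of instances per class, so both intra-class and inter-class pairs are present when the correlation matrix is computed.

```python
import random
from collections import defaultdict

def class_uniform_batches(labels, classes_per_batch=16, samples_per_class=4):
    """Yield index batches with a balanced number of samples per class."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    while True:
        chosen = random.sample(list(by_class), classes_per_batch)
        batch = []
        for c in chosen:
            pool = by_class[c]
            # Sample with replacement if a class has too few examples.
            if len(pool) < samples_per_class:
                batch += random.choices(pool, k=samples_per_class)
            else:
                batch += random.sample(pool, samples_per_class)
        yield batch
```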
Empirical Results
Experiments conducted on image classification tasks such as CIFAR-100 and ImageNet-1K, as well as metric learning tasks like person re-identification and face recognition, demonstrate that CCKD outperforms traditional KD and other state-of-the-art distillation techniques, achieving higher accuracy and improved model efficacy.
- On CIFAR-100, CCKD surpassed standard knowledge distillation by 1.6-1.9% in top-1 accuracy depending on the student network used.
- On ImageNet-1K, CCKD reached a top-1 accuracy of 67.7%, marking a 1% improvement over traditional KD approaches.
- In metric learning settings such as ReID and face recognition, CCKD significantly improved metrics like mAP and Rank-1 accuracy over baseline KD approaches.
Implications and Future Directions
The inclusion of correlation congruence represents a meaningful refinement of distillation, indicating that exploiting relational information between instances can markedly improve student networks. The approach is especially promising for resource-constrained environments, where smaller, less complex networks can approach the accuracy of heavyweight models.
The findings affirm the advantage of comprehensive knowledge transfer, suggesting utility beyond typical vision tasks and potential extension to other domains that require efficient model deployment. Future work could examine more advanced kernel functions or study correlation congruence across different architectures and task domains, furthering progress in model compression and acceleration.