Comparing Kullback-Leibler Divergence and Mean Squared Error Loss in Knowledge Distillation (2105.08919v1)

Published 19 May 2021 in cs.LG and cs.CV

Abstract: Knowledge distillation (KD), which transfers knowledge from a cumbersome teacher model to a lightweight student model, has been investigated as a way to design efficient neural architectures. Generally, the objective function of KD is the Kullback-Leibler (KL) divergence loss between the softened probability distributions of the teacher model and the student model, with the temperature-scaling hyperparameter tau. Despite its widespread use, few studies have discussed how such softening influences generalization. Here, we theoretically show that the KL divergence loss focuses on logit matching as tau increases and on label matching as tau goes to 0, and we empirically show that logit matching is, in general, positively correlated with performance improvement. From this observation, we consider an intuitive KD loss function, the mean squared error (MSE) between the logit vectors, so that the student model can directly learn the logits of the teacher model. The MSE loss outperforms the KL divergence loss, which is explained by the difference in the penultimate-layer representations induced by the two losses. Furthermore, we show that sequential distillation can improve performance and that KD, particularly when using the KL divergence loss with small tau, mitigates label noise. The code to reproduce the experiments is publicly available online at https://github.com/jhoon-oh/kd_data/.
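The two objectives compared in the abstract can be sketched in plain Python. This is a minimal illustration, not the code from the kd_data repository: the helper names (softmax, kd_kl_loss, kd_mse_loss, center) are hypothetical, the KL term is multiplied by tau squared following the common Hinton-style convention (an assumption about scaling), and the MSE is summed over classes rather than averaged. The final loop illustrates the large-tau claim: because softmax ignores a constant shift of the logits, tau^2 times the KL loss approaches the squared distance between mean-centered logits divided by 2K, where K is the number of classes.

```python
import math


def softmax(logits, tau=1.0):
    """Temperature-scaled softmax; subtracting the max keeps exp() from overflowing."""
    scaled = [z / tau for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_kl_loss(student_logits, teacher_logits, tau):
    """tau^2 * KL(teacher || student) between temperature-softened distributions."""
    p_t = softmax(teacher_logits, tau)
    p_s = softmax(student_logits, tau)
    # Terms with a zero teacher probability contribute nothing to the KL sum.
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s) if pt > 0)
    return tau ** 2 * kl


def kd_mse_loss(student_logits, teacher_logits):
    """Sum of squared differences between raw logits (no softmax, no temperature)."""
    return sum((zs - zt) ** 2 for zs, zt in zip(student_logits, teacher_logits))


def center(logits):
    """Subtract the mean; softmax is unchanged by a constant shift of the logits."""
    mu = sum(logits) / len(logits)
    return [z - mu for z in logits]


# Hypothetical logits for a single example with K = 4 classes.
student = [1.0, -0.5, 0.2, 2.0]
teacher = [2.5, 0.0, -1.0, 1.5]
K = len(student)

# Large-tau limit of tau^2 * KL from a second-order expansion: ||c(z) - c(v)||^2 / (2K).
target = kd_mse_loss(center(student), center(teacher)) / (2 * K)

for tau in (1.0, 4.0, 20.0, 100.0):
    print(f"tau={tau:6.1f}  tau^2*KL={kd_kl_loss(student, teacher, tau):.4f}  limit={target:.4f}")
```

As tau grows, the KL column should approach the limit column, which is a rescaled MSE between centered logits. A mean-reduced MSE, as in many deep learning libraries, differs from the summed version here only by the constant factor 1/K.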

Authors (5)

Taehyeon Kim (28 papers)
Jaehoon Oh (18 papers)
NakYil Kim (2 papers)
Sangwook Cho (2 papers)
Se-Young Yun (114 papers)

Citations (192)

Summary

Overview of "Style Guideline"

The paper "Style Guideline" serves as a meticulous guide for authors intending to submit manuscripts to the IJCAI-21 proceedings. It delineates the comprehensive formatting and style guidelines that authors must adhere to when preparing their submissions. The primary objective of this document is to ensure uniformity and consistency across all papers included in the conference's proceedings.

Paper Structure and Content

The document is organized into several sections, each addressing specific aspects of paper preparation and formatting. These include detailed instructions on acceptable page length, text formatting, layout specifications, and the proper formatting of tables, figures, and equations.

Document Formatting: The text specifies that submissions should be in PDF format, adhering to an 8.5" x 11" page size. It provides the necessary dimensions for page margins, column widths and heights, and spacing to be used.
Length Specifications: Strict guidelines are provided regarding the length of the papers, mandating a maximum of six pages of content with an optional additional page reserved exclusively for references.
Style Files Availability: It mentions the availability of LaTeX and Microsoft Word templates that make it easier to adhere to these formatting standards. These files can be retrieved from the conference's website and are accompanied by instructions for their use.
Blind Review Process: Authors are instructed on how to prepare their manuscripts for the blind review process, which includes omitting any identifying information from the initial submission.

Technical Aspects and Formatting

The document covers the technical details of document preparation in LaTeX and Word, providing templates and style files that ensure compatibility with the formatting rules. Specific points about LaTeX document preparation include:

Usage of Fonts and Packages: Guidance is provided on recommended fonts, like Times Roman, to ensure uniformity, alongside suggestions for additional packages such as latexsym that can be used in LaTeX.
Handling Illustrations and Tables: Detailed instructions are provided for embedding figures, tables, and illustrations. This guidance emphasizes that these elements must render well in both the review draft and the final proceedings, ensuring a high-quality presentation of data.
Formulas and Equations: Authors are cautioned against using small font sizes for equations, which could disrupt the visual balance of the text and make mathematical content harder to read. They are encouraged to split lengthy formulas across lines without compromising readability.
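As one way to follow the advice on long formulas, here is a sketch that assumes the amsmath package is loaded (the summary does not say which environment the guideline recommends, so this is illustrative rather than prescribed). It breaks the temperature-scaled KL objective from this paper's abstract at the equals sign. The symbols are illustrative assumptions: v_k and z_k are teacher and student logits, K is the number of classes, p_k^T(tau) = exp(v_k/tau) / sum_j exp(v_j/tau) (and likewise p_k^S with z), and the tau^2 factor follows the common KD convention.

```latex
\begin{equation}
\begin{split}
\mathcal{L}_{\mathrm{KL}}
  &= \tau^{2} \sum_{k=1}^{K} p_k^{T}(\tau)
     \log \frac{p_k^{T}(\tau)}{p_k^{S}(\tau)} \\
  &= \tau^{2} \sum_{k=1}^{K} p_k^{T}(\tau)
     \left( \frac{v_k - z_k}{\tau}
     - \log \frac{\sum_{j=1}^{K} e^{v_j/\tau}}{\sum_{j=1}^{K} e^{z_j/\tau}} \right)
\end{split}
\end{equation}
```

The split environment keeps a single equation number while aligning the continuation line on the ampersand, so the formula stays at full text size instead of being shrunk to fit the column.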

Implications for Researchers

For researchers, adherence to such guidelines is critical because it directly affects the readability and professional presentation of their scholarly work in the conference proceedings. Proper formatting as outlined ensures that the document can be reproduced uniformly in both print and digital formats, facilitating broad dissemination of their research findings.

Future Considerations

While this paper provides a robust framework for manuscript submission, future conferences might consider extending these guidelines to accommodate changes in digital publishing and accessibility standards. This could include clearer guidance for authors working with different tools and computing environments, reflecting evolving practices in academic publishing.

In conclusion, this paper provides essential guidelines that, if meticulously followed, allow authors to present their work effectively in the IJCAI conference proceedings. The comprehensive nature of these instructions supports researchers in producing professionally formatted documents that uphold the conference's academic standards.

GitHub

GitHub - jhoon-oh/kd_data: IJCAI 2021, "Comparing Kullback-Leibler Divergence and Mean Squared Error Loss in Knowledge Distillation" (39 stars)