BD-KD: Balancing the Divergences for Online Knowledge Distillation (2212.12965v2)

Published 25 Dec 2022 in cs.CV

Abstract: We address the challenge of producing trustworthy and accurate compact models for edge devices. While Knowledge Distillation (KD) has improved model compression in terms of accuracy, the calibration of these compact models has been overlooked. We introduce BD-KD (Balanced Divergence Knowledge Distillation), a framework for logit-based online KD. BD-KD enhances both accuracy and model calibration simultaneously, eliminating the need for post-hoc recalibration techniques, which add computational overhead to the overall training pipeline and degrade performance. Our method encourages student-centered training by adjusting the conventional online distillation loss on both the student and teacher sides, employing sample-wise weighting of the forward and reverse Kullback-Leibler divergences. This strategy balances the student network's confidence and boosts performance. Experiments across CIFAR10, CIFAR100, TinyImageNet, and ImageNet datasets, and across various architectures, demonstrate improved calibration and accuracy compared to recent online KD methods.
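To make the core idea concrete, below is a minimal sketch of a balanced-divergence distillation loss in PyTorch. It combines per-sample forward KL (teacher || student) and reverse KL (student || teacher) terms with sample-wise weights. The specific weighting rule shown here (based on student confidence) is an illustrative assumption, not the exact scheme from the paper; the function name `bd_kd_style_loss` and the temperature value are likewise hypothetical.

```python
import torch
import torch.nn.functional as F

def bd_kd_style_loss(student_logits, teacher_logits, T=4.0, eps=1e-8):
    """Illustrative sketch of a balanced-divergence distillation loss.

    Combines forward KL(teacher || student) and reverse KL(student || teacher)
    with per-sample weights. The weighting rule below is a hypothetical
    stand-in for the paper's sample-wise weighting, not its exact formulation.
    """
    p_t = F.softmax(teacher_logits / T, dim=1)   # teacher distribution
    p_s = F.softmax(student_logits / T, dim=1)   # student distribution
    log_p_s = torch.log(p_s + eps)
    log_p_t = torch.log(p_t + eps)

    # Per-sample forward KL: KL(teacher || student)
    kl_fwd = (p_t * (log_p_t - log_p_s)).sum(dim=1)
    # Per-sample reverse KL: KL(student || teacher)
    kl_rev = (p_s * (log_p_s - log_p_t)).sum(dim=1)

    # Hypothetical sample-wise weights: emphasize the reverse KL (which
    # discourages over-confident student predictions) on samples where the
    # student is already confident, and the forward KL otherwise.
    student_conf = p_s.max(dim=1).values.detach()
    w_fwd = 1.0 - student_conf
    w_rev = student_conf

    # Standard T^2 scaling keeps gradient magnitudes comparable to a CE loss.
    return (w_fwd * kl_fwd + w_rev * kl_rev).mean() * (T ** 2)
```

In an online KD setting, both networks are trained jointly, so a symmetric term with the roles of student and teacher swapped would typically be added to the teacher's objective as well; the sketch above shows only the student-side term.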

Authors (5)
  1. Ibtihel Amara (6 papers)
  2. Nazanin Sepahvand (1 paper)
  3. Brett H. Meyer (11 papers)
  4. Warren J. Gross (75 papers)
  5. James J. Clark (32 papers)
Citations (2)