
Teacher as a Lenient Expert: Teacher-Agnostic Data-Free Knowledge Distillation (2402.12406v1)

Published 18 Feb 2024 in cs.LG, cs.AI, and cs.CV

Abstract: Data-free knowledge distillation (DFKD) aims to distill pretrained knowledge to a student model with the help of a generator without using original data. In such data-free scenarios, achieving stable performance of DFKD is essential due to the unavailability of validation data. Unfortunately, this paper has discovered that existing DFKD methods are quite sensitive to different teacher models, occasionally showing catastrophic failures of distillation, even when using well-trained teacher models. Our observation is that the generator in DFKD is not always guaranteed to produce precise yet diverse samples using the existing representative strategy of minimizing both class-prior and adversarial losses. Through our empirical study, we focus on the fact that class-prior not only decreases the diversity of generated samples, but also cannot completely address the problem of generating unexpectedly low-quality samples depending on teacher models. In this paper, we propose the teacher-agnostic data-free knowledge distillation (TA-DFKD) method, with the goal of more robust and stable performance regardless of teacher models. Our basic idea is to assign the teacher model a lenient expert role for evaluating samples, rather than a strict supervisor that enforces its class-prior on the generator. Specifically, we design a sample selection approach that takes only clean samples verified by the teacher model without imposing restrictions on the power of generating diverse samples. Through extensive experiments, we show that our method successfully achieves both robustness and training stability across various teacher models, while outperforming the existing DFKD methods.
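The abstract describes TA-DFKD's core idea only at a high level: let the teacher act as a lenient verifier that filters generated samples, rather than constraining the generator with a class-prior loss. Below is a minimal sketch of what a teacher-verified sample-selection step followed by standard distillation could look like, assuming a PyTorch setup; the confidence-threshold filter, function names, and hyperparameters are illustrative assumptions, not the paper's actual selection criterion or training loop.

```python
import torch
import torch.nn.functional as F


def select_clean_samples(generator, teacher, z_dim=100, batch_size=64,
                         confidence_threshold=0.9, device="cpu"):
    """Generate synthetic samples and keep only those the frozen teacher
    'verifies'. Here verification is approximated by softmax confidence;
    the abstract does not specify the exact criterion used in TA-DFKD."""
    generator.eval()
    teacher.eval()
    with torch.no_grad():
        z = torch.randn(batch_size, z_dim, device=device)
        fake = generator(z)                        # synthetic images
        probs = F.softmax(teacher(fake), dim=1)    # teacher class posteriors
        confidence, pseudo_labels = probs.max(dim=1)
        keep = confidence >= confidence_threshold  # retain "clean" samples only
    return fake[keep], pseudo_labels[keep]


def distill_step(student, teacher, clean_samples, optimizer, temperature=4.0):
    """One Hinton-style distillation step on the teacher-verified samples:
    the student matches the teacher's temperature-softened outputs."""
    student.train()
    with torch.no_grad():
        t_logits = teacher(clean_samples)
    s_logits = student(clean_samples)
    loss = F.kl_div(
        F.log_softmax(s_logits / temperature, dim=1),
        F.softmax(t_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In a full data-free loop the generator would also be updated (e.g., adversarially against the student), and the paper's verification rule may differ from simple confidence thresholding; this sketch only illustrates the "select clean samples, then distill" structure the abstract outlines.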

Authors (2)
  1. Hyunjune Shin (1 paper)
  2. Dong-Wan Choi (10 papers)
Citations (1)

