Cooperative Knowledge Distillation: A Learner Agnostic Approach (2402.05942v1)
Abstract: Knowledge distillation is a simple but powerful way to transfer knowledge from a teacher model to a student model. Existing work suffers from at least one of the following key limitations in the direction and scope of transfer, which restrict its use: all knowledge is transferred from teacher to student regardless of whether or not that knowledge is useful, the student is the only one learning in this exchange, and distillation typically transfers knowledge only from a single teacher to a single student. We formulate a novel form of knowledge distillation in which many models can act as both students and teachers, which we call cooperative distillation. The models cooperate as follows: a model (the student) identifies specific deficiencies in its performance and searches for another model (the teacher), which encodes its learned knowledge into instructional virtual instances via counterfactual instance generation. Because different models may have different strengths and weaknesses, all models can act as either students or teachers when appropriate (cooperation) and only distill knowledge in areas specific to their strengths (focus). Since counterfactuals as a paradigm are not tied to any specific algorithm, we can use this method to distill knowledge between learners of different architectures, algorithms, and even feature spaces. We demonstrate that our approach not only outperforms baselines such as transfer learning, self-supervised learning, and multiple knowledge distillation algorithms on several datasets, but can also be used in settings where those techniques cannot.
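To make the cooperative loop described in the abstract concrete, the sketch below is a minimal, hedged illustration rather than the paper's actual algorithm: it assumes two scikit-learn classifiers on a shared feature space, uses per-class validation accuracy as the "deficiency" signal, and substitutes a crude jitter-and-filter procedure (the hypothetical helper `generate_virtual_instances`) for the paper's counterfactual instance generation.

```python
# Illustrative sketch of cooperative distillation (not the authors' exact method).
# Each model teaches on classes where it is strong and learns on classes where it is weak.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier


def per_class_accuracy(model, X, y):
    """Validation accuracy broken down by class label."""
    pred = model.predict(X)
    return {c: np.mean(pred[y == c] == c) for c in np.unique(y)}


def generate_virtual_instances(teacher, X_seed, target_class, n=50, noise=0.1):
    """Crude stand-in for counterfactual generation: jitter seed points and
    keep only those the teacher assigns to the target class."""
    rng = np.random.default_rng(0)
    candidates = X_seed[rng.integers(len(X_seed), size=n)]
    candidates = candidates + rng.normal(scale=noise, size=candidates.shape)
    return candidates[teacher.predict(candidates) == target_class]


X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           n_classes=3, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

models = [DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr),
          MLPClassifier(max_iter=500, random_state=0).fit(X_tr, y_tr)]

# Every model can be a student or a teacher (cooperation), and knowledge is
# distilled only for the classes where the student is deficient (focus).
accs = [per_class_accuracy(m, X_val, y_val) for m in models]
for si, student in enumerate(models):
    for c, acc in accs[si].items():
        stronger = [ti for ti in range(len(models)) if ti != si and accs[ti][c] > acc]
        if not stronger:
            continue  # no peer is stronger on this class; nothing to distill
        teacher = models[max(stronger, key=lambda ti: accs[ti][c])]
        X_virt = generate_virtual_instances(teacher, X_tr[y_tr == c], c)
        if len(X_virt) == 0:
            continue
        # Retrain the student on its own data augmented with the teacher's
        # instructional virtual instances for the weak class.
        X_aug = np.vstack([X_tr, X_virt])
        y_aug = np.concatenate([y_tr, np.full(len(X_virt), c)])
        student.fit(X_aug, y_aug)
```

Because the virtual instances are plain feature vectors, nothing in this loop ties the student and teacher to the same architecture or algorithm, which is the property the abstract highlights; the counterfactual generator here is only a placeholder for the paper's approach.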
Authors: Michael Livanos, Ian Davidson, Stephen Wong