- The paper introduces a data-independent adversarial framework that distills teacher knowledge into a student model using synthesized pseudo-images.
- The method achieves competitive results on CIFAR-10 and SVHN without using any real training data, reaching 83.69% test accuracy on CIFAR-10 and surpassing few-shot techniques when fine-tuned on a small amount of data.
- The authors introduce Mean Transition Error (MTE), a metric of how closely student beliefs track the teacher's near decision boundaries; the zero-shot student scores far lower MTE than conventionally distilled students, setting the stage for future research in privacy-preserving model compression.
Zero-shot Knowledge Transfer via Adversarial Belief Matching
This paper addresses the challenge of transferring knowledge from a large teacher network to a smaller student network when the original training data is inaccessible, a situation increasingly common under privacy regulations and with proprietary datasets. Traditional techniques such as knowledge distillation and network pruning fundamentally rely on access to that data; the approach proposed here, dubbed zero-shot knowledge transfer (KT), sidesteps this requirement entirely.
Methodology
The authors propose an adversarial framework comprising a student network, a teacher network, and a generator. The generator's role is critical: it synthesizes pseudo-images that maximize the Kullback-Leibler (KL) divergence between the teacher's and student's output distributions. Trained adversarially, the generator produces exactly those images on which teacher and student disagree, yielding a focused and informative training signal for the student.
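In symbols (the notation below is this summary's, not necessarily the paper's: $G_\phi$ is the generator, $T$ the teacher, and $S_\theta$ the student), the generator seeks noise-driven pseudo-images on which the two networks disagree most:

$$
\max_{\phi}\; \mathbb{E}_{z \sim \mathcal{N}(0, I)}\Big[ D_{\mathrm{KL}}\big( T(G_\phi(z)) \,\big\|\, S_\theta(G_\phi(z)) \big) \Big]
$$

The student's update, described next, minimizes this same divergence with respect to $\theta$, so the overall procedure is a minimax game.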
Once these informative pseudo-images are generated, the student updates its parameters to minimize its divergence from the teacher's outputs on them. The method is entirely data-independent, yet performs well on non-trivial datasets such as CIFAR-10 and SVHN.
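To make the alternating procedure concrete, here is a minimal PyTorch sketch of one training iteration. The function names (`zero_shot_step`, `kl_teacher_student`) and the update counts `n_g` and `n_s` are illustrative placeholders rather than the paper's exact implementation, and the teacher is assumed frozen beforehand (`teacher.eval()`, parameters with `requires_grad=False`):

```python
import torch
import torch.nn.functional as F

def kl_teacher_student(t_logits, s_logits):
    """Forward KL divergence D_KL(T || S) between teacher and student outputs."""
    t_log_p = F.log_softmax(t_logits, dim=1)
    s_log_p = F.log_softmax(s_logits, dim=1)
    # F.kl_div(input, target, log_target=True) computes KL(target || input).
    return F.kl_div(s_log_p, t_log_p, reduction="batchmean", log_target=True)

def zero_shot_step(generator, teacher, student, g_opt, s_opt,
                   z_dim=100, batch_size=128, n_g=1, n_s=10, device="cpu"):
    """One adversarial iteration: n_g generator updates, then n_s student updates.
    Assumes the teacher is frozen (teacher.eval(), params with requires_grad=False)."""
    # Generator ascent: seek pseudo-images that maximize the teacher/student
    # divergence, implemented as gradient descent on the negated KL term.
    for _ in range(n_g):
        z = torch.randn(batch_size, z_dim, device=device)
        x = generator(z)
        loss_g = -kl_teacher_student(teacher(x), student(x))
        g_opt.zero_grad()
        loss_g.backward()
        g_opt.step()

    # Student descent: match the teacher's beliefs on fresh pseudo-images.
    for _ in range(n_s):
        with torch.no_grad():
            x = generator(torch.randn(batch_size, z_dim, device=device))
            t_logits = teacher(x)
        loss_s = kl_teacher_student(t_logits, student(x))
        s_opt.zero_grad()
        loss_s.backward()
        s_opt.step()
```

The key design point is that the generator's loss is the negated KL term, so stepping its optimizer performs ascent on the divergence while the student's optimizer performs descent on the same quantity.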
Results
The reported results are strong, especially given the zero-data constraint. On CIFAR-10, the proposed method achieves 83.69% test accuracy using no real training data. When fine-tuned with minimal data (100 images per class), accuracy rises to 85.91%, outperforming existing few-shot techniques such as Variational Information Distillation (VID) by over 4%.
These outcomes are especially notable given that competing zero-shot techniques, such as that of Nayak et al., yield significantly lower accuracy under comparable conditions. Beyond raw accuracy, the paper proposes a new metric, the Mean Transition Error (MTE), which gauges how closely the student's beliefs conform to the teacher's near decision boundaries. The zero-shot student attains a much lower MTE near these boundaries than a student distilled on real data, suggesting that the adversarial setup concentrates learning precisely where the two networks disagree.
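A plausible formalization, consistent with the description above (the symbols $N$, $C$, $K$ and the probability notation are this summary's assumptions, not the paper's exact definitions): perturb each test image across the teacher's decision boundary toward each alternative class $j$ over $K$ steps, and average the absolute teacher-student probability gap along the path:

$$
\mathrm{MTE} \;=\; \frac{1}{N}\sum_{n=1}^{N} \frac{1}{C-1}\sum_{j \neq i_n} \frac{1}{K}\sum_{k=1}^{K} \Big|\, p^{T}_{j}(x_{n,k}) - p^{S}_{j}(x_{n,k}) \,\Big|
$$

where $x_{n,k}$ is the $n$-th test image after $k$ perturbation steps toward class $j$, $i_n$ is its original class, and $p^{T}_{j}$, $p^{S}_{j}$ are the teacher's and student's predicted probabilities for class $j$. A well-matched student keeps this gap small along the entire path.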
Implications
The proposed adversarial zero-shot KT method is a significant contribution to model compression and to deployment in privacy-sensitive environments. Because it needs no training data, it applies to proprietary or sensitive datasets, a common constraint in practical settings such as cloud services and edge devices.
In theoretical terms, the method invites further examination of zero-shot training paradigms, potentially influencing how the field approaches situations where student models must be derived from teachers without any data overlap. It also motivates the study of decision-boundary alignment between networks as a central research topic.
Future Directions
Promising directions include refining the adversarial process to accommodate more diverse network architectures, and further reducing the Mean Transition Error to tighten decision-boundary alignment between teacher and student networks.
Overall, this paper provides a solid starting point for further research into zero-shot knowledge transfer, addressing a critical limitation of data-reliant methods and advancing the discussion on model accessibility and efficiency.