Data-Free Learning of Student Networks
The paper "Data-Free Learning of Student Networks" addresses a significant challenge in deploying pre-trained convolutional neural networks (CNNs) on edge devices, where computational and storage constraints are paramount. Traditional approaches to neural network compression demand access to the original training datasets, which are often unavailable due to privacy, legal, or transmission issues. The authors propose a novel framework utilizing Generative Adversarial Networks (GANs) to circumvent this limitation, aiming to enable the deployment of efficient, data-free student networks by leveraging pre-trained networks (teacher networks) as surrogate discriminators in a GAN setup.
Methodology
The framework introduced involves treating the pre-trained teacher network as a fixed discriminator in a generative adversarial setting. This fixed discriminator guides a generator network to produce synthetic training data that mimics the response characteristics of the original data on the teacher network. The generated samples are then used to train a more compact student network, which learns to mimic the teacher network's outputs via knowledge distillation.
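As a rough illustration of the distillation step, the PyTorch-style sketch below trains a student to match the teacher's outputs on generator-produced images. The temperature, batch size, latent dimension, and module names are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def distillation_step(generator, teacher, student, optimizer,
                      batch_size=256, latent_dim=100, temperature=1.0):
    """One knowledge-distillation step on synthetic images.

    The generator and teacher are frozen; only the student is updated.
    Hyper-parameters here are illustrative, not the paper's settings.
    """
    generator.eval()
    teacher.eval()
    student.train()

    # Sample random noise and synthesize a batch of training images.
    z = torch.randn(batch_size, latent_dim)
    with torch.no_grad():
        fake_images = generator(z)
        teacher_logits = teacher(fake_images)

    student_logits = student(fake_images)

    # KL divergence between softened teacher and student distributions.
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```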
The innovation lies in the generator's loss functions. The authors introduce three core loss terms (combined in the sketch that follows this list):
- One-hot Loss: Encourages the teacher's outputs on generated samples to resemble one-hot vectors, so that each synthetic sample elicits a strong, confident response for a single class.
- Activation Loss: Maximizes the magnitude of intermediate feature activations in the teacher network, on the premise that inputs resembling the original training data excite the teacher's learned features more strongly than random inputs.
- Information Entropy Loss: Maximizes the entropy of the average class distribution over a batch of generated samples, keeping the synthetic data balanced across categories.
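A minimal sketch of how these three terms could be combined when updating the generator is given below. The feature-extraction mechanism (e.g. a forward hook on an intermediate teacher layer), the weighting coefficients alpha and beta, and the function name are assumptions for illustration rather than the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def generator_loss(teacher_logits, teacher_features, alpha=0.1, beta=5.0):
    """Combine the three generator loss terms described above.

    teacher_logits:   teacher outputs for a batch of generated images.
    teacher_features: activations from an intermediate teacher layer
                      (e.g. captured with a forward hook).
    alpha, beta:      illustrative weighting coefficients.
    """
    # One-hot loss: cross-entropy against the teacher's own argmax labels,
    # pushing each generated sample toward a confident single-class response.
    pseudo_labels = teacher_logits.argmax(dim=1)
    one_hot_loss = F.cross_entropy(teacher_logits, pseudo_labels)

    # Activation loss: reward large intermediate activations, since inputs
    # resembling real data tend to excite the teacher's features strongly.
    activation_loss = -teacher_features.abs().mean()

    # Information entropy loss: maximize the entropy of the mean class
    # distribution over the batch so that generated classes stay balanced
    # (minimizing sum(p * log p) maximizes the entropy).
    mean_probs = F.softmax(teacher_logits, dim=1).mean(dim=0)
    entropy_loss = (mean_probs * torch.log(mean_probs + 1e-8)).sum()

    return one_hot_loss + alpha * activation_loss + beta * entropy_loss
```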
Results
Performance evaluation of this data-free learning framework across various benchmarks indicates that student networks trained without direct access to the original training data can achieve competitive accuracy. For instance, ResNet-18 student networks achieved accuracies of 92.22% and 74.47% on the image classification benchmarks CIFAR-10 and CIFAR-100, respectively. This illustrates the potential of using generatively synthesized data for training without significantly compromising performance. Additionally, an accuracy of 80.03% on the CelebA dataset highlights the framework's applicability to more complex classification tasks.
Implications
The implications of this research are both practical and theoretical. Practically, it addresses deployment challenges in real-world applications where access to original datasets is restricted. It potentially shifts the landscape for on-device learning, where models can be optimized, retargeted, or fine-tuned with synthesized data, preserving user privacy. Theoretically, the research advances understanding in data-free transfer learning, challenging existing paradigms that require access to vast quantities of training data for effective model distillation.
Future Directions
While the current work utilizes GANs for data synthesis, future research could explore alternative generative paradigms or hybrid approaches that further optimize synthetic data fidelity. Additionally, expanding the framework's application to other domains (e.g., speech recognition or NLP) could prove valuable, as would exploring robustness issues associated with adversarially generated training data. Research might also evaluate impacts on model interpretability, potentially enhancing insights into the role of inductive biases introduced by generative processes.
This paper paves the way for new research dimensions in network compression and offers promising routes toward privacy-preserving machine learning, helping deep learning meet evolving demands for sustainable and secure AI solutions.