1. Goal
The goal is to perform Data-free Knowledge distillation.
Knowledge distillation: the problem of training a smaller model (the student) from a high-capacity source model (the teacher) so that the student retains most of the teacher's performance.
As the name suggests, data-free knowledge distillation performs this transfer without access to the original dataset on which the teacher network was trained. This setting matters because, in the real world, many datasets are proprietary and not shared publicly due to privacy or confidentiality concerns.
To tackle this problem, it is necessary to reconstruct a dataset for training the student network. Thus, in "Dreaming to Distill: Data-Free Knowledge Transfer via DeepInversion", we propose a new method that synthesizes images from the image distribution used to train a deep neural network. Further, we aim to improve the diversity of the synthesized images.
2. Method
- Proposing DeepInversion, a new method for synthesizing class-conditional images from a CNN trained for image classification.
- Introducing a regularization term on the intermediate-layer activations of synthesized images, based on just two layer-wise statistics from the teacher network: the channel-wise mean and variance.
- Improving synthesis diversity via application-specific extension of DeepInversion, called Adaptive DeepInversion.
- Exploiting disagreements between the pretrained teacher and the in-training student network to expand the coverage of the training set.
The overall procedure of our method is described in Fig. 1.

2.1 Background
2.1.1 Knowledge distillation
Given a trained model $p_T$ (teacher) and a dataset $\mathcal{X}$, knowledge distillation trains the student by minimizing

$$\min_{W_S} \sum_{x \in \mathcal{X}} KL\big(p_T(x),\, p_S(x)\big),$$

where $p_T(x) = p(x; W_T)$ and $p_S(x) = p(x; W_S)$ are the temperature-softened softmax outputs of the teacher and student networks, respectively, and $W_S$ denotes the student parameters.
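As a concrete sketch of the objective above, the per-input distillation loss can be written in plain Python (helper names such as `distillation_loss` are illustrative, not from the paper's code, and a deep-learning framework is replaced by list arithmetic):

```python
import math

def softmax(logits, tau=1.0):
    # Temperature-scaled softmax: a larger tau softens the distribution.
    exps = [math.exp(z / tau) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) = sum_i p_i * log(p_i / q_i); eps guards against log(0).
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def distillation_loss(teacher_logits, student_logits, tau=3.0):
    # The student is trained to mimic the teacher's softened output distribution.
    p_t = softmax(teacher_logits, tau)
    p_s = softmax(student_logits, tau)
    return kl_divergence(p_t, p_s)

# Identical logits give zero loss; disagreeing logits give a positive loss.
print(distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))  # 0.0
```

In practice this loss is summed over a batch and backpropagated only through the student's parameters $W_S$, with the teacher frozen.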
2.1.2 DeepDream
DeepDream synthesizes a large set of images $\hat{x}$ directly from a trained network.
Given a randomly initialized input $\hat{x} \in \mathbb{R}^{H \times W \times C}$ and an arbitrary target label $y$, the image is optimized by

$$\min_{\hat{x}} \; \mathcal{L}(\hat{x}, y) + \mathcal{R}(\hat{x}),$$

where $\mathcal{L}(\cdot)$ is the classification loss (e.g., cross-entropy) and $\mathcal{R}(\cdot)$ is an image regularization term

$$\mathcal{R}_{\text{prior}}(\hat{x}) = \alpha_{tv} \mathcal{R}_{TV}(\hat{x}) + \alpha_{\ell_2} \mathcal{R}_{\ell_2}(\hat{x}),$$

where $\mathcal{R}_{TV}$ and $\mathcal{R}_{\ell_2}$ penalize the total variation and the $\ell_2$ norm of $\hat{x}$, respectively, steering the optimization away from unrealistic images.
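The two prior terms can be sketched on a toy single-channel 2D patch (pure Python; the function names and the $\alpha$ weights below are illustrative, not the paper's implementation):

```python
def l2_prior(img):
    # ||x||_2^2: penalizes overall pixel energy.
    return sum(v * v for row in img for v in row)

def tv_prior(img):
    # Total variation: squared differences between neighboring pixels,
    # encouraging locally smooth, natural-looking images.
    h, w = len(img), len(img[0])
    tv = 0.0
    for i in range(h):
        for j in range(w):
            if i + 1 < h:
                tv += (img[i + 1][j] - img[i][j]) ** 2
            if j + 1 < w:
                tv += (img[i][j + 1] - img[i][j]) ** 2
    return tv

def r_prior(img, a_tv=1e-4, a_l2=1e-5):
    # Weighted combination of the two image priors.
    return a_tv * tv_prior(img) + a_l2 * l2_prior(img)

flat = [[1.0, 1.0], [1.0, 1.0]]     # perfectly smooth patch
noisy = [[1.0, -1.0], [-1.0, 1.0]]  # high-frequency checkerboard
print(r_prior(flat) < r_prior(noisy))  # True: smooth images are preferred
```

During synthesis, the gradient of this regularized loss is taken with respect to the input pixels themselves, not the network weights.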
2.2 DeepInversion (DI)
We improve DeepDream's image quality by extending the image regularization $\mathcal{R}(\hat{x})$ with a feature distribution regularizer.
To effectively enforce feature similarities between $\hat{x}$ and the original training data $x$ at all levels, we minimize the distance between the feature-map statistics of $\hat{x}$ and those of the training distribution:

$$\mathcal{R}_{\text{feature}}(\hat{x}) = \sum_{l} \big\| \mu_l(\hat{x}) - \mathbb{E}[\mu_l(x) \mid \mathcal{X}] \big\|_2 + \sum_{l} \big\| \sigma_l^2(\hat{x}) - \mathbb{E}[\sigma_l^2(x) \mid \mathcal{X}] \big\|_2,$$

where $\mu_l(\hat{x})$ and $\sigma_l^2(\hat{x})$ are the batch-wise mean and variance of the feature maps at the $l$-th convolutional layer. Since the training data is unavailable, the expectations are approximated by the running mean and running variance stored in the teacher's batch-normalization layers:

$$\mathbb{E}[\mu_l(x) \mid \mathcal{X}] \approx BN_l(\text{running mean}), \qquad \mathbb{E}[\sigma_l^2(x) \mid \mathcal{X}] \approx BN_l(\text{running variance}).$$

The full regularizer then becomes $\mathcal{R}_{DI}(\hat{x}) = \mathcal{R}_{\text{prior}}(\hat{x}) + \alpha_f \mathcal{R}_{\text{feature}}(\hat{x})$.
We refer to this model inversion method as DeepInversion.
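A minimal sketch of the feature-statistics matching term, assuming one flat list of activations per channel and illustrative function names (the actual method operates on convolutional feature maps, with the statistics read out of the teacher's batch-normalization layers via framework hooks):

```python
def channel_stats(feats):
    # Batch-wise mean and (biased) variance of one channel's activations.
    n = len(feats)
    mean = sum(feats) / n
    var = sum((v - mean) ** 2 for v in feats) / n
    return mean, var

def r_feature(batch_feats, bn_running_mean, bn_running_var):
    # Penalize the distance between the synthesized batch's per-channel
    # statistics and the running statistics stored in the teacher's BN
    # layers (scalar per channel here, so |.| coincides with the l2 norm).
    loss = 0.0
    for feats, mu_bn, var_bn in zip(batch_feats, bn_running_mean, bn_running_var):
        mu, var = channel_stats(feats)
        loss += abs(mu - mu_bn) + abs(var - var_bn)
    return loss

# A batch whose statistics already match the stored BN statistics costs nothing.
print(r_feature([[0.0, 2.0]], bn_running_mean=[1.0], bn_running_var=[1.0]))  # 0.0
```

Summed over all layers, this term pulls the synthesized images toward the activation statistics of the (inaccessible) training set.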
2.3 Adaptive DeepInversion (ADI)
Diversity also plays a crucial role in avoiding repeated and redundant synthetic images. For this, we propose Adaptive DeepInversion, an enhanced image generation scheme based on an iterative competition scheme between the image generation process and the student network. The main idea is to encourage the synthesized images to cause student-teacher disagreement.
Then, we introduce an additional loss $\mathcal{R}_{\text{compete}}$ based on the Jensen-Shannon (JS) divergence between the teacher and student output distributions:

$$\mathcal{R}_{\text{compete}}(\hat{x}) = 1 - JS\big(p_T(\hat{x}),\, p_S(\hat{x})\big),$$
$$JS\big(p_T(\hat{x}),\, p_S(\hat{x})\big) = \frac{1}{2}\Big( KL\big(p_T(\hat{x}), M\big) + KL\big(p_S(\hat{x}), M\big) \Big),$$

where $M = \frac{1}{2}\big(p_T(\hat{x}) + p_S(\hat{x})\big)$ is the average of the two output distributions.
During optimization, this new term leads to new images that the student cannot easily classify whereas the teacher can. As illustrated in Fig. 2, our proposal iteratively expands the distributional coverage of the synthesized images during the learning process. The regularization for Adaptive DeepInversion becomes $\mathcal{R}_{ADI}(\hat{x}) = \mathcal{R}_{DI}(\hat{x}) + \alpha_c \mathcal{R}_{\text{compete}}(\hat{x})$.

Reference
Yin, Hongxu, et al. "Dreaming to distill: Data-free knowledge transfer via DeepInversion." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.
GitHub code: Dreaming to Distill