Rectifying the Data Bias in Knowledge Distillation

Knowledge distillation is a representative technique for model compression and acceleration, which is important for deploying neural networks on resource-limited devices. The knowledge transferred from teacher to student is the mapping of the teacher model, or equivalently, the set of all its input-output pairs. In practice, however, the student model learns only from the input-output pairs in the training dataset, which may be biased, and we believe this limits the performance of knowledge distillation. In this paper, we first quantitatively define the uniformity of the data sampled for training, providing a unified view of methods that learn from biased data. We then evaluate uniformity on real-world data and show that existing methods in fact improve the uniformity of the data. We further introduce two uniformity-oriented methods for rectifying data bias in knowledge distillation. Extensive experiments on Face Recognition and Person Re-identification demonstrate the effectiveness of our methods. Moreover, our analysis of the sampled data for Face Recognition shows that better balance is achieved both across races and between easy and hard samples. This effect is also confirmed when training the student model from scratch, which yields performance comparable to standard knowledge distillation.