Qimera: Data-free Quantization with Synthetic Boundary Supporting Samples

Model quantization is a promising method for compressing deep neural networks, especially for inference on lightweight mobile or edge devices. However, model quantization usually requires access to the original training data to maintain the accuracy of the full-precision model, which is often infeasible in real-world scenarios due to security and privacy concerns. A popular approach to performing quantization without access to the original data is to use synthetically generated samples, based on batch-normalization statistics or adversarial learning. However, such approaches primarily rely on random noise input to the generator to attain diversity of the synthetic samples. We find that this is often insufficient to capture the distribution of the original data, especially around the decision boundaries. To this end, we propose Qimera, a method that uses superposed latent embeddings to generate synthetic boundary supporting samples. For the superposed embeddings to better reflect the original distribution, we also propose using an additional disentanglement mapping layer and extracting information from the full-precision model. The experimental results show that Qimera achieves state-of-the-art performance in various data-free quantization settings. Code is available at https://github.com/iamkanghyunchoi/qimera.
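The core idea of "superposed latent embeddings" can be sketched as follows: instead of conditioning the generator on a single class embedding, one feeds it a convex combination of two class embeddings so that the synthetic sample lands near the decision boundary between those classes. This is a minimal illustrative sketch, not the authors' implementation; the names, shapes, and the NumPy stand-in for a learned generator input are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions; in practice the embeddings are learned
# jointly with the generator.
num_classes, embed_dim, noise_dim = 10, 64, 100
class_embeddings = rng.normal(size=(num_classes, embed_dim))


def superposed_embedding(class_a, class_b, lam):
    """Convex combination of two class embeddings (lam in [0, 1]).

    lam = 1.0 recovers class_a's embedding; intermediate values place
    the conditioning vector between the two classes, which is what
    pushes generated samples toward the decision boundary.
    """
    return lam * class_embeddings[class_a] + (1.0 - lam) * class_embeddings[class_b]


def generator_input(class_a, class_b, lam):
    """Concatenate random noise with the superposed embedding,
    mirroring the usual (noise, condition) input of a conditional generator."""
    z = rng.normal(size=noise_dim)
    return np.concatenate([z, superposed_embedding(class_a, class_b, lam)])


x = generator_input(3, 7, lam=0.5)
assert x.shape == (noise_dim + embed_dim,)
```

The disentanglement mapping layer mentioned in the abstract would sit between these embeddings and the generator; it is omitted here for brevity.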