Compensation Sampling for Improved Convergence in Diffusion Models

Diffusion models achieve remarkable quality in image generation, but at a cost. Iterative denoising requires many time steps to produce high-fidelity images. We argue that the denoising process is crucially limited by an accumulation of reconstruction error stemming from an initially inaccurate reconstruction of the target data. This leads to lower-quality outputs and slower convergence. To address this issue, we propose compensation sampling to guide the generation towards the target domain. We introduce a compensation term, implemented as a U-Net, which adds negligible computation overhead during training and, optionally, inference. Our approach is flexible, and we demonstrate its application in unconditional generation, face inpainting, and face de-occlusion on the benchmark datasets CIFAR-10, CelebA, CelebA-HQ, FFHQ-256, and FSG. Our approach consistently yields state-of-the-art results in terms of image quality, while accelerating convergence of the denoising process during training by up to an order of magnitude.
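To make the idea concrete, the following is a minimal sketch of how a learned compensation term could be combined with a standard diffusion denoiser. This is an illustrative assumption, not the paper's exact formulation: the module names (`denoiser`, `compensation_net`) and the weighting factor `lam` are hypothetical, and both networks are assumed to take the noisy sample and timestep as inputs, as in typical DDPM-style U-Nets.

```python
import torch
import torch.nn as nn

class CompensatedDenoiser(nn.Module):
    """Wraps a base denoising U-Net with a lightweight compensation U-Net.

    Hypothetical sketch: the compensation network predicts a correction to
    the base model's noise estimate, nudging early reconstructions toward
    the target data distribution to reduce accumulated error.
    """

    def __init__(self, denoiser: nn.Module, compensation_net: nn.Module,
                 lam: float = 0.1):
        super().__init__()
        self.denoiser = denoiser                    # standard diffusion U-Net
        self.compensation_net = compensation_net    # small auxiliary U-Net
        self.lam = lam                              # compensation weight (assumed)

    def forward(self, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        eps_pred = self.denoiser(x_t, t)         # base noise prediction
        comp = self.compensation_net(x_t, t)     # learned compensation term
        return eps_pred + self.lam * comp        # corrected prediction
```

Because the compensation network only adds one extra forward pass of a small U-Net per denoising step, its overhead is minor relative to the base model, which is consistent with the abstract's claim of negligible cost during training and optional use at inference.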