Palette: Image-to-Image Diffusion Models

This paper develops a unified framework for image-to-image translation based on conditional diffusion models and evaluates this framework on four challenging image-to-image translation tasks, namely colorization, inpainting, uncropping, and JPEG restoration. Our simple implementation of image-to-image diffusion models outperforms strong GAN and regression baselines on all tasks, without task-specific hyper-parameter tuning, architecture customization, auxiliary losses, or other sophisticated new techniques. We uncover the impact of an L2 vs. L1 loss in the denoising diffusion objective on sample diversity, and demonstrate the importance of self-attention in the neural architecture through empirical studies. Importantly, we advocate a unified evaluation protocol based on ImageNet, with human evaluation and sample quality scores (FID, Inception Score, Classification Accuracy of a pre-trained ResNet-50, and Perceptual Distance against original images). We expect this standardized evaluation protocol to play a role in advancing image-to-image translation research. Finally, we show that a generalist, multi-task diffusion model performs as well as or better than task-specific specialist counterparts. See https://diffusion-palette.github.io for an overview of the results.
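
To make the L2 vs. L1 comparison in the denoising objective concrete, here is a minimal PyTorch-style sketch of a conditional diffusion training loss. The function name, the continuous noise-level parameterization, and the denoiser interface are illustrative assumptions, not the paper's exact implementation; what the sketch does reflect is the conditioning-by-concatenation setup and the single-line swap between the two losses.

```python
import torch
import torch.nn.functional as F

def conditional_diffusion_loss(denoiser, x_src, x_tgt, loss_type="l2"):
    """One training step of a conditional denoising diffusion model (sketch).

    denoiser: network predicting the noise added to the target image,
              given the source image and the noise level (assumed interface).
    x_src:    conditioning image (e.g. grayscale input for colorization).
    x_tgt:    ground-truth output image.
    """
    # Sample a per-example noise level in (0, 1).
    gamma = torch.rand(x_tgt.shape[0], device=x_tgt.device)
    gamma = gamma.view(-1, 1, 1, 1)

    # Corrupt the target with Gaussian noise at that level.
    eps = torch.randn_like(x_tgt)
    x_noisy = torch.sqrt(gamma) * x_tgt + torch.sqrt(1.0 - gamma) * eps

    # Condition by concatenating source and noisy target along channels.
    eps_pred = denoiser(torch.cat([x_src, x_noisy], dim=1), gamma.flatten())

    # The paper's ablation: L2 yields higher sample diversity, while L1
    # produces more conservative outputs.
    if loss_type == "l2":
        return F.mse_loss(eps_pred, eps)
    return F.l1_loss(eps_pred, eps)
```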
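
Of the proposed sample quality scores, Classification Accuracy is the most mechanical to reproduce: feed generated images to a pre-trained ImageNet classifier and count how often it recovers the ground-truth label. The sketch below assumes torchvision's pre-trained ResNet-50, with `samples` a uint8 NCHW tensor of generated images and `labels` their ImageNet class indices; the helper name and interface are ours, not the paper's.

```python
import torch
from torchvision.models import resnet50, ResNet50_Weights

def classification_accuracy(samples, labels, batch_size=64):
    """Fraction of generated images that a pre-trained ResNet-50
    assigns to the ground-truth ImageNet label (sketch)."""
    weights = ResNet50_Weights.IMAGENET1K_V1
    model = resnet50(weights=weights).eval()
    preprocess = weights.transforms()  # resize, crop, normalize
    correct = 0
    with torch.no_grad():
        for i in range(0, len(samples), batch_size):
            batch = preprocess(samples[i:i + batch_size])
            preds = model(batch).argmax(dim=1)
            correct += (preds == labels[i:i + batch_size]).sum().item()
    return correct / len(samples)
```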