DisCo-Diff: Enhancing Continuous Diffusion Models with Discrete Latents
Yilun Xu Gabriele Corso Tommi Jaakkola Arash Vahdat Karsten Kreis
Abstract
Diffusion models (DMs) have revolutionized generative learning. They utilize a diffusion process to encode data into a simple Gaussian distribution. However, encoding a complex, potentially multimodal data distribution into a single continuous Gaussian distribution arguably represents an unnecessarily challenging learning problem. We propose Discrete-Continuous Latent Variable Diffusion Models (DisCo-Diff) to simplify this task by introducing complementary discrete latent variables. We augment DMs with learnable discrete latents, inferred with an encoder, and train the DM and encoder end-to-end. DisCo-Diff does not rely on pre-trained networks, making the framework universally applicable. The discrete latents significantly simplify learning the DM's complex noise-to-data mapping by reducing the curvature of the DM's generative ODE. An additional autoregressive transformer models the distribution of the discrete latents, a simple step because DisCo-Diff requires only a few discrete variables with small codebooks. We validate DisCo-Diff on toy data, several image synthesis tasks, and molecular docking, and find that introducing discrete latents consistently improves model performance. For example, DisCo-Diff achieves state-of-the-art FID scores on class-conditioned ImageNet-64/128 datasets with an ODE sampler.
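To make the two-stage generation concrete, here is a minimal toy sketch of the pipeline the abstract describes: first an autoregressive model samples a few discrete latents from small codebooks, then the diffusion model's generative ODE is integrated conditioned on those latents. All networks below are hypothetical stand-in functions (uniform sampling, a fixed conditional target), not the paper's actual architectures; the constants `NUM_LATENTS`, `CODEBOOK_SIZE`, and `DATA_DIM` are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_LATENTS, CODEBOOK_SIZE, DATA_DIM = 4, 8, 2  # "few latents, small codebooks"

def sample_discrete_latents():
    """Stand-in for the autoregressive transformer over the discrete latents."""
    codes = []
    for _ in range(NUM_LATENTS):
        # Uniform logits here for illustration; the real model would condition
        # each step on the previously sampled codes.
        probs = np.ones(CODEBOOK_SIZE) / CODEBOOK_SIZE
        codes.append(rng.choice(CODEBOOK_SIZE, p=probs))
    return np.array(codes)

def denoiser(x, t, codes):
    """Stand-in for a conditional denoiser D(x, t; codes) predicting clean data."""
    # Embed the codes as a fixed offset so that different codes steer the ODE
    # trajectory toward different regions (modes) of data space.
    return (codes.mean() / CODEBOOK_SIZE - 0.5) * np.ones(DATA_DIM)

def generate(num_steps=50, t_max=10.0):
    """Euler integration of a probability-flow ODE, dx/dt = (x - D(x, t)) / t."""
    codes = sample_discrete_latents()
    x = rng.normal(size=DATA_DIM) * t_max  # start from the Gaussian prior
    ts = np.linspace(t_max, 1e-3, num_steps + 1)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        dxdt = (x - denoiser(x, t_cur, codes)) / t_cur
        x = x + (t_next - t_cur) * dxdt
    return codes, x

codes, sample = generate()
print(codes.shape, sample.shape)
```

Conditioning on the discrete latents means each ODE trajectory only needs to reach the sub-distribution selected by the codes, which is the intuition behind the reduced ODE curvature mentioned in the abstract.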